Mixture models of geometric distributions in genomic analysis of inter-nucleotide distances
Main author: | Freitas, A. |
---|---|
Publication date: | 2013 |
Other authors: | Afreixo, V.; Escudeiro, S. |
Document type: | Article |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10773/18642 |
Abstract: | The mapping defined by inter-nucleotide distances (InD) provides a reversible numerical representation of the primary structure of DNA. If nucleotides were placed independently along the genome, a finite mixture model of four geometric distributions could be fitted to the InD, where the four marginal distributions would be the expected distributions of the four nucleotide types. We analyze a finite mixture model of geometric distributions (f2), with marginals not explicitly assigned to the nucleotide types, as an approximation to the InD. We use BIC in the composite likelihood framework to choose the number of mixture components and the EM algorithm to estimate the model parameters. To evaluate f2, an experimental study based on divergence profiles was carried out on the complete genomes of 45 species. Although the proposed model is not suited to the InD, our analysis shows that the divergence profiles involving the empirical distribution of the InD are also exhibited by profiles involving f2. This suggests that statistical regularities of the InD can be described by the model f2. Some characteristics of the DNA sequences captured by f2 are illustrated; in particular, clusters of subgroups of eukaryotes (primates, mammals, animals and plants) are detected. |
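The two computational ingredients named in the abstract, the InD mapping and an EM fit of a geometric mixture with BIC model selection, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The InD is assumed here to be the distance between consecutive occurrences of the same nucleotide; the function names (`inter_nucleotide_distances`, `iid_reference_pmf`, `fit_geometric_mixture`, `choose_k_by_bic`) are hypothetical; and model choice uses the ordinary BIC, -2 log L + (2k - 1) log n, which treats the distances as independent observations, rather than the composite-likelihood BIC used in the paper. Under independence with base probabilities p_x, the pooled InD would follow f0(d) = sum over x in {A, C, G, T} of p_x^2 (1 - p_x)^(d - 1), d = 1, 2, ..., which `iid_reference_pmf` evaluates. Divergence profiles are not reproduced.

```python
import math
import random
from collections import Counter


def inter_nucleotide_distances(seq):
    """Hypothetical helper (assumed InD definition): distances between
    consecutive occurrences of the same nucleotide. Symbols other than
    A, C, G, T (e.g. N) are skipped but still occupy a position."""
    last = {}
    dists = []
    for i, base in enumerate(seq.upper()):
        if base not in "ACGT":
            continue
        if base in last:
            dists.append(i - last[base])
        last[base] = i
    return dists


def iid_reference_pmf(d, freqs):
    """Pooled InD pmf if bases were independent with probabilities `freqs`:
    sum_x p_x * p_x * (1 - p_x)**(d - 1), i.e. a four-component mixture of
    geometric laws on {1, 2, ...} weighted by base frequency."""
    return sum(p * p * (1.0 - p) ** (d - 1) for p in freqs.values())


def fit_geometric_mixture(dists, k, max_iter=2000, tol=1e-10):
    """EM for a k-component mixture of geometric laws on {1, 2, ...}.
    Returns (weights, thetas, log-likelihood from the last E-step)."""
    counts = Counter(dists)  # InD values repeat a lot, so iterate over counts
    n = len(dists)
    mean = sum(dists) / n
    # Deterministic spread of starting success probabilities around 1/mean.
    thetas = [min(max(2.0 * (j + 1) / ((k + 1) * mean), 1e-6), 0.999)
              for j in range(k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities via log-sum-exp for each distinct distance.
        ll, resp = 0.0, {}
        for d, c in counts.items():
            logs = [math.log(w) + math.log(t) + (d - 1) * math.log1p(-t)
                    for w, t in zip(weights, thetas)]
            m = max(logs)
            s = sum(math.exp(x - m) for x in logs)
            ll += c * (m + math.log(s))
            resp[d] = [math.exp(x - m) / s for x in logs]
        # M-step: weighted MLEs (theta_j = 1 / weighted mean distance);
        # the 1e-12 terms avoid zero weights and 0/0 for empty components.
        for j in range(k):
            rj = sum(c * resp[d][j] for d, c in counts.items()) + 1e-12
            rdj = sum(c * resp[d][j] * d for d, c in counts.items()) + 1e-12
            weights[j] = rj / (n + k * 1e-12)
            thetas[j] = min(max(rj / rdj, 1e-12), 1.0 - 1e-12)
        if ll - prev_ll <= tol * (1.0 + abs(ll)):
            break
        prev_ll = ll
    return weights, thetas, ll


def choose_k_by_bic(dists, k_max=4):
    """Ordinary BIC over k = 1..k_max (NOT the composite-likelihood BIC).
    Returns (bic, k, weights, thetas) for the smallest BIC."""
    n = len(dists)
    best = None
    for k in range(1, k_max + 1):
        weights, thetas, ll = fit_geometric_mixture(dists, k)
        bic = -2.0 * ll + (2 * k - 1) * math.log(n)  # 2k - 1 free parameters
        if best is None or bic < best[0]:
            best = (bic, k, weights, thetas)
    return best


if __name__ == "__main__":
    # Hypothetical demo: an i.i.d. "genome" with skewed base composition.
    rng = random.Random(1)
    freqs = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
    genome = "".join(rng.choices(list(freqs), weights=list(freqs.values()), k=20000))
    dists = inter_nucleotide_distances(genome)
    bic, k, weights, thetas = choose_k_by_bic(dists, k_max=3)
    print("chosen k:", k)
    for w, t in sorted(zip(weights, thetas), key=lambda wt: wt[1]):
        print(f"weight={w:.3f} theta={t:.3f}")
```

In the hypothetical demo, C and G (and likewise A and T) share a probability, so the independence reference has only two distinct geometric components and one would expect BIC to favour a small k. On real genomes the fitted components need not correspond to nucleotide types, in line with the f2 model described in the abstract. Working on value counts rather than raw distances keeps each EM iteration proportional to the number of distinct distances, which is small compared with the genome length.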
Record ID: | RCAP_cc83647e5767bd436f9bfe2bb9aa284e |
---|---|
OAI identifier: | oai:ria.ua.pt:10773/18642 |
Authors: | Freitas, A.; Afreixo, V.; Escudeiro, S. |
Subjects: | Genomic analysis; Inter-nucleotide distances; Geometric distributions; DNA |
Dates: | 2013-01-01T00:00:00Z; 2017-10-26T09:36:04Z |
Publication status: | Published version (info:eu-repo/semantics/publishedVersion) |
Document type: | Article (info:eu-repo/semantics/article) |
Relation: | ISSN 2310-5070; DOI 10.19139/soic.v1i1.6 |
Publisher: | International Academic Press |
Rights: | Open access (info:eu-repo/semantics/openAccess) |
Format: | application/pdf |
Institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP) |