Mixture models of geometric distributions in genomic analysis of inter-nucleotide distances

Bibliographic details
Main author: Freitas, A.
Publication date: 2013
Other authors: Afreixo, V., Escudeiro, S.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10773/18642
Abstract: The mapping defined by inter-nucleotide distances (InD) provides a reversible numerical representation of the primary structure of DNA. If nucleotides were placed independently along the genome, a finite mixture model of four geometric distributions could be fitted to the InD, where the four marginal distributions would be the expected distributions of the four nucleotide types. We analyze a finite mixture model of geometric distributions (f2), with marginals not explicitly tied to the nucleotide types, as an approximation to the InD. We use BIC in the composite likelihood framework to choose the number of mixture components and the EM algorithm to estimate the model parameters. Based on divergence profiles, an experimental study was carried out on the complete genomes of 45 species to evaluate f2. Although the proposed model is not suited to the InD, our analysis shows that divergence profiles involving the empirical distribution of the InD are also exhibited by profiles involving f2. This suggests that statistical regularities of the InD can be described by the model f2. Some characteristics of the DNA sequences captured by the model f2 are illustrated. In particular, clusterings of subgroups of eukaryotes (primates, mammals, animals and plants) are detected.
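The two ingredients named in the abstract, the inter-nucleotide distances and an EM fit of a finite mixture of geometric distributions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the starting values, and the fixed iteration count are hypothetical. The score returned is the ordinary BIC, whereas the paper uses BIC in a composite likelihood framework. Distances are measured here to the previous occurrence of the same nucleotide, which gives the same set of values as measuring to the next occurrence.

```python
import math
from collections import Counter


def inter_nucleotide_distances(seq):
    """Distances between consecutive occurrences of the same symbol."""
    last_seen = {}
    distances = []
    for pos, base in enumerate(seq):
        if base in last_seen:
            distances.append(pos - last_seen[base])
        last_seen[base] = pos
    return distances


def fit_geometric_mixture(distances, n_components, n_iter=200):
    """EM for a mixture of geometric distributions on k = 1, 2, ...

    Component j has pmf p_j * (1 - p_j) ** (k - 1) and weight w_j.
    Returns (weights, probs, log_likelihood, bic).
    """
    counts = Counter(distances)  # work on distinct values with multiplicities
    n = len(distances)
    m = n_components
    eps = 1e-12  # keeps parameters strictly inside (0, 1) for the logs
    weights = [1.0 / m] * m
    probs = [(j + 1) / (m + 1) for j in range(m)]  # assumed spread-out start

    def log_terms(k):
        # log(w_j * pmf_j(k)) for each component, plus their log-sum-exp
        terms = [
            math.log(weights[j]) + math.log(probs[j])
            + (k - 1) * math.log1p(-probs[j])
            for j in range(m)
        ]
        top = max(terms)
        total = top + math.log(sum(math.exp(t - top) for t in terms))
        return terms, total

    for _ in range(n_iter):
        resp_sum = [0.0] * m    # sum of responsibilities per component
        resp_k_sum = [0.0] * m  # responsibility-weighted sum of distances
        # E-step: posterior probability of each component for each distance
        for k, c in counts.items():
            terms, total = log_terms(k)
            for j in range(m):
                r = c * math.exp(terms[j] - total)
                resp_sum[j] += r
                resp_k_sum[j] += r * k
        # M-step: closed-form updates for weights and success probabilities
        weights = [max(s / n, eps) for s in resp_sum]
        new_probs = []
        for j in range(m):
            p = resp_sum[j] / resp_k_sum[j] if resp_k_sum[j] > 0 else probs[j]
            new_probs.append(min(max(p, eps), 1.0 - eps))
        probs = new_probs

    log_lik = sum(c * log_terms(k)[1] for k, c in counts.items())
    n_params = 2 * m - 1  # m probabilities plus m - 1 free weights
    bic = -2.0 * log_lik + n_params * math.log(n)
    return weights, probs, log_lik, bic


if __name__ == "__main__":
    d = inter_nucleotide_distances("ACGTTGCAACGTAGCTAGGCTA")
    for m in (1, 2, 3):
        w, p, ll, bic = fit_geometric_mixture(d, m)
        print(m, round(bic, 3))
```

With a single component the M-step reduces to the usual maximum likelihood estimate for a geometric distribution, the reciprocal of the mean distance. Comparing the BIC values across candidate numbers of components mirrors, in simplified form, the model selection step described in the abstract.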
id RCAP_cc83647e5767bd436f9bfe2bb9aa284e
oai_identifier_str oai:ria.ua.pt:10773/18642
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Mixture models of geometric distributions in genomic analysis of inter-nucleotide distances
author Freitas, A.
author_facet Freitas, A.
Afreixo, V.
Escudeiro, S.
author_role author
author2 Afreixo, V.
Escudeiro, S.
author2_role author
author
dc.contributor.author.fl_str_mv Freitas, A.
Afreixo, V.
Escudeiro, S.
dc.subject.por.fl_str_mv Genomic analysis
Inter-nucleotide distances
Geometric distributions
DNA
topic Genomic analysis
Inter-nucleotide distances
Geometric distributions
DNA
publishDate 2013
dc.date.none.fl_str_mv 2013-01-01T00:00:00Z
2013
2017-10-26T09:36:04Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/18642
url http://hdl.handle.net/10773/18642
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2310-5070
10.19139/soic.v1i1.6
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv International Academic Press
publisher.none.fl_str_mv International Academic Press
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799137587145736192