DNA word analysis based on the distribution of the distances between symmetric words

Bibliographic details
Main author: Tavares, Ana Helena
Publication date: 2017
Other authors: Pinho, Armando J., Silva, Raquel M., Rodrigues, João M. O. S., Bastos, Carlos A. C., Ferreira, Paulo J. S. G., Afreixo, Vera
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10773/18855
Abstract: We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome and a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.
id RCAP_27f9e98e3284a44d227cf59468b15703
oai_identifier_str oai:ria.ua.pt:10773/18855
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
publishDate 2017
dc.date.none.fl_str_mv 2017-11-16T15:24:07Z
2017-01-01T00:00:00Z
2017
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/18855
url http://hdl.handle.net/10773/18855
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2045-2322
10.1038/s41598-017-00646-2
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Nature Publishing Group
publisher.none.fl_str_mv Nature Publishing Group
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
_version_ 1799137587879739392