DNA word analysis based on the distribution of the distances between symmetric words
Main author: | Tavares, Ana Helena |
---|---|
Publication date: | 2017 |
Other authors: | Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10773/18855 |
Abstract: | We address the problem of discovering pairs of symmetric genomic words (i.e., words and their corresponding reversed complements) that occur at overrepresented distances. For this purpose, we developed new procedures to identify symmetric word pairs with an uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome and analysed both the complete genome and a version with known repetitive sequences masked out. We report several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected. |
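
The idea of a distance distribution for a symmetric word pair can be illustrated with a small sketch. The Python below is not the authors' procedure. It assumes one simple definition: for each occurrence of a word, take the distance (between start positions) to the nearest following occurrence of its reversed complement. The function names `reverse_complement` and `symmetric_distance_counts` and the toy sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Translation table for the four DNA bases (assumes an upper-case ACGT alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reversed complement of a DNA word, e.g. ACGG -> CCGT."""
    return word.translate(_COMPLEMENT)[::-1]


def _positions(sequence: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word."""
    k = len(word)
    return [i for i in range(len(sequence) - k + 1) if sequence.startswith(word, i)]


def symmetric_distance_counts(sequence: str, word: str) -> Counter:
    """Count distances from each occurrence of `word` to the nearest
    following occurrence of its reversed complement.

    Assumption: distance is the difference between start positions.
    """
    word_pos = _positions(sequence, word)
    partner_pos = _positions(sequence, reverse_complement(word))

    counts: Counter = Counter()
    j = 0
    for p in word_pos:
        # Advance to the first partner occurrence strictly after position p.
        while j < len(partner_pos) and partner_pos[j] <= p:
            j += 1
        if j == len(partner_pos):
            break  # no partner occurs after this or any later word position
        counts[partner_pos[j] - p] += 1
    return counts


if __name__ == "__main__":
    toy = "ACGGAACCGTACGGCCGT"
    print(symmetric_distance_counts(toy, "ACGG"))
```

An empirical distribution obtained this way could then be compared against a reference model to flag overrepresented distances. The choice of reference model is left open here, since the abstract does not specify one.
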
id |
RCAP_27f9e98e3284a44d227cf59468b15703 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/18855 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
DNA word analysis based on the distribution of the distances between symmetric words |
title |
DNA word analysis based on the distribution of the distances between symmetric words |
author |
Tavares, Ana Helena |
author_facet |
Tavares, Ana Helena; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera |
author_role |
author |
author2 |
Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera |
author2_role |
author author author author author author |
dc.contributor.author.fl_str_mv |
Tavares, Ana Helena; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-11-16T15:24:07Z 2017-01-01T00:00:00Z 2017 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/18855 |
url |
http://hdl.handle.net/10773/18855 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
2045-2322 10.1038/s41598-017-00646-2 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Nature Publishing Group |
publisher.none.fl_str_mv |
Nature Publishing Group |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799137587879739392 |