DNA word analysis based on the distribution of the distances between symmetric words

Tavares, Ana Helena; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera

Bibliographic details
Main author:	Tavares, Ana Helena
Publication date:	2017
Other authors:	Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10773/18855
Abstract:	We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome and a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.
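
The quantity the abstract refers to, the distance between a word and its reversed complement, can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes that the distance is measured from each occurrence of a word to the nearest later occurrence of its reversed complement, counted in start-position offsets, and that the sequence uses only uppercase A, C, G, T. The function names are hypothetical.

```python
from collections import Counter

# Complement table for uppercase DNA letters (assumption: no N or lowercase).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reversed complement of a DNA word."""
    return word.translate(_COMPLEMENT)[::-1]


def find_positions(sequence: str, word: str) -> list[int]:
    """Return start positions of all (possibly overlapping) occurrences."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_distance_counts(sequence: str, word: str) -> Counter:
    """Count distances from each occurrence of `word` to the nearest
    later occurrence of its reversed complement (assumed definition)."""
    word_pos = find_positions(sequence, word)
    rc_pos = find_positions(sequence, reverse_complement(word))
    counts = Counter()
    j = 0
    for p in word_pos:
        # Advance to the first reversed-complement start strictly after p.
        while j < len(rc_pos) and rc_pos[j] <= p:
            j += 1
        if j == len(rc_pos):
            break  # no later reversed complement remains
        counts[rc_pos[j] - p] += 1
    return counts


if __name__ == "__main__":
    toy = "AACGTTTTCGTTAACGCGTT"  # made-up sequence, not real genomic data
    print(symmetric_distance_counts(toy, "AACG"))
```

Comparing such empirical counts against an expected distribution, which this sketch does not attempt, is what would reveal overrepresented distances.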

Item metadata

id	RCAP_27f9e98e3284a44d227cf59468b15703
oai_identifier_str	oai:ria.ua.pt:10773/18855
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	DNA word analysis based on the distribution of the distances between symmetric words
author	Tavares, Ana Helena
author_facet	Tavares, Ana Helena; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera
author_role	author
author2	Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera
author2_role	author author author author author author
dc.contributor.author.fl_str_mv	Tavares, Ana Helena; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera
publishDate	2017
dc.date.none.fl_str_mv	2017-11-16T15:24:07Z 2017-01-01T00:00:00Z 2017
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/18855
url	http://hdl.handle.net/10773/18855
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	ISSN 2045-2322; DOI 10.1038/s41598-017-00646-2
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Nature Publishing Group
publisher.none.fl_str_mv	Nature Publishing Group
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799137587879739392
