Computing distribution of scale independent motifs in biological sequences
Main author: | Almeida, Joana S. |
---|---|
Publication date: | 2006 |
Other authors: | Vinga, Susana |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/24875 |
Abstract: | The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representations of sequences, in which their statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques. |
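
The "inexact association between distance and sequence dissimilarity" mentioned in the abstract can be illustrated with a minimal sketch of plain CGR for DNA. This is not the kernel density proposed in the article. The corner assignment in `CORNERS`, the starting point, and the helper names `cgr_point` and `chebyshev` are assumptions chosen for illustration.

```python
# Minimal CGR sketch (illustrative only; not the article's fractal kernel).
# Assumed corner layout of the unit square for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_point(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after reading the whole sequence."""
    x, y = start
    for symbol in seq:
        cx, cy = CORNERS[symbol]
        # Each symbol moves the point halfway toward that symbol's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def chebyshev(p, q):
    """Largest coordinate-wise difference between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


# Sequences sharing a suffix of length 5 fall in the same square of side 2**-5.
d_shared = chebyshev(cgr_point("GATTACA"), cgr_point("CCTTACA"))

# These two share no suffix at all (last symbols differ), yet they lie close
# together because they sit on opposite sides of a quadrant boundary.
d_unrelated = chebyshev(cgr_point("TTTA"), cgr_point("AAAT"))

print(d_shared, d_unrelated)
```

A shared suffix guarantees proximity, but proximity does not guarantee a shared suffix. The second pair shows this: a small distance between sequences whose final symbols already disagree.
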
id |
RCAP_2a5cb886eb09d8543648429f1e9499aa |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/24875 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.contributor.none.fl_str_mv |
NOVA Medical School|Faculdade de Ciências Médicas (NMS|FCM) RUN |
dc.contributor.author.fl_str_mv |
Almeida, Joana S. Vinga, Susana |
dc.subject.por.fl_str_mv |
PROMOTER SEQUENCES CHAOS GAME REPRESENTATION GENOMIC SIGNATURE DNA-SEQUENCES DISCRETE SEQUENCES |
dc.date.none.fl_str_mv |
2006-10-18 2006-10-18T00:00:00Z 2017-11-02T23:00:18Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/24875 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
1748-7188 PURE: 116444 https://doi.org/10.1186/1748-7188-1-18 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
11 application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |