Computing distribution of scale independent motifs in biological sequences

Bibliographic details
Main author: Almeida, Joana S.
Publication date: 2006
Other authors: Vinga, Susana
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/24875
Abstract: The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representations of sequences whose statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques.
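As an illustration of the iterative map the abstract refers to, the following is a minimal Python sketch of the standard Chaos Game Representation for DNA. It is not the kernel density proposed in the article. The corner assignment (`CORNERS`), the starting point, and the helper names `cgr_points`, `chebyshev`, and `shared_suffix_len` are assumptions made for this example and do not come from this record. The sketch shows one direction of the distance/similarity relation: two sequences that share a suffix of length k end at points no more than 2^-k apart. The converse does not hold, which is one way to read the "inexact association between distance and sequence dissimilarity" mentioned above.

```python
# Conventional CGR corner assignment for DNA (an assumption for this sketch;
# the record itself does not specify one).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after each symbol of seq."""
    x, y = start
    points = []
    for symbol in seq.upper():
        cx, cy = CORNERS[symbol]
        # Move halfway from the current point toward the symbol's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def chebyshev(p, q):
    """Maximum coordinate-wise distance between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def shared_suffix_len(s, t):
    """Length of the longest common suffix of s and t."""
    k = 0
    while k < min(len(s), len(t)) and s[-1 - k] == t[-1 - k]:
        k += 1
    return k


# Two sequences with a common 4-symbol suffix end within 2**-4 of each other.
s1, s2 = "ACGATGCA", "CCCCTGCA"
k = shared_suffix_len(s1, s2)
d_similar = chebyshev(cgr_points(s1)[-1], cgr_points(s2)[-1])
print(k, d_similar, d_similar <= 2.0 ** -k)

# Two sequences with no common suffix can still land very close together,
# on opposite sides of a quadrant boundary.
s3, s4 = "TTTTTTTA", "AAAAAAAT"
d_dissimilar = chebyshev(cgr_points(s3)[-1], cgr_points(s4)[-1])
print(shared_suffix_len(s3, s4), d_dissimilar)
```

In this sketch the second pair ends up closer together than the first pair even though it shares no suffix at all, which is the kind of mismatch a fractal-aware density estimate would need to account for.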
id RCAP_2a5cb886eb09d8543648429f1e9499aa
oai_identifier_str oai:run.unl.pt:10362/24875
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
author Almeida, Joana S.
author_facet Almeida, Joana S.
Vinga, Susana
author_role author
author2 Vinga, Susana
author2_role author
dc.contributor.none.fl_str_mv NOVA Medical School|Faculdade de Ciências Médicas (NMS|FCM)
RUN
dc.contributor.author.fl_str_mv Almeida, Joana S.
Vinga, Susana
dc.subject.por.fl_str_mv PROMOTER SEQUENCES
CHAOS GAME REPRESENTATION
GENOMIC SIGNATURE
DNA-SEQUENCES
DISCRETE SEQUENCES
publishDate 2006
dc.date.none.fl_str_mv 2006-10-18
2006-10-18T00:00:00Z
2017-11-02T23:00:18Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/24875
url http://hdl.handle.net/10362/24875
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1748-7188
PURE: 116444
https://doi.org/10.1186/1748-7188-1-18
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 11
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799137908005797888