Local Renyi entropic profiles of DNA sequences

Bibliographic details
Main author: Vinga, S.
Publication date: 2007
Other authors: Almeida, J. S.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/24670
Abstract: Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored there was based on the Renyi entropy of a probability density function (pdf) estimated with the Parzen window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles, using the new pdf estimations to refine the density estimation of motifs. Results: The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables, and additional material and examples are made publicly available at http://kdbio.inescid.pt/~svinga/ep/. Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
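Illustrative note: the abstract builds on two ingredients, the CGR/USM map and a Renyi entropy computed from a Parzen window density estimate. The sketch below is not the authors' Matlab code and does not reproduce their fractal kernel or the entropic profiles themselves. It assumes one common CGR corner assignment (A, C, G, T at the corners of the unit square), an isotropic Gaussian window of width sigma, and Renyi entropy of order 2, for which the integral of the squared density estimate has a closed form. The function names cgr_points and renyi_quadratic_entropy are hypothetical.

```python
import math

# Assumed corner assignment; other CGR conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Map each symbol to a point in the unit square.

    Starting from the centre, each step moves halfway from the
    previous point towards the corner of the current symbol.
    """
    x, y = 0.5, 0.5
    points = []
    for symbol in seq.upper():
        cx, cy = CORNERS[symbol]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def renyi_quadratic_entropy(points, sigma):
    """Order-2 Renyi entropy of a 2-D Gaussian Parzen estimate.

    For Gaussian windows of variance sigma^2, the integral of the
    squared density equals the average over all point pairs of a
    Gaussian with variance 2*sigma^2, so
    H2 = -ln( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2) ).
    """
    n = len(points)
    norm = 1.0 / (4.0 * math.pi * sigma ** 2)  # 2-D Gaussian, variance 2*sigma^2
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (4.0 * sigma ** 2))
    return -math.log(total / n ** 2)


if __name__ == "__main__":
    repetitive = cgr_points("AAAAAAAA")
    mixed = cgr_points("ACGTACGT")
    # A repetitive sequence concentrates its CGR points, so its entropy is lower.
    print(renyi_quadratic_entropy(repetitive, 0.1))
    print(renyi_quadratic_entropy(mixed, 0.1))
```

Under these assumptions, a low-complexity sequence clusters its points near one corner and yields a lower entropy than a sequence that visits all four symbols; varying sigma changes the scale at which the density is smoothed, which is the role the abstract assigns to the kernel parameters.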
id RCAP_13d24a3268050bb55e7a0fe7bcacbc28
oai_identifier_str oai:run.unl.pt:10362/24670
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Local Renyi entropic profiles of DNA sequences
title Local Renyi entropic profiles of DNA sequences
author Vinga, S.
author_facet Vinga, S.
Almeida, J. S.
author_role author
author2 Almeida, J. S.
author2_role author
dc.contributor.none.fl_str_mv NOVA Medical School|Faculdade de Ciências Médicas (NMS|FCM)
RUN
dc.contributor.author.fl_str_mv Vinga, S.
Almeida, J. S.
dc.subject.por.fl_str_mv PROMOTER SEQUENCES
IDENTIFICATION
CHAOS GAME REPRESENTATION
GENOMIC SIGNATURE
DISCRETE SEQUENCES
UPTAKE SIGNAL SEQUENCES
HAEMOPHILUS-INFLUENZAE
MODELS
publishDate 2007
dc.date.none.fl_str_mv 2007-10-16
2007-10-16T00:00:00Z
2017-10-27T22:01:08Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/24670
url http://hdl.handle.net/10362/24670
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1471-2105
PURE: 422789
https://doi.org/10.1186/1471-2105-8-393
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 19
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137907954417664