Local Renyi entropic profiles of DNA sequences
Main author: | Vinga, S. |
---|---|
Publication date: | 2007 |
Other authors: | Almeida, J. S. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/24670 |
Abstract: | Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored therein was based on the Renyi entropy of the probability density function (pdf) estimated with Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results: The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inescid.pt/~svinga/ep/. Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures. |
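
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' Matlab code, and it computes only a global entropy, not the local entropic profiles the article introduces. It assumes a Gaussian Parzen kernel (the abstract does not say which kernel is used), a particular assignment of nucleotides to the corners of the unit square, and hypothetical function names `cgr_points` and `renyi_quadratic_entropy`. Under the Gaussian assumption, the integral of the squared density estimate has a closed form as an average over all pairs of points, which the code uses directly.

```python
import math

# Corner assignment is a convention assumed for this sketch only.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Map a DNA string to Chaos Game Representation coordinates.

    Each new point lies halfway between the previous point and the
    corner assigned to the current nucleotide.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def renyi_quadratic_entropy(points, sigma2):
    """Quadratic Renyi entropy H2 = -ln(integral of f^2).

    f is a Parzen-window estimate with an isotropic 2-D Gaussian kernel
    of per-axis variance sigma2 (an assumption of this sketch). The
    integral equals the mean, over all ordered pairs of points, of a
    Gaussian with variance 2*sigma2 evaluated at their difference.
    """
    n = len(points)
    norm = 1.0 / (4.0 * math.pi * sigma2)  # 2-D Gaussian, variance 2*sigma2
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (4.0 * sigma2))
    return -math.log(total / (n * n))


if __name__ == "__main__":
    for s in ("AAAAAAAA", "ACGTTGCA"):
        print(s, renyi_quadratic_entropy(cgr_points(s), sigma2=0.01))
```

In this sketch a repetitive sequence clusters in one region of the map and so yields a lower entropy than a mixed sequence of the same length; the kernel variance `sigma2` plays the role of the scale parameter mentioned in the abstract.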
id |
RCAP_13d24a3268050bb55e7a0fe7bcacbc28 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/24670 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Local Renyi entropic profiles of DNA sequences |
author |
Vinga, S. |
author_facet |
Vinga, S.; Almeida, J. S. |
author_role |
author |
author2 |
Almeida, J. S. |
author2_role |
author |
dc.contributor.none.fl_str_mv |
NOVA Medical School|Faculdade de Ciências Médicas (NMS|FCM); RUN |
dc.contributor.author.fl_str_mv |
Vinga, S.; Almeida, J. S. |
dc.subject.por.fl_str_mv |
PROMOTER SEQUENCES; IDENTIFICATION; CHAOS GAME REPRESENTATION; GENOMIC SIGNATURE; DISCRETE SEQUENCES; UPTAKE SIGNAL SEQUENCES; HAEMOPHILUS-INFLUENZAE; MODELS |
publishDate |
2007 |
dc.date.none.fl_str_mv |
2007-10-16; 2007-10-16T00:00:00Z; 2017-10-27T22:01:08Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/24670 |
url |
http://hdl.handle.net/10362/24670 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
1471-2105; PURE: 422789; https://doi.org/10.1186/1471-2105-8-393 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
19; application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |