An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance

Bibliographic details
Main author: Casimiro, Ana
Publication date: 2008
Other authors: Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L.
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/23352
Abstract:
BACKGROUND: Motif-finding algorithms have improved in their ability to detect patterns in biological sequences with computationally efficient methods. However, the subsequent classification of their output still has limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted a positional bias of motifs in DNA sequences, which may indicate not only that a pattern is important but also the positions where it preferentially occurs.
RESULTS: We propose integrating position-uniformity tests with over-representation tests to improve the accuracy of motif classification. Using artificial data, we compared three statistical tests (chi-square, Kolmogorov-Smirnov and a chi-square bootstrap) for assessing whether a given motif occurs uniformly along the promoter region of a gene. Using the test that performed best on this dataset, we then studied the positional distribution of several well-known cis-regulatory elements in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several dicotyledonous plants). The results show that position conservation is relevant for the transcriptional machinery.
CONCLUSION: We conclude that many biologically relevant motifs are heterogeneously distributed in the promoter regions of genes and, therefore, that non-uniformity is a good indicator of biological relevance that can complement the commonly used over-representation tests. In this article we present the results obtained for the S. cerevisiae data sets.
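The three kinds of uniformity test named in the abstract can be illustrated with a short sketch. It is not the authors' implementation. It assumes motif occurrences are given as 0-based start positions within promoters of one fixed length, bins them into equal-width windows for a Pearson chi-square statistic, computes a one-sample Kolmogorov-Smirnov statistic against a uniform distribution, and estimates a chi-square p-value by Monte Carlo resampling under the uniform null. Treating that resampling as the "chi-square bootstrap" is an assumption; the paper's exact procedure, bin count and promoter length may differ. All function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence


def bin_counts(positions: Sequence[int], promoter_len: int, n_bins: int) -> List[int]:
    """Count motif start positions falling in each of n_bins equal-width windows."""
    if promoter_len <= 0 or n_bins <= 0:
        raise ValueError("promoter_len and n_bins must be positive")
    counts = [0] * n_bins
    for p in positions:
        if not 0 <= p < promoter_len:
            raise ValueError(f"position {p} outside [0, {promoter_len})")
        counts[p * n_bins // promoter_len] += 1
    return counts


def chi_square_stat(counts: Sequence[int]) -> float:
    """Pearson chi-square statistic against equal expected counts in every bin."""
    n = sum(counts)
    if n == 0:
        raise ValueError("need at least one occurrence")
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def ks_stat(positions: Sequence[int], promoter_len: int) -> float:
    """One-sample Kolmogorov-Smirnov D against a uniform distribution on the promoter.

    Integer positions are mapped to (p + 0.5) / promoter_len so they lie in (0, 1).
    """
    if not positions:
        raise ValueError("need at least one occurrence")
    xs = sorted((p + 0.5) / promoter_len for p in positions)
    n = len(xs)
    # Largest gap between the empirical CDF (just after/before each point) and F(x) = x
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def chi_square_mc_pvalue(positions: Sequence[int], promoter_len: int, n_bins: int,
                         n_resamples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo p-value: share of uniform resamples at least as extreme as observed."""
    rng = random.Random(seed)
    observed = chi_square_stat(bin_counts(positions, promoter_len, n_bins))
    n = len(positions)
    exceed = 0
    for _ in range(n_resamples):
        # Draw n positions uniformly over the promoter (the null hypothesis)
        sim = [rng.randrange(promoter_len) for _ in range(n)]
        if chi_square_stat(bin_counts(sim, promoter_len, n_bins)) >= observed:
            exceed += 1
    # The +1 terms count the observed sample itself, so p is never exactly 0
    return (exceed + 1) / (n_resamples + 1)


if __name__ == "__main__":
    length = 1000                              # hypothetical promoter length in bp
    clustered = [900 + i for i in range(40)]   # all packed into the last 100 bp
    spread = list(range(0, length, 25))        # evenly spaced occurrences
    for name, pos in [("clustered", clustered), ("spread", spread)]:
        chi2 = chi_square_stat(bin_counts(pos, length, 10))
        print(name, chi2, ks_stat(pos, length), chi_square_mc_pvalue(pos, length, 10))
```

In this toy case, 40 occurrences packed into the last 100 bp of a 1,000 bp promoter give a chi-square statistic of 360 with 10 bins and a Monte Carlo p-value at or near its floor of 1/(n_resamples + 1), whereas the evenly spaced occurrences give a statistic of 0 and a p-value of 1. Resampling under the null avoids relying on the asymptotic chi-square distribution, which can be unreliable when expected bin counts are small.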
Contributors: NOVA Medical School|Faculdade de Ciências Médicas (NMS|FCM); RUN
Date issued: 2008-02-07
Date deposited: 2017-09-18
Version: published version
ISSN: 1471-2105
DOI: https://doi.org/10.1186/1471-2105-9-89
PURE ID: 423024
Format: application/pdf (extent: 13)
Access rights: open access
Aggregator: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
Keywords: Identification; Structured motifs; Elements; Gene expression; TATA box; Sequences; Database; Activation; Discovery; Saccharomyces cerevisiae