An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance
Main author: | Casimiro, Ana |
---|---|
Publication date: | 2008 |
Other authors: | Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L. |
Document type: | Article |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/23352 |
Abstract: | BACKGROUND: Motif-finding algorithms have improved in their ability to use computationally efficient methods to detect patterns in biological sequences. However, the subsequent classification of their output still suffers from limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in DNA sequences, which might not only indicate that a pattern is important but also provide hints about the positions where such patterns occur preferentially. RESULTS: We propose to integrate position uniformity tests and over-representation tests to improve the accuracy of the classification of motifs. Using artificial data, we compared three statistical tests (Chi-Square, Kolmogorov-Smirnov and a Chi-Square bootstrap) to assess whether a given motif occurs uniformly in the promoter region of a gene. Using the test that performed best on this dataset, we proceeded to study the positional distribution of several well-known cis-regulatory elements in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several dicotyledonous plants). The results show that position conservation is relevant for the transcriptional machinery. CONCLUSION: We conclude that many biologically relevant motifs appear heterogeneously distributed in the promoter region of genes and, therefore, that non-uniformity is a good indicator of biological relevance and can be used to complement the over-representation tests commonly used. In this article we present the results obtained for the S. cerevisiae data sets. |
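
The abstract describes testing whether motif positions are uniformly distributed along a promoter. As a rough illustration only, and not the authors' implementation, the following Python sketch bins motif start positions, computes a chi-square statistic against a uniform expectation, and estimates a p-value by simulating uniformly placed motifs (a Monte Carlo stand-in for the bootstrap variant mentioned in the abstract). The function names, the bin count, the promoter length of 1000 bp and the number of resamples are all assumptions chosen for the example.

```python
import random


def chi_square_statistic(positions, promoter_length, n_bins):
    """Chi-square statistic of binned positions against a uniform expectation.

    Assumes 0-based positions in range(promoter_length) and a promoter
    length divisible by n_bins, so that all bins have equal width.
    """
    counts = [0] * n_bins
    for pos in positions:
        # Integer arithmetic maps each position to its bin index.
        index = min(pos * n_bins // promoter_length, n_bins - 1)
        counts[index] += 1
    expected = len(positions) / n_bins
    return sum((observed - expected) ** 2 / expected for observed in counts)


def monte_carlo_p_value(positions, promoter_length, n_bins=10,
                        n_resamples=2000, seed=0):
    """Estimate how often uniformly placed motifs score at least as high."""
    rng = random.Random(seed)
    observed_stat = chi_square_statistic(positions, promoter_length, n_bins)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Simulate the null hypothesis: same number of motifs, uniform positions.
        simulated = [rng.randrange(promoter_length) for _ in positions]
        stat = chi_square_statistic(simulated, promoter_length, n_bins)
        if stat >= observed_stat:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical data: one motif clustered near position 100, one spread out.
demo_rng = random.Random(1)
clustered = [demo_rng.randrange(80, 120) for _ in range(200)]
spread = [demo_rng.randrange(1000) for _ in range(200)]
p_clustered = monte_carlo_p_value(clustered, 1000)
p_spread = monte_carlo_p_value(spread, 1000)
print(f"clustered motif p-value: {p_clustered:.4f}")
print(f"spread motif p-value: {p_spread:.4f}")
```

A small p-value for the clustered motif suggests a non-uniform positional distribution, which the abstract proposes as a complement to over-representation tests. The choice of bin count affects sensitivity, so any real analysis would need to justify it.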
id |
RCAP_95c73b63f63781fc9c59daf05dc1a415 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/23352 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance |
author |
Casimiro, Ana |
author_role |
author |
author2 |
Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L. |
author2_role |
author author author |
dc.contributor.none.fl_str_mv |
NOVA Medical School|Faculdade de Ciências Médicas (NMS|FCM); RUN |
dc.contributor.author.fl_str_mv |
Casimiro, Ana; Vinga, Susana; Freitas, Ana T; Oliveira, Arlindo L. |
dc.subject.por.fl_str_mv |
IDENTIFICATION; STRUCTURED MOTIFS; ELEMENTS; GENE-EXPRESSION; TATA BOX; SEQUENCES; DATABASE; ACTIVATION; DISCOVERY; SACCHAROMYCES-CEREVISIAE |
publishDate |
2008 |
dc.date.none.fl_str_mv |
2008-02-07; 2008-02-07T00:00:00Z; 2017-09-18T22:01:11Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/23352 |
url |
http://hdl.handle.net/10362/23352 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
1471-2105; PURE: 423024; https://doi.org/10.1186/1471-2105-9-89 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
13; application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |