Statistical, computational and visualization methodologies to unveil gene primary structure features

Bibliographic details
Main author: Pinheiro, M.
Publication date: 2006
Other authors: Afreixo, V., Moura, G., Freitas, A., Santos, M. A. S., Oliveira, J. L.
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10773/27687
Abstract: Gene sequence features such as codon bias, codon context and codon expansion (e.g. trinucleotide repeats) can be better understood at the genomic scale by combining statistical methodologies with advanced computer algorithms and data visualization through sophisticated graphical interfaces. This paper presents the ANACONDA system, a bioinformatics application for gene primary structure analysis. Codon usage tables based on absolute metrics, and software for multivariate analysis of codon and amino acid usage, are available in public databases. However, they do not provide easy-to-use computational and statistical tools for detailed gene primary structure analysis on a genomic scale. We propose using several statistical methods (contingency table analysis, residual analysis and multivariate analysis, namely cluster analysis) to analyze codon bias under various aspects (degree of association, contexts and clustering). The resulting software application provides a user-guided analysis of codon sequences, considering several contexts and codon usage on a genomic scale. In our molecular biology laboratory the tool is applied mainly to particular genomes, especially those of Saccharomyces cerevisiae, Candida albicans and Escherichia coli; these species are used here as examples to illustrate the applicability and output layouts of the software. The statistical tools incorporated in the system make it possible to obtain global views of important sequence features. We expect the results obtained to permit identification of general rules that govern codon context and codon usage in any genome. Additionally, identifying genes that contain expanded codons arising from erroneous DNA replication events will permit the discovery of new genes associated with human disease.
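The codon-context analysis named in the abstract (a contingency table of adjacent codon pairs followed by residual analysis) can be sketched roughly as follows. This is a minimal illustration, not ANACONDA's implementation: it assumes the input is a list of in-frame ORF strings and uses Haberman's adjusted residuals as the residual measure. The function names `codons`, `codon_pair_table` and `adjusted_residuals` are hypothetical.

```python
from collections import Counter
from math import sqrt


def codons(orf: str) -> list[str]:
    """Split an ORF into in-frame codons, dropping any trailing partial codon."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def codon_pair_table(orfs):
    """Hypothetical helper: count adjacent in-frame codon pairs
    (first codon, following codon) pooled over all ORFs.
    The result is a sparse two-way contingency table keyed by (row, column)."""
    table = Counter()
    for orf in orfs:
        cs = codons(orf)
        table.update(zip(cs, cs[1:]))
    return table


def adjusted_residuals(table):
    """Hypothetical helper: Pearson chi-square statistic and Haberman
    adjusted residuals for every cell spanned by the observed margins,
    including cells with a zero count."""
    n = sum(table.values())
    rows, cols = Counter(), Counter()
    for (first, second), count in table.items():
        rows[first] += count
        cols[second] += count

    chi2 = 0.0
    residuals = {}
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            observed = table.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
            # Variance of (observed - expected) under independence.
            variance = expected * (1 - rows[a] / n) * (1 - cols[b] / n)
            residuals[(a, b)] = (
                (observed - expected) / sqrt(variance) if variance > 0 else 0.0
            )
    return chi2, residuals
```

For example, `adjusted_residuals(codon_pair_table(orfs))` returns the chi-square statistic for association between a codon and the codon that follows it, plus one residual per codon pair. Pairs whose residual exceeds a cutoff in absolute value (for instance 1.96, the two-sided 5% point of the normal approximation, before any multiple-testing correction) would be candidate preferred or avoided codon contexts. A `Counter` keeps the 64 x 64 table sparse; a genome-scale tool would more likely use array storage.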
Keywords: Bioinformatics software; Codon context; Codon bias; Contingency tables; Residual analysis; Cluster analysis
Publisher: Schattauer
ISSN: 0026-1270
DOI: 10.1267/METH06020163
Version: Published version
Access rights: Open access
Format: PDF
Deposited: 2020-02-27
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação