Statistical, computational and visualization methodologies to unveil gene primary structure features

Pinheiro, M.; Afreixo, V.; Moura, G.; Freitas, A.; Santos, M. A. S.; Oliveira, J. L.

Bibliographic details
Main author:	Pinheiro, M.
Publication date:	2006
Other authors:	Afreixo, V., Moura, G., Freitas, A., Santos, M. A. S., Oliveira, J. L.
Document type:	Article
Language:	English
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	http://hdl.handle.net/10773/27687
Abstract:	Gene sequence features such as codon bias, codon context, and codon expansion (e.g. trinucleotide repeats) can be better understood at the genomic scale by combining statistical methodologies with advanced computer algorithms and data visualization through sophisticated graphical interfaces. This paper presents the ANACONDA system, a bioinformatics application for gene primary structure analysis. Codon usage tables using absolute metrics and software for multivariate analysis of codon and amino acid usage are available in public databases. However, they do not provide easy-to-use computational and statistical tools for detailed gene primary structure analysis on a genomic scale. We propose the use of several statistical methods, namely contingency table analysis, residual analysis and multivariate analysis (cluster analysis), to analyze codon bias from several perspectives (degree of association, contexts and clustering). The solution developed is a software application that provides user-guided analysis of codon sequences, considering several contexts and codon usage on a genomic scale. In our molecular biology laboratory, the tool is used mainly on particular genomes, especially those of Saccharomyces cerevisiae, Candida albicans and Escherichia coli. These species are used here as examples to illustrate the applicability and output layouts of the software. The statistical tools incorporated in the system make it possible to obtain global views of important sequence features. The results are expected to permit the identification of general rules that govern codon context and codon usage in any genome. Additionally, identifying genes containing expanded codons that arise from erroneous DNA replication events will permit the discovery of new genes associated with human disease.

Item metadata

id	RCAP_86ab3275815499d0e5db58c65b6eaec9
oai_identifier_str	oai:ria.ua.pt:10773/27687
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	Statistical, computational and visualization methodologies to unveil gene primary structure features
title	Statistical, computational and visualization methodologies to unveil gene primary structure features
author	Pinheiro, M.
author_facet	Pinheiro, M. Afreixo, V. Moura, G. Freitas, A. Santos, M. A. S. Oliveira, J. L.
author_role	author
author2	Afreixo, V. Moura, G. Freitas, A. Santos, M. A. S. Oliveira, J. L.
author2_role	author author author author author
dc.contributor.author.fl_str_mv	Pinheiro, M. Afreixo, V. Moura, G. Freitas, A. Santos, M. A. S. Oliveira, J. L.
dc.subject.por.fl_str_mv	Bioinformatics software Codon context Codon bias Contingency tables Residual analysis Cluster analysis
topic	Bioinformatics software Codon context Codon bias Contingency tables Residual analysis Cluster analysis
publishDate	2006
dc.date.none.fl_str_mv	2006-01-01T00:00:00Z 2006 2020-02-27T10:33:35Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/27687
url	http://hdl.handle.net/10773/27687
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	0026-1270 10.1267/METH06020163
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Schattauer
publisher.none.fl_str_mv	Schattauer
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799137659476508672
