SamPler - a novel method for selecting parameters for gene functional annotation routines

Bibliographic details
Main author: Cruz, Fernando João Pereira da
Publication date: 2019
Other authors: Lagoa, Davide Rafael Santos, Mendes, João, Rocha, I., Ferreira, Eugénio C., Rocha, Miguel, Dias, Oscar
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/1822/61324
Abstract: Background: As genome sequencing projects grow rapidly, the diversity of organisms with recently assembled genome sequences peaks at an unprecedented scale, thereby highlighting the need to make gene functional annotations fast and efficient. However, the (high) quality of such annotations must be guaranteed, as this is the first indicator of the genomic potential of every organism. Automatic procedures help accelerate the annotation process, though they decrease the confidence and reliability of the outcomes. Manually curating a genome-wide annotation of the functions of genes, enzymes and transporter proteins is a highly time-consuming, tedious and impractical task, even for the most proficient curator. Hence, a semi-automated procedure, which balances the two approaches, will increase the reliability of the annotation while speeding up the process. In fact, a prior analysis of the annotation algorithm may improve its performance by adjusting its parameters, hastening the downstream processing and the manual curation of assigning functions to genes encoding proteins. Results: Here, SamPler, a novel strategy to select parameters for gene functional annotation routines, is presented. This semi-automated method is based on the manual curation of a randomly selected set of genes/proteins. Then, in a multi-dimensional array, this sample is used to assess the automatic annotations for all possible combinations of the algorithm’s parameters. These assessments allow the creation of an array of confusion matrices, for which several metrics are calculated (accuracy, precision and negative predictive value) and used to reach optimal values for the parameters. Conclusions: The potential of this methodology is demonstrated with four genome functional annotations performed in merlin, an in-house user-friendly computational framework for genome-scale metabolic annotation and model reconstruction. For that, SamPler was implemented as a new plugin for the merlin tool.
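The parameter-selection idea in the abstract (one confusion matrix per parameter combination, scored on a curated sample) can be illustrated with a minimal Python sketch. This is not the SamPler or merlin implementation: the two parameters (`alpha`, `threshold`), the weighted score, the sample layout, and the rule of picking the combination with the highest accuracy are all assumptions made for illustration only.

```python
from itertools import product


def confusion(sample, alpha, threshold):
    """Return (tp, fp, tn, fn) for one hypothetical parameter combination.

    Each sample entry is (score_a, score_b, curated_ok): two assumed
    component scores and whether manual curation agreed with the
    automatic annotation.
    """
    tp = fp = tn = fn = 0
    for score_a, score_b, curated_ok in sample:
        # Assumed scoring rule: weighted mix of the two component scores.
        accepted = alpha * score_a + (1 - alpha) * score_b >= threshold
        if accepted and curated_ok:
            tp += 1
        elif accepted:
            fp += 1
        elif curated_ok:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, precision and negative predictive value (0.0 if undefined)."""
    def ratio(num, den):
        return num / den if den else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def select_parameters(sample, alphas, thresholds):
    """Evaluate every combination and keep the one with the best accuracy.

    Ranking by accuracy alone is an assumption of this sketch.
    """
    results = [
        (alpha, threshold, metrics(*confusion(sample, alpha, threshold)))
        for alpha, threshold in product(alphas, thresholds)
    ]
    return max(results, key=lambda r: r[2]["accuracy"])


# Made-up curated sample: (score_a, score_b, curated_ok).
sample = [
    (0.9, 0.8, True), (0.7, 0.9, True), (0.8, 0.4, True),
    (0.4, 0.3, False), (0.2, 0.6, False), (0.3, 0.1, False),
]
best_alpha, best_threshold, best_metrics = select_parameters(
    sample, alphas=[0.0, 0.5, 1.0], thresholds=[0.25, 0.5, 0.75]
)
```

In practice the curated sample would come from manually reviewed genes, and the grid would cover the real parameter ranges of the annotation routine.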
id RCAP_8aed24feca2f6554a7ddf53e8f7530cc
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/61324
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Funding: This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the [UID/BIO/04469] unit and COMPETE 2020 [POCI-01-0145-FEDER-006684] and BioTecNorte operation [NORTE-01-0145-FEDER-000004] funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte. The authors thank the project DDDeCaF - Bioinformatics Services for Data-Driven Design of Cell Factories and Communities, Ref. H2020-LEIT-BIO-2015-1 686070–1, funded by the European Commission.
dc.title.none.fl_str_mv SamPler - a novel method for selecting parameters for gene functional annotation routines
author Cruz, Fernando João Pereira da
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Cruz, Fernando João Pereira da
Lagoa, Davide Rafael Santos
Mendes, João
Rocha, I.
Ferreira, Eugénio C.
Rocha, Miguel
Dias, Oscar
dc.subject.por.fl_str_mv SamPler
Annotation routines
Parametrization
Merlin
Science & Technology
publishDate 2019
dc.date.none.fl_str_mv 2019-09-05
2019-09-05T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1822/61324
url http://hdl.handle.net/1822/61324
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Cruz, Fernando; Lagoa, D.; Mendes, João; Rocha, Isabel; Ferreira, Eugénio C.; Rocha, Miguel; Dias, Oscar, SamPler - a novel method for selecting parameters for gene functional annotation routines. BMC Bioinformatics, 20(454), 2019
1471-2105
10.1186/s12859-019-3038-4
31488049
http://www.biomedcentral.com/bmcbioinformatics
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer Nature
publisher.none.fl_str_mv Springer Nature
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799133011938115584