GOAnnotator: linking protein GO annotations to evidence text

Bibliographic details
Main author: Couto, Francisco M.
Publication date: 2005
Other authors: Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich
Document type: Report
Language: Portuguese (por)
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10451/14234
Abstract: Annotating proteins with Gene Ontology (GO) terms is an ongoing and complex task. Manual GO annotation is precise and precious, but it is time-consuming. As a result, most proteins come with uncurated annotations that were generated automatically rather than with curated ones. Text-mining systems that use the literature for automatic annotation have been proposed, but they do not meet the high quality expectations of curators. In this paper we describe an approach that links uncurated annotations to text extracted from the literature. The text is selected according to its similarity to the term from the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. Our approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins. The GO curators assessed GOAnnotator with a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to select only GO terms similar to GO terms from uncurated annotations in GOA. Our approach is the first to achieve high precision, which is crucial for supporting GO curators efficiently. GOAnnotator is available at: http://xldb.fc.ul.pt/rebil/tools/goa/
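The selection step described in the abstract can be illustrated with a minimal sketch. It is not the GOAnnotator implementation: the abstract does not say which similarity measure is used, so a simple word-overlap score stands in for it, and the hierarchy fragment, the threshold, and all function names (`similarity`, `related`, `select_evidence`) are assumptions made for illustration only.

```python
import re

def tokens(text):
    """Lowercase word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(sentence, term_name):
    """Fraction of the term's words found in the sentence (stand-in measure)."""
    term = tokens(term_name)
    return len(term & tokens(sentence)) / len(term) if term else 0.0

# Toy hierarchy fragment, child term -> parent term (illustrative only).
PARENT = {"DNA binding": "nucleic acid binding", "nucleic acid binding": "binding"}

def ancestors(term):
    """The term itself plus every term above it in the toy hierarchy."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def related(a, b):
    """True if one term equals the other or lies above it in the hierarchy."""
    return a in ancestors(b) or b in ancestors(a)

def select_evidence(sentences, candidate_terms, uncurated_term, threshold=0.5):
    """Return (score, term, sentence) triples, best first, for related terms only."""
    hits = []
    for term in candidate_terms:
        # Hierarchy filter: skip terms unrelated to the uncurated annotation.
        if not related(term, uncurated_term):
            continue
        for sentence in sentences:
            score = similarity(sentence, term)
            if score >= threshold:
                hits.append((score, term, sentence))
    return sorted(hits, reverse=True)

sentences = [
    "The protein shows strong DNA binding in vitro.",
    "Kinase activity was not detected.",
]
hits = select_evidence(
    sentences, ["DNA binding", "kinase activity"], "nucleic acid binding"
)
```

In this sketch the hierarchy filter runs before any text is scored, so a sentence that matches an unrelated term never becomes evidence. That mirrors the abstract's point that restricting candidates to terms similar to the uncurated annotation is what drives precision.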
id RCAP_1b5f4c93b9ac62d5267e1f32ba2bd1dc
oai_identifier_str oai:repositorio.ul.pt:10451/14234
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv GOAnnotator: linking protein GO annotations to evidence text
author Couto, Francisco M.
author_facet Couto, Francisco M.
Silva, Mário J.
Lee, Vivian
Dimmer, Emily
Camon, Evelyn
Apweiler, Rolf
Kirsch, Harald
Rebholz-Schuhmann, Dietrich
author_role author
author2 Silva, Mário J.
Lee, Vivian
Dimmer, Emily
Camon, Evelyn
Apweiler, Rolf
Kirsch, Harald
Rebholz-Schuhmann, Dietrich
author2_role author
author
author
author
author
author
author
dc.contributor.none.fl_str_mv Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Couto, Francisco M.
Silva, Mário J.
Lee, Vivian
Dimmer, Emily
Camon, Evelyn
Apweiler, Rolf
Kirsch, Harald
Rebholz-Schuhmann, Dietrich
dc.subject.por.fl_str_mv Bioinformatics (genome or protein) databases
Text mining
topic Bioinformatics (genome or protein) databases
Text mining
publishDate 2005
dc.date.none.fl_str_mv 2005-12
2005-12-01T00:00:00Z
2009-02-10T13:11:45Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/report
format report
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/14234
url http://hdl.handle.net/10451/14234
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Department of Informatics, University of Lisbon
publisher.none.fl_str_mv Department of Informatics, University of Lisbon
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799134259356631040