GOAnnotator: linking protein GO annotations to evidence text

Couto, Francisco M.; Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich


Bibliographic details
Main author:	Couto, Francisco M.
Publication date:	2005
Other authors:	Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich
Document type:	Report
Language:	por
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10451/14234
Abstract:	Annotating proteins with Gene Ontology (GO) terms is an ongoing and complex task. Manual GO annotation is precise and valuable, but it is time-consuming. As a result, most proteins carry uncurated annotations that were generated automatically rather than curated ones. Text-mining systems that use the literature for automatic annotation have been proposed, but they do not meet the high quality expectations of curators. In this paper we describe an approach that links uncurated annotations to text extracted from the literature. Text is selected based on its similarity to the GO term of the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. Our approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins. GO curators assessed GOAnnotator on a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to select only GO terms similar to the GO terms of uncurated annotations in GOA. Our approach is the first to achieve high precision, which is crucial for efficiently supporting GO curators. GOAnnotator is available at: http://xldb.fc.ul.pt/rebil/tools/goa/

Item metadata

id	RCAP_1b5f4c93b9ac62d5267e1f32ba2bd1dc
oai_identifier_str	oai:repositorio.ul.pt:10451/14234
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	GOAnnotator: linking protein GO annotations to evidence text
author	Couto, Francisco M.
author_facet	Couto, Francisco M.; Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich
author_role	author
author2	Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich
author2_role	author author author author author author author
dc.contributor.none.fl_str_mv	Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv	Couto, Francisco M.; Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich
dc.subject.por.fl_str_mv	Bioinformatics (genome or protein) databases; Text mining
topic	Bioinformatics (genome or protein) databases; Text mining
publishDate	2005
dc.date.none.fl_str_mv	2005-12 2005-12-01T00:00:00Z 2009-02-10T13:11:45Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/report
format	report
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10451/14234
url	http://hdl.handle.net/10451/14234
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Department of Informatics, University of Lisbon
publisher.none.fl_str_mv	Department of Informatics, University of Lisbon
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799134259356631040
