GOAnnotator: linking protein GO annotations to evidence text
Main author: | Couto, Francisco M. |
---|---|
Publication date: | 2005 |
Other authors: | Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich |
Document type: | Report |
Language: | por |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10451/14234 |
Abstract: | Annotating proteins with Gene Ontology (GO) terms is an ongoing and complex task. Manual GO annotation is precise and valuable, but time-consuming. As a result, most proteins come with uncurated, automatically generated annotations rather than curated ones. Text-mining systems that use the literature for automatic annotation have been proposed, but they do not meet the high quality expectations of curators. In this paper we describe an approach that links uncurated annotations to text extracted from the literature. Text is selected based on its similarity to the term from the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. Our approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins. GO curators assessed GOAnnotator on a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to select only GO terms similar to the GO terms from uncurated annotations in GOA. Our approach is the first to achieve high precision, which is crucial for efficiently supporting GO curators. GOAnnotator is available at: http://xldb.fc.ul.pt/rebil/tools/goa/ |
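
The abstract states that evidence text is selected by its similarity to the GO term of an uncurated annotation, but this record does not say how that similarity is computed. The sketch below is only an illustration of the general idea, assuming a simple word-overlap score. The function names, the 0.5 threshold, and the example sentences are all hypothetical and are not taken from GOAnnotator. The use of the GO hierarchy mentioned in the abstract is not modeled here.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_similarity(sentence, term_name):
    """Fraction of the term's words that also occur in the sentence (0.0 to 1.0).

    Assumption: plain word overlap stands in for whatever measure the tool uses.
    """
    term_words = tokens(term_name)
    if not term_words:
        return 0.0
    return len(term_words & tokens(sentence)) / len(term_words)


def select_evidence(sentences, term_name, threshold=0.5):
    """Return (score, sentence) pairs scoring at or above the threshold, best first."""
    scored = [(term_similarity(s, term_name), s) for s in sentences]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical sentences, invented for illustration only.
    example_sentences = [
        "The enzyme shows protein kinase activity in vitro.",
        "This kinase is expressed in liver.",
        "Cells were grown overnight.",
    ]
    for score, sentence in select_evidence(example_sentences, "protein kinase activity"):
        print(f"{score:.2f}  {sentence}")
```

A curator-facing tool would present the retained sentences as candidate evidence for the annotation. The threshold trades coverage for precision, so raising it keeps fewer but closer matches.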
id | RCAP_1b5f4c93b9ac62d5267e1f32ba2bd1dc |
---|---|
oai_identifier_str | oai:repositorio.ul.pt:10451/14234 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | GOAnnotator: linking protein GO annotations to evidence text |
author | Couto, Francisco M. |
author_facet | Couto, Francisco M.; Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich |
author_role | author |
author2 | Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich |
author2_role | author author author author author author author |
dc.contributor.none.fl_str_mv | Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv | Couto, Francisco M.; Silva, Mário J.; Lee, Vivian; Dimmer, Emily; Camon, Evelyn; Apweiler, Rolf; Kirsch, Harald; Rebholz-Schuhmann, Dietrich |
dc.subject.por.fl_str_mv | Bioinformatics (genome or protein) databases; Text mining |
topic | Bioinformatics (genome or protein) databases; Text mining |
publishDate | 2005 |
dc.date.none.fl_str_mv | 2005-12; 2005-12-01T00:00:00Z; 2009-02-10T13:11:45Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/report |
format | report |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10451/14234 |
url | http://hdl.handle.net/10451/14234 |
dc.language.iso.fl_str_mv | por |
language | por |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Department of Informatics, University of Lisbon |
publisher.none.fl_str_mv | Department of Informatics, University of Lisbon |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799134259356631040 |