Text mining for the biocuration workflow

Hirschman, L.; Burns, G. A. P. C.; Krallinger, M.; Arighi, C.; Cohen, K. B.; Valencia, A.; Hu, C. H.; Chatr-Aryamontri, A.; Dowell, K. G.; Huala, E.; Lourenço, Anália; Nash, R.; Veuthey, A. L.; Wiegers, T.; Winter, A. G.

Bibliographic details
Main author:	Hirschman, L.
Publication date:	2012
Other authors:	Burns, G. A. P. C., Krallinger, M., Arighi, C., Cohen, K. B., Valencia, A., Hu, C. H., Chatr-Aryamontri, A., Dowell, K. G., Huala, E., Lourenço, Anália, Nash, R., Veuthey, A. L., Wiegers, T., Winter, A. G.
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/1822/23460
Abstract:	Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

Item metadata

id	RCAP_da256eee5bc55de95974e87659ab98fc
oai_identifier_str	oai:repositorium.sdum.uminho.pt:1822/23460
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Text mining for the biocuration workflow | Science & Technology | Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. | National Science Foundation (grant IIS-0844419 to L.H.); US National Institutes of Health National Library of Medicine (grant 1G08LM10720-01 to C.N.A. and C.H.W.); Work related to BioCreative III was supported by the US National Science Foundation (grant DBI-0850319 to C.N.A., L.H., C.H.W.); the US National Institute of General Medical Sciences (grant R01-GM083871 to G.A.P.C.B.); the National Science Foundation (DBI-0849977 to G.A.P.G.B); the European Union Seventh Framework MICROME project (Grant Agreement Number 222886-2 to M.K. and A.V.); the US National Science Foundation IGERT (Grant 0221625 to K.G.D) and a PhRMA Foundation predoctoral fellowship in informatics; US National Science Foundation (grant DBI-0850219 to E.H.); US National Human Genome Research Institute (grant HG001315 to R.N.); National Institutes of Health (NIH) (grant 2U01HG02712-04 to A.L.V.) and European Commission contract FELICS (grant 021902RII3); National Institute of Environmental Health Sciences (NIEHS) and the National Library of Medicine (NLM) (R01ES014065 to T.W.); NIEHS (R01ES014065-04S1 to T.W.); National Institutes of Health National Center for Research Resources (P20RR016463 to T.W.); Biotechnology and Biological Sciences Research Council of the UK (grant BB/F010486/1 to A.G.W); the National Institutes of Health National Center for Research Resources (1R01RR024031 to A.G.W); the European Commission FP7 Program (2007223411 to A.G.W). Funding for open access charge: The MITRE Corporation. | Oxford University Press | Oxford Press | Universidade do Minho | Hirschman, L. | Burns, G. A. P. C. | Krallinger, M. | Arighi, C. | Cohen, K. B. | Valencia, A. | Hu, C. H. | Chatr-Aryamontri, A. | Dowell, K. G. | Huala, E. | Lourenço, Anália | Nash, R. | Veuthey, A. L. | Wiegers, T. | Winter, A. G. | 2012 | 2012-01-01T00:00:00Z | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/article | application/pdf | http://hdl.handle.net/1822/23460 | eng | 1758-0463 | 1758-0463 | 10.1093/database/bas020 | 22513129 | info:eu-repo/semantics/openAccess | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) | instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | instacron:RCAAP | 2023-07-21T12:30:31Z | oai:repositorium.sdum.uminho.pt:1822/23460 | Portal Agregador | ONG | https://www.rcaap.pt/oai/openaire | opendoar:7160 | 2024-03-19T19:25:42.690444 | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | false
dc.title.none.fl_str_mv	Text mining for the biocuration workflow
title	Text mining for the biocuration workflow
spellingShingle	Text mining for the biocuration workflow Hirschman, L. Science & Technology
title_short	Text mining for the biocuration workflow
title_full	Text mining for the biocuration workflow
title_fullStr	Text mining for the biocuration workflow
title_full_unstemmed	Text mining for the biocuration workflow
title_sort	Text mining for the biocuration workflow
author	Hirschman, L.
author_facet	Hirschman, L. Burns, G. A. P. C. Krallinger, M. Arighi, C. Cohen, K. B. Valencia, A. Hu, C. H. Chatr-Aryamontri, A. Dowell, K. G. Huala, E. Lourenço, Anália Nash, R. Veuthey, A. L. Wiegers, T. Winter, A. G.
author_role	author
author2	Burns, G. A. P. C. Krallinger, M. Arighi, C. Cohen, K. B. Valencia, A. Hu, C. H. Chatr-Aryamontri, A. Dowell, K. G. Huala, E. Lourenço, Anália Nash, R. Veuthey, A. L. Wiegers, T. Winter, A. G.
author2_role	author author author author author author author author author author author author author author
dc.contributor.none.fl_str_mv	Universidade do Minho
dc.contributor.author.fl_str_mv	Hirschman, L. Burns, G. A. P. C. Krallinger, M. Arighi, C. Cohen, K. B. Valencia, A. Hu, C. H. Chatr-Aryamontri, A. Dowell, K. G. Huala, E. Lourenço, Anália Nash, R. Veuthey, A. L. Wiegers, T. Winter, A. G.
dc.subject.por.fl_str_mv	Science & Technology
topic	Science & Technology
description	Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
publishDate	2012
dc.date.none.fl_str_mv	2012 2012-01-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/1822/23460
url	http://hdl.handle.net/1822/23460
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	1758-0463 1758-0463 10.1093/database/bas020 22513129
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Oxford University Press Oxford Press
publisher.none.fl_str_mv	Oxford University Press Oxford Press
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799132742072401920