Semantic annotation of biological concepts interplaying microbial cellular responses

Bibliographic details
Main author: Carreira, Rafael
Publication date: 2011
Other authors: Carneiro, S., Pereira, Rui C., Rocha, Miguel, Rocha, I., Ferreira, Eugénio C., Lourenço, Anália
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/1822/16826
Abstract: Background: Automated extraction systems have become a time-saving necessity in Systems Biology. Considerable human effort is needed to model, analyse and simulate biological networks. Thus, one of the challenges posed to Biomedical Text Mining tools is that of learning to recognise a wide variety of biological concepts with different functional roles to assist in these processes. Results: Here, we present a novel corpus concerning the integrated cellular responses to nutrient starvation in the model organism Escherichia coli. Our corpus is a unique resource in that it annotates biomedical concepts that play a functional role in expression, regulation and metabolism. Namely, it includes annotations for genetic information carriers (genes and DNA, RNA molecules), proteins (transcription factors, enzymes and transporters), small metabolites, physiological states and laboratory techniques. The corpus consists of 130 full-text papers with a total of 59043 annotations for 3649 different biomedical concepts; the two dominant classes are genes (highest number of unique concepts) and compounds (most frequently annotated concepts), whereas other important cellular concepts such as proteins account for no more than 10% of the annotated concepts. Conclusions: To the best of our knowledge, a corpus that details such a wide range of biological concepts has never been presented to the text mining community. The inter-annotator agreement statistics provide evidence of the importance of a consolidated background when dealing with such complex descriptions, the ambiguities naturally arising from the terminology and their impact for modelling purposes. Availability is granted for the full-text corpora of 130 freely accessible documents, the annotation scheme and the annotation guidelines. Also, we include a corpus of 340 abstracts.
id RCAP_8277e1b9acb6ffe4a1e5dd7225e19b57
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/16826
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Funding: This work is partly funded by SYSINBIO, a European Coordination and Support action (call FP7-KBBE-2007-1) in the field of model-driven metabolic engineering, and by the MIT-Portugal Program in Bioengineering (MIT-Pt/BS-BB/0082/2008), funded by the Portuguese FCT (Fundacao para a Ciencia e Tecnologia). The work of Rafael Carreira, Sonia Carneiro and Rui Pereira is supported by PhD grants from FCT (refs. SFRH/BD/66201/2009, SFRH/BD/22863/2005 and SFRH/BD/51111/2010, respectively).
dc.title.none.fl_str_mv Semantic annotation of biological concepts interplaying microbial cellular responses
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Carreira, Rafael
Carneiro, S.
Pereira, Rui C.
Rocha, Miguel
Rocha, I.
Ferreira, Eugénio C.
Lourenço, Anália
dc.subject.por.fl_str_mv Science & Technology
publishDate 2011
dc.date.none.fl_str_mv 2011
2011-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/16826
url https://hdl.handle.net/1822/16826
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1471-2105
10.1186/1471-2105-12-460
22122862
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv BioMed Central (BMC)
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação