Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

Bibliographic details
Main author: Santos, André Fernandes
Publication date: 2012
Other authors: Nogueira, R.; Lourenço, Anália
Document type: Article
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/1822/32581
Abstract: Scientific publications are the main vehicle for disseminating information in the field of biotechnology for wastewater treatment. Indeed, new research paradigms and the adoption of high-throughput technologies have considerably increased the rate of publication. The problem is that manual curation becomes harder, more error-prone and more time-consuming, likely leading to information loss and inefficient knowledge acquisition. As a result, research outputs rarely reach engineers, hampering the calibration of the mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. The workflow processes wastewater-related articles with the main goal of identifying the physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
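The abstract does not say how the workflow recognises parameter mentions, so the following is only a minimal sketch of one common approach, assuming a dictionary lookup for parameter names combined with a regular expression for the numeric value and an optional unit. The parameter dictionary, the unit list, and the names `extract_parameters` and `Mention` are hypothetical illustrations, not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical dictionary mapping surface forms to canonical parameter names;
# the record does not describe the vocabularies used by the original workflow.
PARAMETERS = {
    "pH": "pH",
    "temperature": "temperature",
    "COD": "chemical oxygen demand",
    "chemical oxygen demand": "chemical oxygen demand",
    "HRT": "hydraulic retention time",
    "hydraulic retention time": "hydraulic retention time",
}

# Hypothetical unit list (illustrative only).
UNITS = ["mg/L", "g/L", "°C", "days", "h", "d"]

# Case-insensitive lookup from the matched text back to the canonical name.
_LOOKUP = {term.lower(): name for term, name in PARAMETERS.items()}


def _alternation(items):
    # Longest alternatives first, so that e.g. "days" is tried before "d".
    return "|".join(re.escape(s) for s in sorted(items, key=len, reverse=True))


# A parameter term, up to 20 non-digit filler characters (" of ", " was "),
# a number, then an optional unit that must not run into a longer word.
_PATTERN = re.compile(
    rf"\b(?P<term>{_alternation(PARAMETERS)})\b"
    rf"[^0-9\n]{{0,20}}?"
    rf"(?P<value>\d+(?:\.\d+)?)"
    rf"(?:\s*(?P<unit>{_alternation(UNITS)})(?![\w/]))?",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Mention:
    parameter: str       # canonical parameter name
    value: float         # numeric value as written in the text
    unit: Optional[str]  # unit string, or None if no known unit follows


def extract_parameters(text: str) -> List[Mention]:
    """Return parameter/value/unit mentions found in a piece of text."""
    mentions = []
    for m in _PATTERN.finditer(text):
        mentions.append(
            Mention(
                parameter=_LOOKUP[m.group("term").lower()],
                value=float(m.group("value")),
                unit=m.group("unit"),
            )
        )
    return mentions


if __name__ == "__main__":
    sentence = (
        "The reactor was operated at pH 7.2 and a temperature of 30 °C, "
        "with a COD of 500 mg/L and an HRT of 12 h."
    )
    for mention in extract_parameters(sentence):
        print(mention)
```

A dictionary-plus-pattern design keeps each rule easy to audit, but this sketch ignores negative numbers, value ranges such as "6.5-7.5", and mentions split across sentences; a curation workflow applied to full-text articles would have to handle such cases.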
Keywords: Text mining; Biotechnology applications; Procedure optimization; Science & Technology
Contributor: Universidade do Minho
Publisher: Universidad de Salamanca
Citation: Santos, A.F.; Nogueira, R.; Lourenço, A. Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain. Advances in Distributed Computing and Artificial Intelligence Journal, 1(1) (Special Issue #1), 1-8, 2012.
ISSN: 2255-2863
DOI: 10.14201/ADCAIJ20121118
Journal website: http://adcaij.usal.es/
Version: Published version
Access rights: Open access
Format: application/pdf
OAI identifier: oai:repositorium.sdum.uminho.pt:1822/32581
Repository institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)