Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome

Bibliographic details
Main author: Pérez-Pérez, Martín
Publication date: 2022
Other authors: Ferreira, Tânia; Lourenço, Anália Maria Garcia; Igrejas, Gilberto; Fdez-Riverola, Florentino
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/1822/76493
Abstract: "Available online 11 November 2021"
id RCAP_dc7d4b9f71d3bd03d823d6109bf04286
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/76493
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome
Literature mining
Document classification
Semi-automatic curation
Ontology-based representation
Gluten bibliome
Science & Technology
"Available online 11 November 2021"
The increasing number of scientific research documents published keeps growing at an unprecedented rate, making it increasingly difficult to access practical information within a target domain. This situation is motivating a growing interest in applying text mining techniques for the automatic processing of text resources to structure the information that helps researchers to find information of interest and infer knowledge of practical use. However, the automatic processing of research documents requires the previous existence of large, manually annotated text corpora to develop robust and accurate text mining processing methods and machine learning models. In this context, semi-automatic extraction techniques based on structured data and state-of-the-art biomedical tools appear to have significant potential to enhance curator productivity and reduce the costs of document curation. In this line, this work proposes a semi-automatic machine learning workflow and a NER+Ontology boosting technique for the automatic classification of biomedical literature. The practical relevance of the proposed approach has been proven in the curation of 4,115 gluten-related documents extracted from PubMed and contrasted against the word embedding alternative. Comparing the results of the experiments, the proposed NER+Ontology technique is an effective alternative to other state-of-the-art document representation techniques to process the existing biomedical literature.
This work was supported by: the Associate Laboratory for Green Chemistry - LAQV financed by the Portuguese Foundation for Science and Technology (FCT/MCTES) Ref. UID/QUI/50006/2020; the Portuguese Foundation for Science and Technology (FCT/MCTES) under the scope of the strategic funding of UIDB/04469/2020 unit and BioTecNorte operation funded by the European Regional Development Fund (ERDF) under the scope of Norte2020 - Programa Operacional Regional do Norte. Ref. NORTE-01-0145-FEDER-000004; the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding of ED431C2018/55-GRC Competitive Reference Group, the “Centro singular de investigación de Galicia” (accreditation 2019-2022) funded by the European Regional Development Fund (ERDF), Ref. ED431G2019/06. The authors also acknowledge the postdoctoral fellowship [ED481B-2019-032] of Martín Pérez-Pérez, funded by Xunta de Galicia.
info:eu-repo/semantics/publishedVersion
Elsevier
Universidade do Minho
Pérez-Pérez, Martín
Ferreira, Tânia
Lourenço, Anália Maria Garcia
Igrejas, Gilberto
Fdez-Riverola, Florentino
2022-05
2022-05-01T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
application/pdf
https://hdl.handle.net/1822/76493
eng
Pérez-Pérez, Martín; Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino, Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome. Neurocomputing, 484, 223-237, 2022
0925-2312
10.1016/j.neucom.2021.10.100
https://www.journals.elsevier.com/neurocomputing
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2023-07-21T12:02:19Z
oai:repositorium.sdum.uminho.pt:1822/76493
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-19T18:52:17.273091
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome
title Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome
author Pérez-Pérez, Martín
author_facet Pérez-Pérez, Martín
Ferreira, Tânia
Lourenço, Anália Maria Garcia
Igrejas, Gilberto
Fdez-Riverola, Florentino
author_role author
author2 Ferreira, Tânia
Lourenço, Anália Maria Garcia
Igrejas, Gilberto
Fdez-Riverola, Florentino
author2_role author
author
author
author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Pérez-Pérez, Martín
Ferreira, Tânia
Lourenço, Anália Maria Garcia
Igrejas, Gilberto
Fdez-Riverola, Florentino
dc.subject.por.fl_str_mv Literature mining
Document classification
Semi-automatic curation
Ontology-based representation
Gluten bibliome
Science & Technology
description "Available online 11 November 2021"
publishDate 2022
dc.date.none.fl_str_mv 2022-05
2022-05-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/76493
url https://hdl.handle.net/1822/76493
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Pérez-Pérez, Martín; Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino, Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: the case of gluten bibliome. Neurocomputing, 484, 223-237, 2022
0925-2312
10.1016/j.neucom.2021.10.100
https://www.journals.elsevier.com/neurocomputing
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799132298690428928