Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation
Main author: | Pérez-Pérez, Martín |
---|---|
Publication date: | 2022 |
Other authors: | Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/151069 |
Abstract: | Funding Information: SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. This work was supported by: the Associate Laboratory for Green Chemistry - LAQV financed by the Portuguese Foundation for Science and Technology (FCT/MCTES) Ref. UID/QUI/50006/2020. Ref. NORTE-01-0145-FEDER-000004; the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding of ED431C2018/55-GRC Competitive Reference Group, the “Centro singular de investigación de Galicia” (accreditation 2019-2022) funded by the European Regional Development Fund (ERDF)-Ref. ED431G2019/06. The authors also acknowledge the postdoctoral fellowship [ED481B-2019-032] of Martín Pérez-Pérez, funded by Xunta de Galicia. Funding for open access charge: Universidade de Vigo/CISUG. Publisher Copyright: © 2021 The Author(s) |
id |
RCAP_dd2d2093e7f3f77e1870e2d332b43a17 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/151069 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: The case of gluten bibliome
Document classification; Gluten bibliome; Literature mining; Ontology-based representation; Semi-automatic curation; Computer Science Applications; Cognitive Neuroscience; Artificial Intelligence
Funding Information: SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. This work was supported by: the Associate Laboratory for Green Chemistry - LAQV financed by the Portuguese Foundation for Science and Technology (FCT/MCTES) Ref. UID/QUI/50006/2020. Ref. NORTE-01-0145-FEDER-000004; the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding of ED431C2018/55-GRC Competitive Reference Group, the “Centro singular de investigación de Galicia” (accreditation 2019-2022) funded by the European Regional Development Fund (ERDF)-Ref. ED431G2019/06. The authors also acknowledge the postdoctoral fellowship [ED481B-2019-032] of Martín Pérez-Pérez, funded by Xunta de Galicia. Funding for open access charge: Universidade de Vigo/CISUG. Publisher Copyright: © 2021 The Author(s)
The number of published scientific research documents keeps growing at an unprecedented rate, making it increasingly difficult to access practical information within a target domain. This situation is driving growing interest in text mining techniques that automatically process text resources and structure their content, helping researchers find information of interest and infer knowledge of practical use. However, the automatic processing of research documents requires large, manually annotated text corpora to be available beforehand in order to develop robust and accurate text mining methods and machine learning models. In this context, semi-automatic extraction techniques based on structured data and state-of-the-art biomedical tools appear to have significant potential to enhance curator productivity and reduce the costs of document curation. Along these lines, this work proposes a semi-automatic machine learning workflow and a NER + Ontology boosting technique for the automatic classification of biomedical literature. The practical relevance of the proposed approach was demonstrated in the curation of 4,115 gluten-related documents extracted from PubMed, where it was compared against the word embedding alternative. The experimental results indicate that the proposed NER + Ontology technique is an effective alternative to other state-of-the-art document representation techniques for processing the existing biomedical literature.
LAQV@REQUIMTE; RUN
Pérez-Pérez, Martín; Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino
2023-03-22T22:28:17Z 2022-05-01 2022-05-01T00:00:00Z
info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article
15 application/pdf
http://hdl.handle.net/10362/151069
eng
0925-2312 PURE: 56608635 https://doi.org/10.1016/j.neucom.2021.10.100
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
2024-03-11T05:33:30Z oai:run.unl.pt:10362/151069 Portal Agregador ONG https://www.rcaap.pt/oai/openaire opendoar:7160 2024-03-20T03:54:26.588773
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false |
dc.title.none.fl_str_mv |
Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation: The case of gluten bibliome |
title |
Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation |
spellingShingle |
Boosting biomedical document classification through the use of domain entity recognizers and semantic ontologies for document representation; Pérez-Pérez, Martín; Document classification; Gluten bibliome; Literature mining; Ontology-based representation; Semi-automatic curation; Computer Science Applications; Cognitive Neuroscience; Artificial Intelligence |
author |
Pérez-Pérez, Martín |
author_facet |
Pérez-Pérez, Martín; Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino |
author_role |
author |
author2 |
Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino |
author2_role |
author author author author |
dc.contributor.none.fl_str_mv |
LAQV@REQUIMTE RUN |
dc.contributor.author.fl_str_mv |
Pérez-Pérez, Martín; Ferreira, Tânia; Lourenço, Anália; Igrejas, Gilberto; Fdez-Riverola, Florentino |
dc.subject.por.fl_str_mv |
Document classification; Gluten bibliome; Literature mining; Ontology-based representation; Semi-automatic curation; Computer Science Applications; Cognitive Neuroscience; Artificial Intelligence |
topic |
Document classification; Gluten bibliome; Literature mining; Ontology-based representation; Semi-automatic curation; Computer Science Applications; Cognitive Neuroscience; Artificial Intelligence |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-05-01 2022-05-01T00:00:00Z 2023-03-22T22:28:17Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/151069 |
url |
http://hdl.handle.net/10362/151069 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0925-2312 PURE: 56608635 https://doi.org/10.1016/j.neucom.2021.10.100 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
15 application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799138133055373312 |