Multilingual bi-encoder models for biomedical entity linking

Bibliographic details
Main author: Guven, Zekeriya Anil
Publication date: 2023
Other authors: Lamúrias, André
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/163981
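Abstract: Natural language processing (NLP) is a field of study that focuses on data analysis on texts with certain methods. NLP includes tasks such as sentiment analysis, spam detection, entity linking, and question answering, to name a few. Entity linking is an NLP task that is used to map mentions specified in the text to the entities of a Knowledge Base. In this study, we analysed the efficacy of bi-encoder entity linking models for multilingual biomedical texts. Using surface-based, approximate nearest neighbour search and embedding approaches during the candidate generation phase, accuracy, and recall values were measured on language representation models such as BERT, SapBERT, BioBERT, and RoBERTa according to language and domain. The proposed entity linking framework was analysed on the BC5CDR and Cantemist datasets for English and Spanish, respectively. The framework achieved 76.75% accuracy for the BC5CDR and 60.19% for the Cantemist. In addition, the proposed framework was compared with previous studies. The results highlight the challenges that come with domain-specific multilingual datasets.
Funding Information: The authors would like to thank Prof. Dr. Murat Osman Unalir and Prof. Dr. Katja Hose. Publisher Copyright: © 2023 The Authors. Expert Systems published by John Wiley & Sons Ltd.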
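The abstract in this record describes a bi-encoder setup: mentions and knowledge-base entities are embedded separately, candidates are retrieved by nearest-neighbour search, and the result is scored with accuracy and recall. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the `embed` function is a toy hashed character n-gram encoder standing in for a model such as BERT or SapBERT, the search is exact brute force rather than approximate, and the entity IDs and names in `KB_NAMES` are hypothetical.

```python
from __future__ import annotations

import math
import zlib

DIM = 256  # size of the toy embedding space (illustrative assumption)


def embed(text: str, n: int = 3) -> list[float]:
    """Toy stand-in for a transformer encoder: hashed character n-grams, L2-normalised."""
    vec = [0.0] * DIM
    padded = f" {text.lower()} "
    for i in range(max(1, len(padded) - n + 1)):
        gram = padded[i:i + n]
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Hypothetical knowledge base: entity ID -> canonical name.
KB_NAMES = {
    "E1": "myocardial infarction",
    "E2": "hypertension",
    "E3": "diabetes mellitus",
    "E4": "aspirin",
}

# Entity side of the bi-encoder: embed every entity once, ahead of time.
kb = {eid: embed(name) for eid, name in KB_NAMES.items()}


def top_k(mention: str, kb: dict[str, list[float]], k: int) -> list[str]:
    """Mention side: embed the mention, then rank entities by dot product.

    Vectors are unit length, so the dot product equals cosine similarity.
    This is exact search; a real system would typically use an approximate index.
    """
    q = embed(mention)
    ranked = sorted(kb, key=lambda eid: -sum(a * b for a, b in zip(q, kb[eid])))
    return ranked[:k]


def evaluate(pairs: list[tuple[str, str]], kb: dict[str, list[float]], k: int) -> tuple[float, float]:
    """Return (accuracy at rank 1, recall at rank k) over (mention, gold ID) pairs."""
    hits_at_1 = hits_at_k = 0
    for mention, gold in pairs:
        candidates = top_k(mention, kb, k)
        hits_at_1 += candidates[0] == gold
        hits_at_k += gold in candidates
    return hits_at_1 / len(pairs), hits_at_k / len(pairs)


if __name__ == "__main__":
    # Candidate list for a mention that is not an exact KB name.
    print(top_k("diabetes", kb, k=2))
    print(evaluate([("hypertension", "E2"), ("aspirin", "E4")], kb, k=2))
```

Because the two sides are encoded independently, the entity vectors can be computed once and reused for every mention, which is what makes nearest-neighbour candidate generation practical on large knowledge bases.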
id RCAP_6a5593e1500814d2b888e523a6125f65
oai_identifier_str oai:run.unl.pt:10362/163981
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Multilingual bi-encoder models for biomedical entity linking
biomedical entity linking
data analysis
entity linking
language model
multilingual analysis
natural language processing
Control and Systems Engineering
Theoretical Computer Science
Computational Theory and Mathematics
Artificial Intelligence
Funding Information: The authors would like to thank Prof. Dr. Murat Osman Unalir and Prof. Dr. Katja Hose. Publisher Copyright: © 2023 The Authors. Expert Systems published by John Wiley & Sons Ltd.
Natural language processing (NLP) is a field of study that focuses on data analysis on texts with certain methods. NLP includes tasks such as sentiment analysis, spam detection, entity linking, and question answering, to name a few. Entity linking is an NLP task that is used to map mentions specified in the text to the entities of a Knowledge Base. In this study, we analysed the efficacy of bi-encoder entity linking models for multilingual biomedical texts. Using surface-based, approximate nearest neighbour search and embedding approaches during the candidate generation phase, accuracy, and recall values were measured on language representation models such as BERT, SapBERT, BioBERT, and RoBERTa according to language and domain. The proposed entity linking framework was analysed on the BC5CDR and Cantemist datasets for English and Spanish, respectively. The framework achieved 76.75% accuracy for the BC5CDR and 60.19% for the Cantemist. In addition, the proposed framework was compared with previous studies. The results highlight the challenges that come with domain-specific multilingual datasets.
DI - Departamento de Informática
RUN
Guven, Zekeriya Anil
Lamúrias, André
2024-02-22T23:53:00Z
2023-11
2023-11-01T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
14
application/pdf
http://hdl.handle.net/10362/163981
eng
0266-4720
PURE: 83890974
https://doi.org/10.1111/exsy.13388
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T05:50:01Z
oai:run.unl.pt:10362/163981
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:59:58.428437
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Multilingual bi-encoder models for biomedical entity linking
title Multilingual bi-encoder models for biomedical entity linking
author Guven, Zekeriya Anil
author_facet Guven, Zekeriya Anil
Lamúrias, André
author_role author
author2 Lamúrias, André
author2_role author
dc.contributor.none.fl_str_mv DI - Departamento de Informática
RUN
dc.contributor.author.fl_str_mv Guven, Zekeriya Anil
Lamúrias, André
dc.subject.por.fl_str_mv biomedical entity linking
data analysis
entity linking
language model
multilingual analysis
natural language processing
Control and Systems Engineering
Theoretical Computer Science
Computational Theory and Mathematics
Artificial Intelligence
publishDate 2023
dc.date.none.fl_str_mv 2023-11
2023-11-01T00:00:00Z
2024-02-22T23:53:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/163981
url http://hdl.handle.net/10362/163981
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 0266-4720
PURE: 83890974
https://doi.org/10.1111/exsy.13388
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 14
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação