Multilingual bi-encoder models for biomedical entity linking
Main author: | Guven, Zekeriya Anil |
---|---|
Publication date: | 2023 |
Other authors: | Lamúrias, André |
Document type: | Article |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/163981 |
Notes: | Funding information: The authors would like to thank Prof. Dr. Murat Osman Unalir and Prof. Dr. Katja Hose. Publisher copyright: © 2023 The Authors. Expert Systems published by John Wiley & Sons Ltd. |
Keywords: | biomedical entity linking; data analysis; entity linking; language model; multilingual analysis; natural language processing |
Subject areas: | Control and Systems Engineering; Theoretical Computer Science; Computational Theory and Mathematics; Artificial Intelligence |
Abstract: | Natural language processing (NLP) is a field of study that focuses on analysing text data with computational methods. NLP includes tasks such as sentiment analysis, spam detection, entity linking, and question answering, to name a few. Entity linking is an NLP task that maps mentions in a text to the entities of a Knowledge Base. In this study, we analysed the efficacy of bi-encoder entity linking models for multilingual biomedical texts. Using surface-based, approximate nearest neighbour search, and embedding approaches in the candidate generation phase, accuracy and recall were measured for language representation models such as BERT, SapBERT, BioBERT, and RoBERTa according to language and domain. The proposed entity linking framework was evaluated on the BC5CDR and Cantemist datasets for English and Spanish, respectively. The framework achieved 76.75% accuracy on BC5CDR and 60.19% on Cantemist. In addition, the proposed framework was compared with previous studies. The results highlight the challenges that come with domain-specific multilingual datasets. |
Organizational unit: | DI - Departamento de Informática; RUN |
Journal: | Expert Systems (ISSN 0266-4720) |
DOI: | https://doi.org/10.1111/exsy.13388 |
Dates: | 2023-11 (issued); 2024-02-22 (repository record) |
Version: | Published version |
Format: | 14, application/pdf |
Access: | Open access |
Identifiers: | oai:run.unl.pt:10362/163981; PURE: 83890974 |
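The abstract describes a bi-encoder setup in which mentions and Knowledge Base entities are encoded independently, candidates are retrieved by nearest-neighbour search, and the result is scored with accuracy and recall. The following is a minimal, illustrative sketch of that retrieval-and-evaluation loop, not the authors' implementation. It assumes a hypothetical `toy_encode` function (hashed character trigrams) standing in for BERT, SapBERT, BioBERT or RoBERTa, hypothetical entity IDs (`E1` to `E3`), and an exact brute-force search where the paper's framework also uses approximate nearest neighbour search.

```python
import math
import zlib
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Index = List[Tuple[str, Vector]]


def toy_encode(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for a BERT/SapBERT-style encoder.

    Hashes lower-cased character trigrams into a fixed-size vector and
    L2-normalises it, so a dot product equals cosine similarity.
    """
    padded = f"#{text.lower()}#"
    vec = [0.0] * dim
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[zlib.crc32(trigram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(kb: Dict[str, str], encode: Callable[[str], Vector]) -> Index:
    """Encode every KB entity name once, independently of any mention."""
    return [(entity_id, encode(name)) for entity_id, name in kb.items()]


def top_k(mention: str, index: Index, encode: Callable[[str], Vector], k: int) -> List[str]:
    """Return the k entity IDs whose vectors score highest against the mention.

    Exact brute-force scan; an approximate nearest-neighbour index would
    replace this for large knowledge bases.
    """
    query = encode(mention)

    def score(item: Tuple[str, Vector]) -> float:
        return sum(q * e for q, e in zip(query, item[1]))

    ranked = sorted(index, key=score, reverse=True)
    return [entity_id for entity_id, _ in ranked[:k]]


def evaluate(gold: Sequence[Tuple[str, str]], index: Index,
             encode: Callable[[str], Vector], k: int = 5) -> Tuple[float, float]:
    """Accuracy (gold entity ranked first) and recall@k (gold entity in top k)."""
    hits_at_1 = hits_at_k = 0
    for mention, gold_id in gold:
        candidates = top_k(mention, index, encode, k)
        hits_at_1 += int(candidates[0] == gold_id)
        hits_at_k += int(gold_id in candidates)
    return hits_at_1 / len(gold), hits_at_k / len(gold)


if __name__ == "__main__":
    # Hypothetical toy knowledge base and mentions (not from BC5CDR or Cantemist).
    kb = {"E1": "cisplatin", "E2": "nephrotoxicity", "E3": "hypertension"}
    index = build_index(kb, toy_encode)
    gold = [("Cisplatin", "E1"), ("hypertension", "E3")]
    print(evaluate(gold, index, toy_encode, k=2))  # prints (1.0, 1.0)
```

Because entity vectors are computed once by `build_index`, only the mention has to be encoded at query time; this independence is what makes bi-encoders compatible with precomputed (and, at scale, approximate) nearest-neighbour indexes.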