Evaluation of word embedding vector averaging functions for biomedical word sense disambiguation
Main author: | Antunes, Rui |
---|---|
Publication date: | 2017 |
Other authors: | Matos, Sérgio |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10773/25119 |
Abstract: | The biomedical lexicon contains a large amount of term ambiguity, which hinders correct identification of concepts and reduces the accuracy of semantic indexing and information retrieval tools. Previous work on biomedical word sense disambiguation has shown that supervised machine learning leads to better results than knowledge-based approaches. However, machine learning approaches require sufficient training data, and their generalization performance beyond the test data is not known. Knowledge-based methods, on the other hand, make use of existing knowledge bases and are therefore largely limited by the quality of those sources of information about concepts. In this work, we used word embedding vectors to complement the knowledge-base information. We represent the context of an ambiguous term by the average of the embedding vectors of the words around the term, and evaluate the impact of weighting this average by word distance. We show that this weighting improves the disambiguation accuracy of the knowledge-based approach on a subset of the reference MSH WSD data set from 86% to 88%. |
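The context representation described in the abstract (an average of the embedding vectors of the words around an ambiguous term, optionally weighted by word distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the window size, the inverse-distance weight 1/d, the toy embeddings and concept vectors, and all function names (`weighted_context_vector`, `inverse_distance_weight`, `disambiguate`) are hypothetical, and choosing the sense by cosine similarity to a per-concept vector is an assumed stand-in for the knowledge-based scoring step.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def uniform_weight(distance: int) -> float:
    """Plain (unweighted) average: every context word counts equally."""
    return 1.0


def inverse_distance_weight(distance: int) -> float:
    """Hypothetical distance weighting: closer words count more (1/d)."""
    return 1.0 / distance


def weighted_context_vector(
    tokens: Sequence[str],
    target_index: int,
    embeddings: Dict[str, Vector],
    window: int = 5,
    weight: Callable[[int], float] = uniform_weight,
) -> Vector:
    """Weighted average of the embeddings of words around tokens[target_index]."""
    # Assumes a non-empty embedding table whose vectors share one dimension.
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    start = max(0, target_index - window)
    stop = min(len(tokens), target_index + window + 1)
    for i in range(start, stop):
        if i == target_index:
            continue  # the ambiguous term itself is not part of its context
        vec = embeddings.get(tokens[i])
        if vec is None:
            continue  # skip out-of-vocabulary words
        w = weight(abs(i - target_index))
        for k in range(dim):
            total[k] += w * vec[k]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known context words: return the zero vector
    return [x / weight_sum for x in total]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context: Vector, concept_vectors: Dict[str, Vector]) -> str:
    """Pick the candidate concept whose vector is most similar to the context."""
    return max(concept_vectors, key=lambda c: cosine(context, concept_vectors[c]))


if __name__ == "__main__":
    # Toy 2-D embeddings and hypothetical concept vectors (e.g. averaged
    # from knowledge-base definitions); not real biomedical data.
    embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    concepts = {"C1": [1.0, 0.0], "C2": [0.0, 1.0]}
    tokens = ["b", "b", "b", "a", "x"]  # "x" is the ambiguous term
    plain = weighted_context_vector(tokens, 4, embeddings, window=3)
    weighted = weighted_context_vector(
        tokens, 4, embeddings, window=3, weight=inverse_distance_weight
    )
    print(disambiguate(plain, concepts))     # C2
    print(disambiguate(weighted, concepts))  # C1
```

In this toy case the plain average is dominated by the three more distant "b" tokens and selects C2, while inverse-distance weighting lets the adjacent "a" token outweigh them and selects C1. The example only shows the mechanism of distance weighting; it says nothing about the accuracy figures reported in the abstract.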
Record ID: | RCAP_48bc9ab6528b9e3bfeaa2525b3c70f8d |
---|---|
OAI identifier: | oai:ria.ua.pt:10773/25119 |
Network: | RCAP (Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)) |
Repository ID: | 7160 |
Authors: | Antunes, Rui; Matos, Sérgio |
Keywords: | Biomedical word sense disambiguation; Knowledge-based approaches; Word embeddings |
Dates: | 2017-10-01T00:00:00Z; 2019-01-15T16:25:23Z |
Type: | Conference object (published version) |
ISBN: | 978-972-789-522-9 |
Access rights: | Open access |
Format: | application/pdf |
Publisher: | UA Editora |
Repository: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos), Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP) |