Evaluation of word embedding vector averaging functions for biomedical word sense disambiguation

Antunes, Rui; Matos, Sérgio

Bibliographic details
Main author:	Antunes, Rui
Publication date:	2017
Other authors:	Matos, Sérgio
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10773/25119
Abstract:	The biomedical lexicon contains a large amount of term ambiguity, which hinders the correct identification of concepts and reduces the accuracy of semantic indexing and information retrieval tools. Previous work on biomedical word sense disambiguation has shown that supervised machine learning leads to better results than knowledge-based approaches. However, machine learning approaches require sufficient training data, and their generalization performance beyond the test data is not known. Knowledge-based methods, on the other hand, make use of existing knowledge bases and are therefore limited mostly by the quality of those sources of information about concepts. In this work, we use word embedding vectors to complement the knowledge-base information. We represent the context of an ambiguous term by the average of the embedding vectors of the words around the term, and evaluate the impact of weighting this average by word distance. We show that this weighting improves the disambiguation accuracy of the knowledge-based approach on a subset of the reference MSH WSD data set from 86% to 88%.
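
The context representation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which weighting functions were evaluated, so the inverse-distance weight (1 / distance), the window size, the cosine-similarity sense selection, and the two-dimensional toy vectors below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def context_vector(tokens, target_index, embeddings, window=5, weighted=True):
    """Average the embeddings of the words around tokens[target_index].

    With weighted=True each neighbour is weighted by 1 / distance to the
    target word. This weighting function is an assumption for illustration.
    """
    dim = len(next(iter(embeddings.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    for i in range(start, end):
        # Skip the ambiguous term itself and words without an embedding.
        if i == target_index or tokens[i] not in embeddings:
            continue
        w = 1.0 / abs(i - target_index) if weighted else 1.0
        total += w * np.asarray(embeddings[tokens[i]], dtype=float)
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else total


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def disambiguate(context_vec, sense_vectors):
    """Pick the sense whose vector is most similar to the context vector."""
    return max(sense_vectors, key=lambda s: cosine(context_vec, sense_vectors[s]))


# Hypothetical toy data: 2-dimensional embeddings and two senses of "cold".
embeddings = {
    "patient": [1.0, 0.0],
    "weather": [0.0, 1.0],
    "winter": [0.1, 0.9],
}
sense_vectors = {
    "common_cold": np.array([1.0, 0.0]),
    "cold_temperature": np.array([0.0, 1.0]),
}
tokens = ["weather", "winter", "patient", "cold"]
target = tokens.index("cold")

unweighted_sense = disambiguate(
    context_vector(tokens, target, embeddings, weighted=False), sense_vectors
)
weighted_sense = disambiguate(
    context_vector(tokens, target, embeddings, weighted=True), sense_vectors
)
print(unweighted_sense, weighted_sense)
```

In this toy setup the plain average selects `cold_temperature`, while the distance-weighted average selects `common_cold` because the nearest word ("patient") dominates. The example only shows the mechanism; it says nothing about the accuracy figures reported in the abstract.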

Item metadata

id	RCAP_48bc9ab6528b9e3bfeaa2525b3c70f8d
oai_identifier_str	oai:ria.ua.pt:10773/25119
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Evaluation of word embedding vector averaging functions for biomedical word sense disambiguation
title	Evaluation of word embedding vector averaging functions for biomedical word sense disambiguation
author	Antunes, Rui
author_facet	Antunes, Rui; Matos, Sérgio
author_role	author
author2	Matos, Sérgio
author2_role	author
dc.contributor.author.fl_str_mv	Antunes, Rui; Matos, Sérgio
dc.subject.por.fl_str_mv	Biomedical word sense disambiguation; Knowledge-based approaches; Word embeddings
topic	Biomedical word sense disambiguation; Knowledge-based approaches; Word embeddings
description	The biomedical lexicon contains a large amount of term ambiguity, which hinders the correct identification of concepts and reduces the accuracy of semantic indexing and information retrieval tools. Previous work on biomedical word sense disambiguation has shown that supervised machine learning leads to better results than knowledge-based approaches. However, machine learning approaches require sufficient training data, and their generalization performance beyond the test data is not known. Knowledge-based methods, on the other hand, make use of existing knowledge bases and are therefore limited mostly by the quality of those sources of information about concepts. In this work, we use word embedding vectors to complement the knowledge-base information. We represent the context of an ambiguous term by the average of the embedding vectors of the words around the term, and evaluate the impact of weighting this average by word distance. We show that this weighting improves the disambiguation accuracy of the knowledge-based approach on a subset of the reference MSH WSD data set from 86% to 88%.
publishDate	2017
dc.date.none.fl_str_mv	2017-10-01T00:00:00Z 2017-10 2019-01-15T16:25:23Z
dc.type.driver.fl_str_mv	conference object
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/25119
url	http://hdl.handle.net/10773/25119
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	978-972-789-522-9
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	UA Editora
publisher.none.fl_str_mv	UA Editora
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv	mluisa.alvim@gmail.com
_version_	1817543695594946560
