Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity

Oliveira, Hugo Gonçalo

Bibliographic details
Main author:	Oliveira, Hugo Gonçalo
Publication date:	2018
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10316/107675 https://doi.org/10.3390/info9020035
Abstract:	Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language, available for a longer time. The previous resources were exploited to answer word similarity tests, which also became recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity, but, in general, better results are obtained when knowledge from different sources is combined.

Item metadata

id	RCAP_d453b707623cb412d73b773b0cdad205
oai_identifier_str	oai:estudogeral.uc.pt:10316/107675
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity | semantic similarity | word similarity | lexical knowledge bases | lexical semantics | word embeddings | distributional semantics | Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language, available for a longer time. The previous resources were exploited to answer word similarity tests, which also became recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity, but, in general, better results are obtained when knowledge from different sources is combined. | MDPI | 2018 | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/article | http://hdl.handle.net/10316/107675 | http://hdl.handle.net/10316/107675 | https://doi.org/10.3390/info9020035 | eng | 2078-2489 | Oliveira, Hugo Gonçalo | info:eu-repo/semantics/openAccess | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) | instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | instacron:RCAAP | 2023-07-26T11:42:49Z | oai:estudogeral.uc.pt:10316/107675 | Portal Agregador | ONG | https://www.rcaap.pt/oai/openaire | opendoar:7160 | 2024-03-19T21:24:00.031866 | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | false
dc.title.none.fl_str_mv	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
spellingShingle	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity | Oliveira, Hugo Gonçalo | semantic similarity | word similarity | lexical knowledge bases | lexical semantics | word embeddings | distributional semantics
title_short	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_full	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_fullStr	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_full_unstemmed	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
title_sort	Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
author	Oliveira, Hugo Gonçalo
author_facet	Oliveira, Hugo Gonçalo
author_role	author
dc.contributor.author.fl_str_mv	Oliveira, Hugo Gonçalo
dc.subject.por.fl_str_mv	semantic similarity; word similarity; lexical knowledge bases; lexical semantics; word embeddings; distributional semantics
topic	semantic similarity; word similarity; lexical knowledge bases; lexical semantics; word embeddings; distributional semantics
description	Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language, available for a longer time. The previous resources were exploited to answer word similarity tests, which also became recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity, but, in general, better results are obtained when knowledge from different sources is combined.
publishDate	2018
dc.date.none.fl_str_mv	2018
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10316/107675 http://hdl.handle.net/10316/107675 https://doi.org/10.3390/info9020035
url	http://hdl.handle.net/10316/107675 https://doi.org/10.3390/info9020035
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	2078-2489
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	MDPI
publisher.none.fl_str_mv	MDPI
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799134125856129024
