Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity

Bibliographic details
Main author: Oliveira, Hugo Gonçalo
Publication date: 2018
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10316/107675
https://doi.org/10.3390/info9020035
Abstract: Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise the words and meanings of a language and that are compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches to this task. It is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to the several lexical knowledge bases (LKBs) that have been available for this language for longer. These resources were exploited to answer word similarity tests, which have also recently become available for Portuguese. We conclude that there are several valid approaches to this task, but no single one outperforms all the others in every test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity; in general, however, better results are obtained when knowledge from different sources is combined.
id RCAP_d453b707623cb412d73b773b0cdad205
oai_identifier_str oai:estudogeral.uc.pt:10316/107675
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.subject.por.fl_str_mv semantic similarity
word similarity
lexical knowledge bases
lexical semantics
word embeddings
distributional semantics
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv 2078-2489
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP