Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity
Main author: | Oliveira, Hugo Gonçalo |
---|---|
Publication date: | 2018 |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10316/107675 https://doi.org/10.3390/info9020035 |
Abstract: | Identifying similar and related words is not only key in natural language understanding but also a suitable task for assessing the quality of computational resources that organise words and meanings of a language, compiled by different means. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches for this task and is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) for this language, available for a longer time. The previous resources were exploited to answer word similarity tests, which also became recently available for Portuguese. We conclude that there are several valid approaches for this task, but not one that outperforms all the others in every single test. Distributional models seem to capture relatedness better, while LKBs are better suited for computing genuine similarity, but, in general, better results are obtained when knowledge from different sources is combined. |
id | RCAP_d453b707623cb412d73b773b0cdad205 |
---|---|
oai_identifier_str | oai:estudogeral.uc.pt:10316/107675 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity |
author | Oliveira, Hugo Gonçalo |
author_role | author |
dc.subject.por.fl_str_mv | semantic similarity; word similarity; lexical knowledge bases; lexical semantics; word embeddings; distributional semantics |
publishDate | 2018 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10316/107675 https://doi.org/10.3390/info9020035 |
dc.language.iso.fl_str_mv | eng |
dc.relation.none.fl_str_mv | 2078-2489 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.publisher.none.fl_str_mv | MDPI |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |