Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings
Main author: | Domingues, Gabriel Couto |
---|---|
Publication date: | 2023 |
Document type: | Undergraduate thesis (trabalho de conclusão de curso) |
Language: | eng |
Source title: | Repositório Institucional da UFRGS |
Full text: | http://hdl.handle.net/10183/267624 |
Abstract: | Explicit knowledge models are artifacts that represent domain knowledge explicitly and can be used for purposes such as structuring data and supporting information retrieval and reasoning. Identifying and classifying semantic relationships between concepts is a critical task in the development of knowledge models. This work investigates the use of machine learning approaches and pre-trained static word embeddings to classify semantic relationships between concepts, evaluating different techniques for dealing with the challenges that data imbalance imposes in this context. We propose a methodology for building datasets for semantic relationship classification from word embeddings, using WordNet as a semantic reference. By applying the proposed methodology, we generated two different datasets, with two variations, for the target task. Finally, we evaluated a set of general approaches for dealing with data imbalance in classification tasks. Our results indicate that, while some strategies such as SMOTE showed promise on specific metrics, the baseline model consistently achieved superior performance in terms of F1 score. |
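The two ideas the abstract names, SMOTE-style oversampling and the F1 score, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy feature vectors (standing in for features derived from word embeddings of a concept pair), and the class labels are all hypothetical, and only the Python standard library is used.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core idea
    behind SMOTE). Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy features for a rare relation class.
minority = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4], [0.0, 1.0, 0.2]]
synthetic = smote_like_oversample(minority, n_new=5)

# Hypothetical labels and predictions for a two-class toy case.
y_true = ["hypernym", "other", "other", "hypernym"]
y_pred = ["hypernym", "hypernym", "other", "other"]
score = f1_score(y_true, y_pred, positive="hypernym")
```

Oversampling of this kind would normally be applied to the training split only, so that synthetic points never leak into the data used to compute F1.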
id |
UFRGS-2_da212795b0814357f858428c88c68677 |
---|---|
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/267624 |
network_acronym_str |
UFRGS-2 |
network_name_str |
Repositório Institucional da UFRGS |
repository_id_str |
|
spelling |
Domingues, Gabriel Couto; Carbonera, Joel Luis; Lopes Junior, Alcides Gonçalves; 2023-11-25T03:26:22Z; 2023; http://hdl.handle.net/10183/267624; 001187681; application/pdf; eng; Aprendizado de máquina; Redes neurais; Semântica computacional; Word Embeddings; Supervised Learning; Ontologies; Knowledge Graphs; WordNet; Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bachelorThesis; Universidade Federal do Rio Grande do Sul; Instituto de Informática; Porto Alegre, BR-RS; 2023; Computer Science: Emphasis in Computer Science: Bachelor's degree (Ciência da Computação: Ênfase em Ciência da Computação: Bacharelado); undergraduate (graduação); info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS; TEXT; 001187681.pdf.txt; Extracted Text; text/plain; 114979; http://www.lume.ufrgs.br/bitstream/10183/267624/2/001187681.pdf.txt; b05776512cdb57bb038e01af1f4f86fe; MD5; 2; ORIGINAL; 001187681.pdf; Full text (English); application/pdf; 7370989; http://www.lume.ufrgs.br/bitstream/10183/267624/1/001187681.pdf; 0507fca2c20a7f360583d95bff0985b6; MD5; 1; 10183/267624; 2023-11-26 04:25:54.705086; oai:www.lume.ufrgs.br:10183/267624; Repositório de Publicações; PUB; https://lume.ufrgs.br/oai/request; opendoar:; 2023-11-26T06:25:54; Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS); false |
dc.title.pt_BR.fl_str_mv |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings |
title |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings |
spellingShingle |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings; Domingues, Gabriel Couto; Aprendizado de máquina; Redes neurais; Semântica computacional; Word Embeddings; Supervised Learning; Ontologies; Knowledge Graphs; WordNet |
title_short |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings |
title_full |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings |
title_fullStr |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings |
title_full_unstemmed |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings |
title_sort |
Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings |
author |
Domingues, Gabriel Couto |
author_facet |
Domingues, Gabriel Couto |
author_role |
author |
dc.contributor.author.fl_str_mv |
Domingues, Gabriel Couto |
dc.contributor.advisor1.fl_str_mv |
Carbonera, Joel Luis |
dc.contributor.advisor-co1.fl_str_mv |
Lopes Junior, Alcides Gonçalves |
contributor_str_mv |
Carbonera, Joel Luis; Lopes Junior, Alcides Gonçalves |
dc.subject.por.fl_str_mv |
Aprendizado de máquina (machine learning); Redes neurais (neural networks); Semântica computacional (computational semantics) |
topic |
Aprendizado de máquina (machine learning); Redes neurais (neural networks); Semântica computacional (computational semantics); Word Embeddings; Supervised Learning; Ontologies; Knowledge Graphs; WordNet |
dc.subject.eng.fl_str_mv |
Word Embeddings; Supervised Learning; Ontologies; Knowledge Graphs; WordNet |
description |
Explicit knowledge models are artifacts that represent domain knowledge explicitly and can be used for purposes such as structuring data and supporting information retrieval and reasoning. Identifying and classifying semantic relationships between concepts is a critical task in the development of knowledge models. This work investigates the use of machine learning approaches and pre-trained static word embeddings to classify semantic relationships between concepts, evaluating different techniques for dealing with the challenges that data imbalance imposes in this context. We propose a methodology for building datasets for semantic relationship classification from word embeddings, using WordNet as a semantic reference. By applying the proposed methodology, we generated two different datasets, with two variations, for the target task. Finally, we evaluated a set of general approaches for dealing with data imbalance in classification tasks. Our results indicate that, while some strategies such as SMOTE showed promise on specific metrics, the baseline model consistently achieved superior performance in terms of F1 score. |
publishDate |
2023 |
dc.date.accessioned.fl_str_mv |
2023-11-25T03:26:22Z |
dc.date.issued.fl_str_mv |
2023 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
format |
bachelorThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/267624 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001187681 |
url |
http://hdl.handle.net/10183/267624 |
identifier_str_mv |
001187681 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Repositório Institucional da UFRGS |
collection |
Repositório Institucional da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/267624/2/001187681.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/267624/1/001187681.pdf |
bitstream.checksum.fl_str_mv |
b05776512cdb57bb038e01af1f4f86fe 0507fca2c20a7f360583d95bff0985b6 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
|
_version_ |
1815447353042141184 |