Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings

Domingues, Gabriel Couto


Bibliographic details
Main author:	Domingues, Gabriel Couto
Publication date:	2023
Document type:	Undergraduate thesis (bachelor's final project)
Language:	English
Source:	Repositório Institucional da UFRGS
Full text:	http://hdl.handle.net/10183/267624
Abstract:	Explicit knowledge models are artifacts that represent domain knowledge in an explicit way and can be used in different ways, including structuring data, supporting information retrieval and reasoning. The identification and classification of semantic relationships between concepts is a critical task in the development of knowledge models. This work investigates the use of machine learning approaches and pre-trained static word embeddings to classify semantic relationships between concepts, evaluating different techniques to deal with the challenges imposed by data imbalance in this context. We proposed a methodology for building datasets for the task of semantic relationship classification from word embeddings using WordNet as a semantic reference. By applying the proposed methodology, we generated two different datasets, with two variations, for the target task. Finally, we evaluated a set of general approaches for dealing with data imbalance in classification tasks. Our results indicated that while some strategies like SMOTE showed promise in specific metrics, the baseline model consistently achieved superior performance in terms of F1 score.
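The abstract names SMOTE and the F1 score without defining them. The sketch below illustrates both in library-free Python under loudly stated assumptions: each concept pair is taken to be a numeric feature vector (for instance, the concatenation of the two words' embeddings, which is an assumption and not necessarily the thesis's feature design), and F1 is macro-averaged over relation labels (also an assumption; the record does not say which averaging was used). The function names `smote`, `f1_for_label` and `macro_f1`, the labels `hypernym`/`none`, and the toy vectors are all hypothetical. `smote` follows the standard SMOTE idea of interpolating between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class vectors with SMOTE-style interpolation.

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean distance).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def f1_for_label(y_true, y_pred, label):
    """F1 score of one label treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1, so rare relations count as much as frequent ones."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy usage: 2-d vectors stand in for concept-pair features (hypothetical data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4, k=2, seed=42)
print(len(new_points), "synthetic samples")
print(macro_f1(["hypernym", "hypernym", "none", "none"],
               ["hypernym", "none", "none", "none"]))
```

In practice one would more likely use the `SMOTE` class from imbalanced-learn (`fit_resample(X, y)`) and scikit-learn's `f1_score`. Either way, oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data and inflate the reported F1.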

Item metadata

id	UFRGS-2_da212795b0814357f858428c88c68677
oai_identifier_str	oai:www.lume.ufrgs.br:10183/267624
network_acronym_str	UFRGS-2
unit	Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, BR-RS
degree	Computer Science (emphasis in Computer Science), bachelor's degree; undergraduate level
Portuguese abstract	provided (a translation of the English abstract)
files	001187681.pdf (full text in English, application/pdf, 7370989 bytes); 001187681.pdf.txt (extracted text, text/plain, 114979 bytes)
last modified	2023-11-26 04:25:54
OAI-PMH endpoint	https://lume.ufrgs.br/oai/request
dc.title.pt_BR.fl_str_mv	Evaluating data imbalance approaches for classifying semantic relations using machine learning and word embeddings
dc.contributor.author.fl_str_mv	Domingues, Gabriel Couto
dc.contributor.advisor1.fl_str_mv	Carbonera, Joel Luis
dc.contributor.advisor-co1.fl_str_mv	Lopes Junior, Alcides Gonçalves
dc.subject.por.fl_str_mv	Aprendizado de máquina (machine learning); Redes neurais (neural networks); Semântica computacional (computational semantics)
dc.subject.eng.fl_str_mv	Word Embeddings; Supervised Learning; Ontologies; Knowledge Graphs; WordNet
dc.date.accessioned.fl_str_mv	2023-11-25T03:26:22Z
dc.date.issued.fl_str_mv	2023
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/bachelorThesis
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10183/267624
dc.identifier.nrb.pt_BR.fl_str_mv	001187681
dc.language.iso.fl_str_mv	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS
bitstream.url.fl_str_mv	http://www.lume.ufrgs.br/bitstream/10183/267624/2/001187681.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/267624/1/001187681.pdf
bitstream.checksum.fl_str_mv	b05776512cdb57bb038e01af1f4f86fe 0507fca2c20a7f360583d95bff0985b6
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
