Development of tools for sentiment analysis in the Portuguese language

Gonçalves, Jorge Miguel da Silva Brandão

Bibliographic details
Main author:	Gonçalves, Jorge Miguel da Silva Brandão
Publication date:	2022
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://hdl.handle.net/1822/84073
Summary:	Master's dissertation in Informatics Engineering

Item metadata

id	RCAP_29200924532a67e3229429c175677cd1
oai_identifier_str	oai:repositorium.sdum.uminho.pt:1822/84073
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Development of tools for sentiment analysis in the portuguese language; Deep learning; Machine learning; Text mining; Sentiment analysis; Master's dissertation in Informatics Engineering
Sentiment Analysis (SA) is one of the most important areas in computer science, particularly within Natural Language Processing. Its applications range from product reviews to cyberbullying containment. Despite this importance, the field lags behind for less widely spoken languages. In this context, Omnium AI proposed a dissertation exploring SA for the Portuguese language, with the aim of creating a new computational tool. This dissertation examines the SA field and the development of the Omnia package, which is composed of tools for reading datasets, processing them, and creating Machine Learning (ML) and Deep Learning (DL) models from the data read. Specifically, it focuses on developing the Omnia Text Mining package, with the aim of creating pre-processing tools and ML and DL models for SA in the Portuguese language. The dissertation creates an approach to SA problems consisting of a data-gathering step, followed by a pre-processing step, and ending with a modelling step in which different ML and DL models are developed. This approach is applied to the topic of Covid-19. From it, we obtained two datasets, from which we created ML, DL and Autogluon models. After creating the models, we evaluated the results for the different combinations of pre-processing methods (Pipelines) and ML and DL models, and found that Long Short Term Memory (LSTM) networks and HFAutoModel with a Bert embedding were the best models for the datasets we used. In general, DL and Autogluon models gave better results than ML models. For the pre-processing Pipelines, we observed that no single Pipeline fits all cases: different Pipelines worked better for different models.
Lastly, we discuss the conclusions that can be drawn from this work, along with a future work section exploring the possible next steps for this project.
Rocha, Miguel; Pereira, Vítor; Universidade do Minho; Gonçalves, Jorge Miguel da Silva Brandão; 2022-12-19; 2022-12-19T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; https://hdl.handle.net/1822/84073; eng; 203252365; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-07-21T12:50:57Z; oai:repositorium.sdum.uminho.pt:1822/84073; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T19:49:43.533887; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false
dc.title.none.fl_str_mv	Development of tools for sentiment analysis in the portuguese language
title	Development of tools for sentiment analysis in the portuguese language
spellingShingle	Development of tools for sentiment analysis in the portuguese language Gonçalves, Jorge Miguel da Silva Brandão Deep learning Machine learning Text mining Sentiment analysis
title_short	Development of tools for sentiment analysis in the portuguese language
title_full	Development of tools for sentiment analysis in the portuguese language
title_fullStr	Development of tools for sentiment analysis in the portuguese language
title_full_unstemmed	Development of tools for sentiment analysis in the portuguese language
title_sort	Development of tools for sentiment analysis in the portuguese language
author	Gonçalves, Jorge Miguel da Silva Brandão
author_facet	Gonçalves, Jorge Miguel da Silva Brandão
author_role	author
dc.contributor.none.fl_str_mv	Rocha, Miguel Pereira, Vítor Universidade do Minho
dc.contributor.author.fl_str_mv	Gonçalves, Jorge Miguel da Silva Brandão
dc.subject.por.fl_str_mv	Deep learning Machine learning Text mining Sentiment analysis
topic	Deep learning Machine learning Text mining Sentiment analysis
description	Master's dissertation in Informatics Engineering
publishDate	2022
dc.date.none.fl_str_mv	2022-12-19 2022-12-19T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1822/84073
url	https://hdl.handle.net/1822/84073
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	203252365
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799133079847043072
