Deep Learning for Sentiment Analysis: a case study about Portuguese Restaurant Reviews

Bibliographic details
Main author: Parada, Daniel Moisés De Olival
Publication date: 2024
Document type: Dissertation (master's thesis)
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10400.13/5605
Abstract: This work investigates the use of deep learning algorithms for sentiment analysis of restaurant reviews from the Zomato application, using natural language processing techniques to handle the text data and the ratings given by consumers as labels for supervised training. It presents two models developed from scratch for the case study, built on recurrent neural networks and self-attention: the Recurrent Encoder Classifier and the Attentive Recurrent Encoder Classifier. Both models underwent two heuristic-based optimization procedures: a discrete genetic algorithm to select an optimal set of hyperparameters and an optimal architecture, and a grid search to optimize the text preprocessing steps. Because the use of deep learning models with Portuguese data is limited, the gain in performance was evaluated against classical machine learning models trained on the Zomato dataset, showing an improvement of 3% in F1-score. The genetic algorithm yielded a relative obtainable improvement score of 4.4% and 8.3% for the recurrent and attentive recurrent encoder architectures, respectively, over their baseline configurations, with further optimization possible by increasing the number of generations. The grid search slightly improved the performance of each architecture. The two architectures had comparable results, with the Attentive Recurrent Encoder Classifier performing best: 76% F1-score, 92.5% ROC-AUC, and 82.7% accuracy. Tests of a Raspberry Pi application that uses the model for inference demonstrated the feasibility of the proposed approach for sentiment analysis in real-world, resource-constrained environments. The results show that deep learning algorithms can analyze sentiment effectively and outperform traditional ML algorithms, and they support the need to explore smaller, single-task deep learning models as businesses transition to solutions based on artificial intelligence.
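The abstract does not give the encoding, operators, or search space used by the discrete genetic algorithm. As a rough illustration only, the sketch below shows how such a search over discrete hyperparameter and architecture choices could look in Python. The search space, the operators (tournament selection, uniform crossover, per-gene mutation, elitism), and `toy_fitness` are all assumptions made for this example; in a real run the fitness function would train a model and return a validation score such as F1.

```python
import random

# Hypothetical discrete search space; names and values are illustrative only
# and are not taken from the dissertation.
SEARCH_SPACE = {
    "rnn_cell": ["lstm", "gru"],
    "hidden_units": [32, 64, 128, 256],
    "num_layers": [1, 2, 3],
    "dropout": [0.0, 0.2, 0.5],
}
GENES = list(SEARCH_SPACE)


def random_individual(rng):
    """Draw one configuration by sampling every gene from its discrete set."""
    return {g: rng.choice(SEARCH_SPACE[g]) for g in GENES}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {g: (a if rng.random() < 0.5 else b)[g] for g in GENES}


def mutate(ind, rng, rate=0.1):
    """Resample each gene from its discrete set with probability `rate`."""
    return {g: (rng.choice(SEARCH_SPACE[g]) if rng.random() < rate else v)
            for g, v in ind.items()}


def tournament(population, scores, rng, k=3):
    """Pick k random individuals and return the one with the highest score."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=12, generations=10, seed=0):
    """Return the best configuration evaluated and its fitness."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        for ind, score in zip(population, scores):
            if score > best_score:
                best, best_score = ind, score
        # Elitism: carry the best individual so far into the next generation.
        next_population = [best]
        while len(next_population) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            next_population.append(mutate(crossover(p1, p2, rng), rng))
        population = next_population
    return best, best_score


def toy_fitness(ind):
    """Synthetic stand-in for 'train the model and return validation F1'."""
    return (-abs(ind["hidden_units"] - 128) / 256
            - ind["dropout"]
            - 0.1 * (ind["num_layers"] - 1))


if __name__ == "__main__":
    config, score = genetic_search(toy_fitness)
    print(config, score)
```

Because the best individual found so far is carried over, the best score in this sketch cannot decrease as generations are added, which fits the abstract's remark that more generations leave room for further optimization.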
id RCAP_9ca9cef81c0df17c63402d583e7ea203
oai_identifier_str oai:digituma.uma.pt:10400.13/5605
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv Deep Learning for Sentiment Analysis: a case study about Portuguese Restaurant Reviews
title Deep Learning for Sentiment Analysis: a case study about Portuguese Restaurant Reviews
author Parada, Daniel Moisés De Olival
author_facet Parada, Daniel Moisés De Olival
author_role author
dc.contributor.none.fl_str_mv Dias, Fernando Manuel Rosmaninho Morgado Ferrão
Mendonça, Fábio Ruben Silva
DigitUMa
dc.contributor.author.fl_str_mv Parada, Daniel Moisés De Olival
dc.subject.por.fl_str_mv Natural language processing
Sentiment analysis
Portuguese language
Deep learning
Genetic algorithm
Edge device
Edge computing
Electrical Engineering – Telecommunications
Faculdade de Ciências Exatas e da Engenharia
Scientific Domain/Area::Engineering and Technology::Electrical, Electronic and Computer Engineering
publishDate 2024
dc.date.none.fl_str_mv 2024-02-22
2024-02-22T00:00:00Z
2025-02-02T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.13/5605
TID:203545222
url http://hdl.handle.net/10400.13/5605
identifier_str_mv TID:203545222
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/embargoedAccess
eu_rights_str_mv embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138191819669504