Heterogenous Ensemble Learning Framework for Sentiment Analysis on COVID-19 Tweets
Main author: | Bania, Rubul Kumar |
---|---|
Publication date: | 2021 |
Document type: | Article |
Language: | English |
Source title: | INFOCOMP: Jornal de Ciência da Computação |
Full text: | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1763 |
Abstract: | During a catastrophe, detecting tweets associated with the target incident is a demanding task. Sentiment analysis is the study of the sentiments that diverse users share on social networking sites such as Twitter and Facebook about various social phenomena. In this article, sentiment analysis is carried out on thousands of tweets about the ongoing COVID-19 pandemic, collected from July to August 2020 and from May to June 2021. A novel ensemble learning model based on majority voting is proposed to classify the tweets into negative, neutral, and positive groups. Data preprocessing, polarity analysis, and various other analysis techniques are applied to the COVID-19-related tweets. Text features are extracted using TF-IDF with unigrams and bigrams, and five machine learning models, namely Naïve Bayes (NB), logistic regression (LR), K-nearest neighbours (KNN), decision tree (DT), and random forest (RF), are combined to build the ensemble model. Experimental results suggest that, with both unigram and bigram feature extraction, the proposed model performs better than the other compared models. With a 70%/30% train-test split, the proposed model achieves an accuracy of 94.67% in classifying the tweets into the three classes. |
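The abstract names the pipeline (TF-IDF features over unigrams and bigrams, five base classifiers, majority voting) but gives no code. As a hedged sketch only, the stdlib Python below illustrates the two generic building blocks: TF-IDF weighting over n-grams and hard majority voting over the labels predicted by several base models. It assumes lowercase whitespace tokenization, raw-count TF with the smoothed IDF ln((1 + N) / (1 + df)) + 1 and no normalization, and a first-seen tie-breaking rule; none of these details are stated in the record. The function names (`ngrams`, `tfidf`, `majority_vote`, `accuracy`), the sample tweets, and the per-model predictions are hypothetical illustrations, not the author's code or data.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, ngram_range=(1, 2)):
    """Return one {term: TF-IDF weight} dict per document.

    Assumptions (not from the paper): lowercase whitespace tokenization,
    raw counts as TF, smoothed IDF = ln((1 + N) / (1 + df)) + 1, no normalization.
    """
    lo, hi = ngram_range
    term_lists = []
    for doc in docs:
        tokens = doc.lower().split()
        terms = []
        for n in range(lo, hi + 1):  # e.g. (1, 2) -> unigrams and bigrams
            terms.extend(ngrams(tokens, n))
        term_lists.append(terms)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for terms in term_lists for term in set(terms))
    return [
        {t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1)
         for t, count in Counter(terms).items()}
        for terms in term_lists
    ]


def majority_vote(model_predictions):
    """Hard voting: for each sample, return the label most models predicted.

    Ties go to the label encountered first, i.e. the one predicted by the
    earliest model in the list (an assumed tie-breaking rule).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage with made-up tweets and made-up base-model outputs (not the paper's data).
docs = ["vaccine rollout is great news", "lockdown again is terrible news"]
vecs = tfidf(docs, ngram_range=(1, 2))

preds = [  # one label list per hypothetical base model, four tweets each
    ["positive", "negative", "neutral", "negative"],   # NB
    ["positive", "negative", "positive", "neutral"],   # LR
    ["neutral", "negative", "neutral", "negative"],    # KNN
    ["positive", "neutral", "neutral", "positive"],    # DT
    ["positive", "negative", "positive", "negative"],  # RF
]
y_true = ["positive", "negative", "neutral", "neutral"]
ensemble = majority_vote(preds)
print(ensemble, accuracy(y_true, ensemble))
```

In this toy run, LR and RF label the third tweet positive, but three of the five models say neutral, so the ensemble outputs neutral. scikit-learn's `VotingClassifier` with `voting="hard"` offers a library version of the same combination rule, although it breaks ties by class order rather than by model order.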
id |
UFLA-5_8c7d82245b44b704e7cba299c70aa9d9 |
oai_identifier_str |
oai:infocomp.dcc.ufla.br:article/1763 |
network_acronym_str |
UFLA-5 |
author |
Bania, Rubul Kumar |
author_role |
author |
dc.date.none.fl_str_mv |
2021-12-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
dc.relation.none.fl_str_mv |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1763/572 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2021 Rubul Kumar Bania info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Editora da UFLA |
dc.source.none.fl_str_mv |
INFOCOMP Journal of Computer Science; Vol. 20 No. 2 (2021): December 2021; ISSN 1982-3363; ISSN 1807-4545; reponame: INFOCOMP: Jornal de Ciência da Computação; instname: Universidade Federal de Lavras (UFLA); instacron: UFLA |
instname_str |
Universidade Federal de Lavras (UFLA) |
instacron_str |
UFLA |
repository.name.fl_str_mv |
INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA) |
repository.mail.fl_str_mv |
infocomp@dcc.ufla.br, apfreire@dcc.ufla.br |