Heterogenous Ensemble Learning Framework for Sentiment Analysis on COVID-19 Tweets

Bania, Rubul Kumar

Bibliographic details
Main author:	Bania, Rubul Kumar
Publication date:	2021
Document type:	Article
Language:	eng
Source title:	INFOCOMP: Jornal de Ciência da Computação
Full text:	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1763
Abstract:	During a catastrophe, detecting tweets associated with the target incident is a pressing task. Sentiment analysis is the study of sentiments shared by diverse users on social networking sites such as Twitter and Facebook about various social phenomena. This article analyses the sentiment of thousands of tweets about the ongoing COVID-19 pandemic, collected from July to August 2020 and from May to June 2021. A novel ensemble learning model based on majority voting is proposed to classify the tweets into negative, neutral, and positive groups. Data preprocessing, polarity analysis, and various other analysis techniques are applied to the COVID-19-related tweets. Text features are extracted using TF-IDF with unigrams and bigrams, and five machine learning models, namely Naïve Bayes (NB), logistic regression (LR), K-nearest neighbour (KNN), decision tree (DT), and random forest (RF), are combined to build the ensemble model. Experimental results suggest that, with both unigram and bigram feature extraction, the proposed model performs better than the other compared models. With a 70%-30% train-test split, the proposed model achieves an accuracy of 94.67% in classifying the tweets into these classes.
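
The majority-voting idea and the unigram/bigram features mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names `word_ngrams` and `majority_vote`, the whitespace tokenisation, the tie-breaking rule, and the example votes are all assumptions made for illustration, and the TF-IDF weighting and the five trained classifiers are omitted.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a text (n=1 for unigrams, n=2 for bigrams).

    Tokenisation by lowercasing and whitespace splitting is an assumption;
    the abstract does not describe the preprocessing in detail.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def majority_vote(predictions):
    """Return the label predicted by the largest number of base classifiers.

    Ties go to the tied label that appears first in `predictions`; this
    tie rule is an assumption, not something the abstract specifies.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of the five base models (NB, LR, KNN, DT, RF) for one tweet.
votes = ["positive", "neutral", "positive", "negative", "positive"]
ensemble_label = majority_vote(votes)

# Bigram features for a made-up example tweet.
bigrams = word_ngrams("stay home stay safe", 2)
```

Here three of the five hypothetical votes agree, so the ensemble label follows them; in a full pipeline each vote would come from a classifier trained on the TF-IDF-weighted n-gram features.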

Item metadata

id	UFLA-5_8c7d82245b44b704e7cba299c70aa9d9
oai_identifier_str	oai:infocomp.dcc.ufla.br:article/1763
network_acronym_str	UFLA-5
network_name_str	INFOCOMP: Jornal de Ciência da Computação
repository_id_str
dc.title.none.fl_str_mv	Heterogenous Ensemble Learning Framework for Sentiment Analysis on COVID-19 Tweets
author_role	author
dc.contributor.author.fl_str_mv	Bania, Rubul Kumar
publishDate	2021
dc.date.none.fl_str_mv	2021-12-01
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1763
dc.language.iso.fl_str_mv	eng
dc.relation.none.fl_str_mv	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1763/572
dc.rights.driver.fl_str_mv	Copyright (c) 2021 Rubul Kumar Bania info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Editora da UFLA
dc.source.none.fl_str_mv	INFOCOMP Journal of Computer Science; Vol. 20 No. 2 (2021): December 2021 1982-3363 1807-4545 reponame:INFOCOMP: Jornal de Ciência da Computação instname:Universidade Federal de Lavras (UFLA) instacron:UFLA
instname_str	Universidade Federal de Lavras (UFLA)
instacron_str	UFLA
institution	UFLA
reponame_str	INFOCOMP: Jornal de Ciência da Computação
repository.name.fl_str_mv	INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv	infocomp@dcc.ufla.br||apfreire@dcc.ufla.br
_version_	1799874742673473536
