COVID-19 Public Tweets Sentiment Analysis using TF-IDF and Inductive Learning Models

Bibliographic details
Main author: Bania, Rubul Kumar
Publication date: 2020
Document type: Article
Language: English
Source title: INFOCOMP: Jornal de Ciência da Computação
Full text: https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/985
Abstract: The spread of handheld devices gives people direct access to the internet and to social networking sites. Sentiment analysis and opinion mining study the sentiments or opinions that users share on social networking sites such as Twitter, Facebook, Reddit, and Instagram about diverse social phenomena. In this article, sentiment analysis is performed on tweets about the ongoing COVID-19 (coronavirus disease) outbreak, which the World Health Organization (WHO) declared a pandemic in mid-March 2020. Statistical and machine-learning analyses are carried out on 40,000 tweets collected from Twitter with the Tweepy Python library in two non-overlapping time frames: 3-11 July 2020 and 1-6 August 2020. Various Python libraries are used for data acquisition, pre-processing, and analysis. In the pre-processing phase, the tweet text is first cleaned. The tweets are then categorized into three groups (negative, neutral, and positive) by computing their polarity and subjectivity scores. Next, term frequency-inverse document frequency (TF-IDF) features are extracted using uni-gram, bi-gram, and tri-gram representations to build the datasets fed to the prediction models. Seventy percent of each dataset is used to train Gaussian Naïve Bayes (G-NB), Bernoulli Naïve Bayes (B-NB), Random Forest (RF), and Support Vector Machine (SVM) classifiers, and the remaining 30% is used to test the resulting models. Experimental results suggest that the RF and B-NB models perform better than the other two classifiers, while SVM has a very high computational cost.
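The abstract says tweets are cleaned and then labelled negative, neutral, or positive from polarity and subjectivity scores, but it does not name the scorer or the exact rule. The sketch below is a minimal illustration under stated assumptions: a polarity score in [-1, 1] and the common sign convention (below 0 negative, exactly 0 neutral, above 0 positive). Because the abstract does not say how subjectivity enters the rule, the sketch labels on polarity alone. `TOY_LEXICON`, `clean_tweet`, `polarity`, and `label` are hypothetical names standing in for a real scorer (for example TextBlob's `sentiment.polarity`); this is not the author's code.

```python
import re

# Hypothetical toy lexicon standing in for a real polarity scorer;
# the scores are illustrative only and do not come from the paper.
TOY_LEXICON = {"good": 0.7, "safe": 0.5, "hope": 0.4,
               "bad": -0.7, "fear": -0.6, "death": -0.8}


def clean_tweet(text: str) -> str:
    """Lower-case a tweet and strip URLs, @mentions, the RT marker,
    '#' signs, digits, punctuation and emoji."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # user mentions
    text = re.sub(r"\brt\b", " ", text)        # retweet marker
    text = text.replace("#", " ")              # keep the hashtag word itself
    text = re.sub(r"[^a-z\s]", " ", text)      # everything that is not a letter
    return " ".join(text.split())              # collapse whitespace


def polarity(text: str) -> float:
    """Mean lexicon score of the known words; 0.0 if none are known."""
    scores = [TOY_LEXICON[w] for w in text.split() if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label(score: float) -> str:
    """Assumed sign convention: <0 negative, 0 neutral, >0 positive."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


cleaned = clean_tweet("RT @who: Stay SAFE, there is hope! #COVID19 https://t.co/x")
print(cleaned, label(polarity(cleaned)))  # prints: stay safe there is hope covid positive
```

Swapping in a library scorer would only change `polarity`. The cleaning and thresholding steps stay the same.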
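The abstract extracts TF-IDF features from uni-grams, bi-grams, and tri-grams but does not give the exact weighting. The sketch below uses the textbook form tfidf(t, d) = tf(t, d) × ln(N / df(t)), with tf as the raw count of term t in document d, N as the number of documents, and df(t) as the number of documents containing t. The function names are hypothetical, and this is not the author's implementation. Libraries differ on this point: scikit-learn's `TfidfVectorizer(ngram_range=(1, 3))` by default uses a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its numbers will not match this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences, each joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_terms(text, max_n=3):
    """Uni-grams up to max_n-grams; max_n=3 covers uni-, bi- and tri-grams."""
    tokens = text.split()
    terms = []
    for n in range(1, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return terms


def tfidf(docs, max_n=3):
    """One {term: weight} dict per document, with tf = raw count in the
    document and idf = ln(N / df), df = number of documents containing the term."""
    term_counts = [Counter(extract_terms(d, max_n)) for d in docs]
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())  # each term counted at most once per document
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in counts.items()}
        for counts in term_counts
    ]


weights = tfidf(["stay safe", "stay home stay safe", "wear a mask"])
print(round(weights[1]["stay"], 3))  # prints: 0.811
```

In this formula, a term that occurs in every document gets weight 0. Adding bi-grams and tri-grams lets phrases such as "stay safe" carry weight separately from their individual words.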
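The evaluation protocol in the abstract trains on 70% of the data and tests on the remaining 30%. One of its four classifiers, Bernoulli Naïve Bayes, can be sketched without external libraries. This minimal illustration assumes a seeded random shuffle without stratification, binary word-presence features, and Laplace smoothing with `alpha=1.0`. `split_70_30` and `BernoulliNaiveBayes` are hypothetical names, not the author's code. With scikit-learn, the corresponding pieces would be `train_test_split(X, y, test_size=0.3)` and `BernoulliNB`, which by default binarizes its inputs at 0.0, so TF-IDF weights reduce to presence or absence.

```python
import math
import random
from collections import Counter, defaultdict


def split_70_30(items, seed=0):
    """Shuffle a copy of items (seeded) and split it 70% train / 30% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = (7 * len(items)) // 10  # integer arithmetic avoids float rounding
    return items[:cut], items[cut:]


class BernoulliNaiveBayes:
    """Bernoulli NB over word-presence features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.vocab = sorted({w for d in docs for w in set(d.split())})
        self.class_counts = Counter(labels)
        n = len(labels)
        self.log_prior = {c: math.log(k / n) for c, k in self.class_counts.items()}
        doc_freq = defaultdict(Counter)  # class -> word -> #docs containing word
        for d, c in zip(docs, labels):
            doc_freq[c].update(set(d.split()))
        # Smoothed P(word present | class) for every vocabulary word.
        self.p = {
            c: {w: (doc_freq[c][w] + self.alpha) / (k + 2 * self.alpha)
                for w in self.vocab}
            for c, k in self.class_counts.items()
        }
        return self

    def predict_one(self, doc):
        present = set(doc.split())
        best, best_score = None, -math.inf
        for c, log_prior in self.log_prior.items():
            score = log_prior
            for w in self.vocab:  # absent words contribute log(1 - p) too
                p = self.p[c][w]
                score += math.log(p) if w in present else math.log(1 - p)
            if score > best_score:
                best, best_score = c, score
        return best


train, test = split_70_30(range(10))
model = BernoulliNaiveBayes().fit(
    ["stay safe hope", "good news hope", "fear death bad", "bad fear news"],
    ["positive", "positive", "negative", "negative"],
)
print(len(train), len(test), model.predict_one("hope safe"))  # prints: 7 3 positive
```

Unlike multinomial Naïve Bayes, the Bernoulli variant also counts the absence of vocabulary words as evidence. This is one reason it can behave differently on short texts such as tweets.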
Date: 2020-12-01
Status: Published version
Source: INFOCOMP Journal of Computer Science; Vol. 19 No. 2 (2020): December 2020; pp. 23-41
ISSN: 1982-3363; 1807-4545
Publisher: Editora da UFLA
Format: application/pdf
PDF: https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/985/541
Rights: Copyright (c) 2020 Rubul Kumar Bania; open access
Repository: INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA)
Repository contact: infocomp@dcc.ufla.br; apfreire@dcc.ufla.br