COVID-19 Public Tweets Sentiment Analysis using TF-IDF and Inductive Learning Models
Main author: | Bania, Rubul Kumar |
---|---|
Publication date: | 2020 |
Document type: | Article |
Language: | eng |
Source title: | INFOCOMP: Jornal de Ciência da Computação |
Full text: | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/985 |
Abstract: | The emergence of handheld devices offers direct access to the internet and social networking sites. Sentiment analysis, or opinion mining, is the study of sentiments and opinions shared by users of social networking sites such as Twitter, Facebook, Reddit, and Instagram on diverse social phenomena. In this article, sentiment analysis is performed on tweets about the ongoing coronavirus disease (COVID-19) outbreak, which the World Health Organization (WHO) declared a pandemic in mid-March 2020. Statistical and machine-learning-based analyses are carried out on 40,000 tweets collected in two mutually exclusive time frames, from 3/07/2020 to 11/07/2020 and from 01/08/2020 to 06/08/2020, using the Tweepy Python library. Various Python-based libraries are used for data acquisition, pre-processing, and analysis. In the pre-processing phase, the sentences are first cleaned. Then, by calculating polarity and subjectivity measures, the tweets are categorized into three groups (negative, neutral, and positive). Next, the term frequency-inverse document frequency (TF-IDF) feature extraction scheme is applied with uni-grams, bi-grams, and tri-grams to extract features and prepare the datasets for the prediction models. 70% of the data is used to train Gaussian Naïve Bayes (G-NB), Bernoulli Naïve Bayes (B-NB), random forest (RF), and support vector machine (SVM) classifiers, and the remaining 30% is used to test the resulting models. Experimental results suggest that the RF and B-NB models perform better than the other two classifiers. The computational cost of running SVM is very high. |
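
The pre-processing and feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not say which library computes polarity or TF-IDF, so the code below uses only the Python standard library, and every function name (`clean_tweet`, `label_from_polarity`, `ngrams`, `tfidf`) is hypothetical. Two further assumptions are made: a polarity score is taken to lie in [-1, 1] with zero as the boundary between the three groups, and TF-IDF is computed with the textbook weighting tf × log(N / df), whereas libraries such as scikit-learn apply smoothed variants by default.

```python
import math
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs and punctuation."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def label_from_polarity(polarity):
    """Map a polarity score to a sentiment group (zero threshold is an assumption)."""
    if polarity < 0:
        return "negative"
    if polarity > 0:
        return "positive"
    return "neutral"


def ngrams(tokens, n_min=1, n_max=3):
    """Return all word n-grams of the tokens for n_min <= n <= n_max."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(documents, n_min=1, n_max=3):
    """Compute one {term: weight} dict per document using tf * log(N / df)."""
    term_lists = [ngrams(clean_tweet(doc), n_min, n_max) for doc in documents]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(documents)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty tweets
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example tweets, not taken from the article's dataset.
tweets = [
    "Stay home and stay safe! https://example.com",
    "Lockdown is so boring @friend",
    "Stay positive during lockdown #covid19",
]
vectors = tfidf(tweets)
```

In a full pipeline, each weight dictionary would be paired with the label returned by `label_from_polarity`, and the labelled vectors would then be split 70/30 into training and test sets for the four classifiers. Terms that occur in every document receive a weight of zero under this unsmoothed weighting.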
id |
UFLA-5_c4bd0279c8ca70a5d1e69c0e38436b46 |
---|---|
oai_identifier_str |
oai:infocomp.dcc.ufla.br:article/985 |
network_acronym_str |
UFLA-5 |
network_name_str |
INFOCOMP: Jornal de Ciência da Computação |
repository_id_str |
|
dc.title.none.fl_str_mv |
COVID-19 Public Tweets Sentiment Analysis using TF-IDF and Inductive Learning Models |
author |
Bania, Rubul Kumar |
author_facet |
Bania, Rubul Kumar |
author_role |
author |
dc.contributor.author.fl_str_mv |
Bania, Rubul Kumar |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-12-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/985 |
url |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/985 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/985/541 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2020 Rubul Kumar Bania info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2020 Rubul Kumar Bania |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Editora da UFLA |
publisher.none.fl_str_mv |
Editora da UFLA |
dc.source.none.fl_str_mv |
INFOCOMP Journal of Computer Science; Vol. 19 No. 2 (2020): December 2020; 23-41 1982-3363 1807-4545 reponame:INFOCOMP: Jornal de Ciência da Computação instname:Universidade Federal de Lavras (UFLA) instacron:UFLA |
instname_str |
Universidade Federal de Lavras (UFLA) |
instacron_str |
UFLA |
institution |
UFLA |
reponame_str |
INFOCOMP: Jornal de Ciência da Computação |
collection |
INFOCOMP: Jornal de Ciência da Computação |
repository.name.fl_str_mv |
INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA) |
repository.mail.fl_str_mv |
infocomp@dcc.ufla.br||apfreire@dcc.ufla.br |
_version_ |
1799874742634676224 |