Predictive analysis of COVID-19 symptoms in social networks through machine learning
Main author: | Silva, Clístenes Fernandes da |
---|---|
Publication date: | 2022 |
Other authors: | Junior, Arnaldo Candido; Lopes, Rui Pedro |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10198/25323 |
Abstract: | Social media is a great source of data for analyses, since it provides ways for people to share emotions, feelings, ideas, and even symptoms of diseases. By the end of 2019, a global pandemic alert was raised concerning a virus that had a high contamination rate and could cause respiratory complications. To help identify those who may have the symptoms of this disease or to detect who is already infected, this paper analyzed the performance of eight machine learning algorithms (KNN, Naive Bayes, Decision Tree, Random Forest, SVM, simple Multilayer Perceptron, Convolutional Neural Networks and BERT) in the search and classification of tweets that mention self-reports of COVID-19 symptoms. The dataset was labeled using a set of disease symptom keywords provided by the World Health Organization. The tests showed that the Random Forest algorithm had the best results, closely followed by BERT and the Convolutional Neural Network, although traditional machine learning algorithms can also provide good results. This work could also aid in the selection of algorithms for the identification of disease symptoms in social media content. |
id |
RCAP_b33fbf6247d671f53d9a5ede987713c6 |
---|---|
oai_identifier_str |
oai:bibliotecadigital.ipb.pt:10198/25323 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Predictive analysis of COVID-19 symptoms in social networks through machine learning Natural language processing Machine learning Text classification COVID-19 Tweet analysis Social media is a great source of data for analyses, since it provides ways for people to share emotions, feelings, ideas, and even symptoms of diseases. By the end of 2019, a global pandemic alert was raised concerning a virus that had a high contamination rate and could cause respiratory complications. To help identify those who may have the symptoms of this disease or to detect who is already infected, this paper analyzed the performance of eight machine learning algorithms (KNN, Naive Bayes, Decision Tree, Random Forest, SVM, simple Multilayer Perceptron, Convolutional Neural Networks and BERT) in the search and classification of tweets that mention self-reports of COVID-19 symptoms. The dataset was labeled using a set of disease symptom keywords provided by the World Health Organization. The tests showed that the Random Forest algorithm had the best results, closely followed by BERT and the Convolutional Neural Network, although traditional machine learning algorithms can also provide good results. This work could also aid in the selection of algorithms for the identification of disease symptoms in social media content. This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the Project Scope: DSAIPA/AI/0088/2020 Biblioteca Digital do IPB Silva, Clístenes Fernandes da Junior, Arnaldo Candido Lopes, Rui Pedro 2022-04-04T10:28:25Z 2022 2022-01-01T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article application/pdf http://hdl.handle.net/10198/25323 eng Silva, Clístenes Fernandes da; Junior, Arnaldo Candido; Lopes, Rui Pedro (2022). Predictive analysis of COVID-19 symptoms in social networks through machine learning. Electronics. ISSN 2079-9292. 11:4, p. 1-14 10.3390/electronics11040580 2079-9292 info:eu-repo/semantics/openAccess reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP 2023-11-21T10:56:41Z oai:bibliotecadigital.ipb.pt:10198/25323 Portal Agregador ONG https://www.rcaap.pt/oai/openaire opendoar:7160 2024-03-19T23:15:59.718995 Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false |
dc.title.none.fl_str_mv |
Predictive analysis of COVID-19 symptoms in social networks through machine learning |
title |
Predictive analysis of COVID-19 symptoms in social networks through machine learning |
spellingShingle |
Predictive analysis of COVID-19 symptoms in social networks through machine learning Silva, Clístenes Fernandes da Natural language processing Machine learning Text classification COVID-19 Tweet analysis |
author |
Silva, Clístenes Fernandes da |
author_facet |
Silva, Clístenes Fernandes da Junior, Arnaldo Candido Lopes, Rui Pedro |
author_role |
author |
author2 |
Junior, Arnaldo Candido Lopes, Rui Pedro |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Biblioteca Digital do IPB |
dc.contributor.author.fl_str_mv |
Silva, Clístenes Fernandes da Junior, Arnaldo Candido Lopes, Rui Pedro |
dc.subject.por.fl_str_mv |
Natural language processing Machine learning Text classification COVID-19 Tweet analysis |
topic |
Natural language processing Machine learning Text classification COVID-19 Tweet analysis |
description |
Social media is a great source of data for analyses, since it provides ways for people to share emotions, feelings, ideas, and even symptoms of diseases. By the end of 2019, a global pandemic alert was raised concerning a virus that had a high contamination rate and could cause respiratory complications. To help identify those who may have the symptoms of this disease or to detect who is already infected, this paper analyzed the performance of eight machine learning algorithms (KNN, Naive Bayes, Decision Tree, Random Forest, SVM, simple Multilayer Perceptron, Convolutional Neural Networks and BERT) in the search and classification of tweets that mention self-reports of COVID-19 symptoms. The dataset was labeled using a set of disease symptom keywords provided by the World Health Organization. The tests showed that the Random Forest algorithm had the best results, closely followed by BERT and the Convolutional Neural Network, although traditional machine learning algorithms can also provide good results. This work could also aid in the selection of algorithms for the identification of disease symptoms in social media content. |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-04-04T10:28:25Z 2022 2022-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10198/25323 |
url |
http://hdl.handle.net/10198/25323 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Silva, Clístenes Fernandes da; Junior, Arnaldo Candido; Lopes, Rui Pedro (2022). Predictive analysis of COVID-19 symptoms in social networks through machine learning. Electronics. ISSN 2079-9292. 11:4, p. 1-14 10.3390/electronics11040580 2079-9292 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799135443921403904 |