Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data

Bibliographic details
Main author: Antas, João
Publication date: 2022
Other authors: Silva, Rodrigo Rocha, Bernardino, Jorge
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10316/100540
https://doi.org/10.3390/computers11020029
Abstract: COVID-19 has had an enormous negative impact on human lives and the world economy. To help in the fight against this pandemic, this study evaluates different database systems and selects the most suitable one for storing, handling, and mining COVID-19 data. We evaluate SQL and NoSQL database systems using the following metrics: query runtime, memory used, CPU used, and storage size. The database systems assessed were Microsoft SQL Server, MongoDB, and Cassandra. We also evaluate data mining algorithms, including Decision Trees, Random Forest, Naive Bayes, and Logistic Regression, in data classification tests run with the Orange Data Mining software. The classification tests were performed using cross-validation on a table of about 3 million records, including COVID-19 exams with patients' symptoms. In the mining stage, the Random Forest algorithm obtained the best average accuracy, recall, precision, and F1 score for the COVID-19 predictive model. In the performance evaluation, MongoDB presented the best results in almost all tests with a large data volume.
id RCAP_cf46a63fb2856f621a2dad20099e304a
oai_identifier_str oai:estudogeral.uc.pt:10316/100540
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Keywords: big data; COVID-19; Data Mining; SQL and NoSQL databases
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv 2073-431X
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
_version_ 1799134074856538112