Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data
Main author: | Antas, João |
---|---|
Publication date: | 2022 |
Other authors: | Silva, Rodrigo Rocha; Bernardino, Jorge |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10316/100540 https://doi.org/10.3390/computers11020029 |
Abstract: | COVID-19 has had an enormous negative impact on human lives and the world economy. To help in the fight against this pandemic, this study evaluates different database systems and selects the most suitable one for storing, handling, and mining COVID-19 data. We evaluate SQL and NoSQL database systems using the following metrics: query runtime, memory used, CPU used, and storage size. The database systems assessed were Microsoft SQL Server, MongoDB, and Cassandra. We also evaluate Data Mining algorithms, including Decision Trees, Random Forest, Naive Bayes, and Logistic Regression, using classification tests in the Orange Data Mining software. The classification tests were performed using cross-validation on a table with about 3 M records, including COVID-19 exams with patients’ symptoms. The Random Forest algorithm obtained the best average accuracy, recall, precision, and F1 score in the COVID-19 predictive model built in the mining stage. In the performance evaluation, MongoDB presented the best results for almost all tests with a large data volume. |
id | RCAP_cf46a63fb2856f621a2dad20099e304a |
---|---|
oai_identifier_str | oai:estudogeral.uc.pt:10316/100540 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Assessment of SQL and NoSQL Systems to Store and Mine COVID-19 Data |
author | Antas, João |
author_facet | Antas, João; Silva, Rodrigo Rocha; Bernardino, Jorge |
author_role | author |
author2 | Silva, Rodrigo Rocha; Bernardino, Jorge |
author2_role | author; author |
dc.contributor.author.fl_str_mv | Antas, João; Silva, Rodrigo Rocha; Bernardino, Jorge |
dc.subject.por.fl_str_mv | big data; COVID-19; Data Mining; SQL and NoSQL databases |
topic | big data; COVID-19; Data Mining; SQL and NoSQL databases |
publishDate | 2022 |
dc.date.none.fl_str_mv | 2022 |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10316/100540 http://hdl.handle.net/10316/100540 https://doi.org/10.3390/computers11020029 |
url | http://hdl.handle.net/10316/100540 https://doi.org/10.3390/computers11020029 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | 2073-431X |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799134074856538112 |