Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets
Main author: | Dorn, Márcio |
---|---|
Publication date: | 2021 |
Other authors: | Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFRGS |
Full text: | http://hdl.handle.net/10183/256836 |
Abstract: | The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country in the pandemic. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and aid decision making, offering a low-cost alternative. Given the urgency of fighting the pandemic, a large number of studies apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the most commonly used machine learning classifiers for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods give the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of results from new ML algorithms. |
id |
UFRGS-2_da622ea5e1efd8dcada7ed6db7b2a54f |
---|---|
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/256836 |
network_acronym_str |
UFRGS-2 |
network_name_str |
Repositório Institucional da UFRGS |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets |
author |
Dorn, Márcio |
author_facet |
Dorn, Márcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio |
author_role |
author |
author2 |
Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio |
author2_role |
author author author author author author |
dc.contributor.author.fl_str_mv |
Dorn, Márcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio |
dc.subject.por.fl_str_mv |
Aprendizado de máquina; Mineração de dados; COVID-19 |
topic |
Aprendizado de máquina; Mineração de dados; COVID-19; Machine learning; Data mining; Imbalanced datasets; Covid, Hemogram |
dc.subject.eng.fl_str_mv |
Machine learning; Data mining; Imbalanced datasets; Covid, Hemogram |
publishDate |
2021 |
dc.date.issued.fl_str_mv |
2021 |
dc.date.accessioned.fl_str_mv |
2023-04-07T03:26:40Z |
dc.type.driver.fl_str_mv |
Foreign; info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/256836 |
dc.identifier.issn.pt_BR.fl_str_mv |
2376-5992 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001138423 |
identifier_str_mv |
2376-5992 001138423 |
url |
http://hdl.handle.net/10183/256836 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartof.pt_BR.fl_str_mv |
PeerJ Computer Science. New York. Vol. 7 (Sept. 2021), p. 670-704 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Repositório Institucional da UFRGS |
collection |
Repositório Institucional da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/256836/2/001138423.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/256836/1/001138423.pdf |
bitstream.checksum.fl_str_mv |
911c16ed0a6af0852f255629a7bd16e8 5c4cc76135056adcd977bfcd8386e0ec |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
|
_version_ |
1815447825353277440 |