Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets

Dorn, Márcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio


Bibliographic details
Main author:	Dorn, Márcio
Publication date:	2021
Other authors:	Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio
Document type:	Article
Language:	English
Source:	Repositório Institucional da UFRGS
Full text:	http://hdl.handle.net/10183/256836
Abstract:	The Coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country by the pandemic. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and aid decision making, offering a low-cost alternative. Due to the urgency of fighting the pandemic, a large number of works apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the most employed machine learning classifiers for CBC data, together with popular sampling methods to deal with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods provide the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of results from new ML algorithms.
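The abstract does not name the five sampling techniques or the metrics the study used, so what follows is only a generic illustration of the class-imbalance problem it describes, not the authors' method. It is a minimal Python sketch (standard library only) that assumes a small labelled dataset held in plain lists; the helper names `random_oversample` and `balanced_accuracy` are hypothetical. Random oversampling is shown as one common sampling technique, and balanced accuracy (the mean of per-class recalls) as one metric that, unlike plain accuracy, is not inflated by always predicting the majority class.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of every
    smaller class until all classes match the size of the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original samples belonging to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def balanced_accuracy(y_true, y_pred):
    """Hypothetical helper: mean of per-class recall over the classes in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy data, made up for illustration: 9 negatives (0), 1 positive (1)
    X = [[float(i)] for i in range(10)]
    y = [0] * 9 + [1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))  # both classes now have 9 samples

    y_pred = [0] * len(y)  # a classifier that always predicts the majority class
    acc = sum(t == p for t, p in zip(y, y_pred)) / len(y)
    print(acc, balanced_accuracy(y, y_pred))
```

With nine negatives and one positive, a model that always predicts negative reaches an accuracy of 0.9 but a balanced accuracy of only 0.5. In practice, oversampling would be applied only to the training split, so that duplicated samples do not leak into the evaluation data.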

Item metadata

id	UFRGS-2_da622ea5e1efd8dcada7ed6db7b2a54f
oai_identifier_str	oai:www.lume.ufrgs.br:10183/256836
network_acronym_str	UFRGS-2
network_name_str	Repositório Institucional da UFRGS
dc.title.pt_BR.fl_str_mv	Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets
dc.contributor.author.fl_str_mv	Dorn, Márcio; Grisci, Bruno Iochins; Narloch, Pedro Henrique; Feltes, Bruno César; Ávila, Eduardo Muller; Kahmann, Alessandro; Alho, Clarice Sampaio
dc.subject.por.fl_str_mv	Aprendizado de máquina (Machine learning); Mineração de dados (Data mining); COVID-19
dc.subject.eng.fl_str_mv	Machine learning; Data mining; Imbalanced datasets; Covid, Hemogram
dc.date.issued.fl_str_mv	2021
dc.date.accessioned.fl_str_mv	2023-04-07T03:26:40Z
dc.type.driver.fl_str_mv	Foreign info:eu-repo/semantics/article
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10183/256836
dc.identifier.issn.pt_BR.fl_str_mv	2376-5992
dc.identifier.nrb.pt_BR.fl_str_mv	001138423
dc.language.iso.fl_str_mv	eng
dc.relation.ispartof.pt_BR.fl_str_mv	PeerJ Computer Science. New York. Vol. 7 (Sept. 2021), p. 670-704
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS
bitstream.url.fl_str_mv	http://www.lume.ufrgs.br/bitstream/10183/256836/2/001138423.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/256836/1/001138423.pdf
bitstream.checksum.fl_str_mv	911c16ed0a6af0852f255629a7bd16e8 5c4cc76135056adcd977bfcd8386e0ec
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
