O2PF: Oversampling via optimum-path forest for breast cancer detection

Bibliographic details
Main author: Passos, Leandro [UNESP]
Publication date: 2020
Other authors: Jodas, Danilo [UNESP], Ribeiro, Luiz [UNESP], Moreira, Thierry [UNESP], Papa, Joao [UNESP]
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1109/CBMS49503.2020.00100
http://hdl.handle.net/11449/221556
Abstract: Breast cancer is among the deadliest diseases and affects mostly women worldwide. Although traditional detection methods are valid for the task, they still commonly yield low accuracy and demand considerable time and effort from professionals. A computer-aided diagnosis (CAD) system capable of early detection is therefore highly desirable. In the last decade, machine learning-based techniques have been of paramount importance in this context, since they can extract essential information from data and reason about it. However, such approaches still suffer from imbalanced data, especially in medical problems, where the number of samples from healthy people is generally considerably higher than the number from patients. This paper therefore proposes O2PF, a data oversampling method based on the unsupervised Optimum-Path Forest algorithm. Experiments conducted over the full oversampling scenario indicate the robustness of the model, which is compared against three well-established oversampling methods on three breast cancer datasets and three general-purpose medical datasets.
id UNSP_8aaad7c35cafa09492440132a95524e6
oai_identifier_str oai:repositorio.unesp.br:11449/221556
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
affiliation Sao Paulo State University, Department of Computing
dc.title.none.fl_str_mv O2PF: Oversampling via optimum-path forest for breast cancer detection
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (UNESP)
dc.contributor.author.fl_str_mv Passos, Leandro [UNESP]
Jodas, Danilo [UNESP]
Ribeiro, Luiz [UNESP]
Moreira, Thierry [UNESP]
Papa, Joao [UNESP]
dc.subject.por.fl_str_mv Data imbalance
Optimum-path forest
Oversampling
publishDate 2020
dc.date.none.fl_str_mv 2020-07-01
2022-04-28T19:29:21Z
2022-04-28T19:29:21Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1109/CBMS49503.2020.00100
Proceedings - IEEE Symposium on Computer-Based Medical Systems, v. 2020-July, p. 498-503.
1063-7125
http://hdl.handle.net/11449/221556
10.1109/CBMS49503.2020.00100
2-s2.0-85091143461
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Proceedings - IEEE Symposium on Computer-Based Medical Systems
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 498-503
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_ 1799965727423201280