O2PF: Oversampling via optimum-path forest for breast cancer detection
Main author: | Passos, Leandro [UNESP] |
---|---|
Publication date: | 2020 |
Other authors: | Jodas, Danilo [UNESP]; Ribeiro, Luiz [UNESP]; Moreira, Thierry [UNESP]; Papa, Joao [UNESP] |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1109/CBMS49503.2020.00100 http://hdl.handle.net/11449/221556 |
Abstract: | Breast cancer is among the deadliest diseases and affects mostly women worldwide. Although traditional detection methods are valid for the task, they often yield low accuracy and demand considerable time and effort from professionals. A computer-aided diagnosis (CAD) system capable of early detection is therefore highly desirable. Over the last decade, machine learning-based techniques have become of paramount importance in this context, since they can extract essential information from data and reason about it. However, such approaches still suffer from imbalanced data, especially in medical applications, where samples from healthy people generally far outnumber samples from patients. This paper therefore proposes O2PF, a data oversampling method based on the unsupervised Optimum-Path Forest algorithm. Experiments conducted over the full oversampling scenario show the robustness of the model, which is compared against three well-established oversampling methods on three breast cancer datasets and three general-purpose medical datasets. |
id |
UNSP_8aaad7c35cafa09492440132a95524e6 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/221556 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
O2PF: Oversampling via optimum-path forest for breast cancer detection; Data imbalance; Optimum-path forest; Oversampling; Sao Paulo State University Department of Computing; Universidade Estadual Paulista (UNESP); Passos, Leandro [UNESP]; Jodas, Danilo [UNESP]; Ribeiro, Luiz [UNESP]; Moreira, Thierry [UNESP]; Papa, Joao [UNESP]; 2022-04-28T19:29:21Z; 2020-07-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; 498-503; http://dx.doi.org/10.1109/CBMS49503.2020.00100; Proceedings - IEEE Symposium on Computer-Based Medical Systems, v. 2020-July, p. 498-503; 1063-7125; http://hdl.handle.net/11449/221556; 10.1109/CBMS49503.2020.00100; 2-s2.0-85091143461; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Proceedings - IEEE Symposium on Computer-Based Medical Systems; info:eu-repo/semantics/openAccess; oai:repositorio.unesp.br:11449/221556; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T23:59:11.626262; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
O2PF: Oversampling via optimum-path forest for breast cancer detection |
author |
Passos, Leandro [UNESP] |
author_facet |
Passos, Leandro [UNESP]; Jodas, Danilo [UNESP]; Ribeiro, Luiz [UNESP]; Moreira, Thierry [UNESP]; Papa, Joao [UNESP] |
author_role |
author |
author2 |
Jodas, Danilo [UNESP]; Ribeiro, Luiz [UNESP]; Moreira, Thierry [UNESP]; Papa, Joao [UNESP] |
author2_role |
author author author author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (UNESP) |
dc.contributor.author.fl_str_mv |
Passos, Leandro [UNESP]; Jodas, Danilo [UNESP]; Ribeiro, Luiz [UNESP]; Moreira, Thierry [UNESP]; Papa, Joao [UNESP] |
dc.subject.por.fl_str_mv |
Data imbalance; Optimum-path forest; Oversampling |
topic |
Data imbalance; Optimum-path forest; Oversampling |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-07-01 2022-04-28T19:29:21Z 2022-04-28T19:29:21Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1109/CBMS49503.2020.00100; Proceedings - IEEE Symposium on Computer-Based Medical Systems, v. 2020-July, p. 498-503; 1063-7125; http://hdl.handle.net/11449/221556; 10.1109/CBMS49503.2020.00100; 2-s2.0-85091143461 |
url |
http://dx.doi.org/10.1109/CBMS49503.2020.00100 http://hdl.handle.net/11449/221556 |
identifier_str_mv |
Proceedings - IEEE Symposium on Computer-Based Medical Systems, v. 2020-July, p. 498-503; 1063-7125; 10.1109/CBMS49503.2020.00100; 2-s2.0-85091143461 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Proceedings - IEEE Symposium on Computer-Based Medical Systems |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
498-503 |
dc.source.none.fl_str_mv |
Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
|
_version_ |
1808129570025504768 |