Handling imbalanced datasets through Optimum-Path Forest
Main author: | Passos, Leandro Aparecido [UNESP] |
---|---|
Publication date: | 2022 |
Other authors: | Jodas, Danilo S. [UNESP]; Ribeiro, Luiz C.F. [UNESP]; Akio, Marco [UNESP]; de Souza, Andre Nunes [UNESP]; Papa, João Paulo [UNESP] |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://dx.doi.org/10.1016/j.knosys.2022.108445 http://hdl.handle.net/11449/234201 |
Abstract: | In the last decade, machine learning-based approaches became capable of performing a wide range of complex tasks sometimes better than humans, demanding a fraction of the time. Such an advance is partially due to the exponential growth in the amount of data available, which makes it possible to extract trustworthy real-world information from them. However, such data is generally imbalanced since some phenomena are more likely than others. Such a behavior yields considerable influence on the machine learning model's performance since it becomes biased on the more frequent data it receives. Despite the considerable amount of machine learning methods, a graph-based approach has attracted considerable notoriety due to the outstanding performance over many applications, i.e., the Optimum-Path Forest (OPF). In this paper, we propose three OPF-based strategies to deal with the imbalance problem: the O2PF and the OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both approaches. The paper also introduces a set of variants concerning the strategies mentioned above. Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches. |
id |
UNSP_971d4ee5da266237ecf919b44386d0fb |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/234201 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
Handling imbalanced datasets through Optimum-Path Forest; Imbalanced data; Optimum-Path Forest; Oversampling; Undersampling; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Department of Computing, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01; Department of Electrical Engineering, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01; FAPESP: #2013/07375-0; FAPESP: #2014/12236-1; FAPESP: #2017/02286-0; FAPESP: #2018/21934-5; FAPESP: #2019/07665-4; FAPESP: #2019/18287-0; FAPESP: #2020/12101-0; CNPq: #307066/2017-7; CNPq: #427968/2018-6; Universidade Estadual Paulista (UNESP); Passos, Leandro Aparecido [UNESP]; Jodas, Danilo S. [UNESP]; Ribeiro, Luiz C.F. [UNESP]; Akio, Marco [UNESP]; de Souza, Andre Nunes [UNESP]; Papa, João Paulo [UNESP]; 2022-05-01T13:57:34Z; 2022-05-01T13:57:34Z; 2022-04-22; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; http://dx.doi.org/10.1016/j.knosys.2022.108445; Knowledge-Based Systems, v. 242.; 0950-7051; http://hdl.handle.net/11449/234201; 10.1016/j.knosys.2022.108445; 2-s2.0-85125266467; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Knowledge-Based Systems; info:eu-repo/semantics/openAccess; 2024-04-23T16:11:00Z; oai:repositorio.unesp.br:11449/234201; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-04-23T16:11; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
Handling imbalanced datasets through Optimum-Path Forest |
author |
Passos, Leandro Aparecido [UNESP] |
author_facet |
Passos, Leandro Aparecido [UNESP]; Jodas, Danilo S. [UNESP]; Ribeiro, Luiz C.F. [UNESP]; Akio, Marco [UNESP]; de Souza, Andre Nunes [UNESP]; Papa, João Paulo [UNESP] |
author_role |
author |
author2 |
Jodas, Danilo S. [UNESP]; Ribeiro, Luiz C.F. [UNESP]; Akio, Marco [UNESP]; de Souza, Andre Nunes [UNESP]; Papa, João Paulo [UNESP] |
author2_role |
author author author author author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (UNESP) |
dc.contributor.author.fl_str_mv |
Passos, Leandro Aparecido [UNESP]; Jodas, Danilo S. [UNESP]; Ribeiro, Luiz C.F. [UNESP]; Akio, Marco [UNESP]; de Souza, Andre Nunes [UNESP]; Papa, João Paulo [UNESP] |
dc.subject.por.fl_str_mv |
Imbalanced data; Optimum-Path Forest; Oversampling; Undersampling |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-05-01T13:57:34Z 2022-05-01T13:57:34Z 2022-04-22 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1016/j.knosys.2022.108445; Knowledge-Based Systems, v. 242.; 0950-7051; http://hdl.handle.net/11449/234201; 10.1016/j.knosys.2022.108445; 2-s2.0-85125266467 |
url |
http://dx.doi.org/10.1016/j.knosys.2022.108445 http://hdl.handle.net/11449/234201 |
identifier_str_mv |
Knowledge-Based Systems, v. 242.; 0950-7051; 10.1016/j.knosys.2022.108445; 2-s2.0-85125266467 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Knowledge-Based Systems |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
_version_ |
1797790299450245120 |