Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE

Bibliographic details
Main author: Douzas, Georgios
Publication date: 2018
Other authors: Bação, Fernando; Last, Felix
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://doi.org/10.1016/j.ins.2018.06.056
Summary: Douzas, G., Bação, F., & Last, F. (2018). Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences, 465, 1-20. DOI: 10.1016/j.ins.2018.06.056
id RCAP_34363b2b0bb1739e5c9c17e36f9f906d
oai_identifier_str oai:run.unl.pt:10362/85456
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
abstract Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning, as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority oversampling technique), which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 90 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language.
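The abstract names the two building blocks (k-means clustering, then SMOTE-style interpolation) but this record gives no further detail. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names `kmeans` and `kmeans_smote_sketch`, the `min_share` threshold for keeping a cluster, the farthest-point initialisation (chosen to keep the sketch deterministic), the round-robin allocation of new samples across clusters, and interpolation between random pairs of minority points instead of nearest neighbours. It assumes a binary problem with points given as tuples.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialisation (an assumption).

    Returns one cluster index per input point.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def kmeans_smote_sketch(X, y, minority=1, k=3, min_share=0.5, seed=0):
    """Return synthetic minority points that would balance a binary dataset."""
    rng = random.Random(seed)
    labels = kmeans(X, k)
    n_minority = sum(1 for t in y if t == minority)
    n_needed = (len(y) - n_minority) - n_minority

    # Keep only clusters where the minority class holds at least `min_share`
    # of the points (hypothetical filter) and has two or more points.
    selected = []
    for c in range(k):
        in_cluster = [i for i, lab in enumerate(labels) if lab == c]
        minority_pts = [X[i] for i in in_cluster if y[i] == minority]
        if len(minority_pts) >= 2 and len(minority_pts) / len(in_cluster) >= min_share:
            selected.append(minority_pts)

    synthetic = []
    if not selected:
        return synthetic
    for j in range(max(n_needed, 0)):
        pts = selected[j % len(selected)]  # round-robin allocation (simplification)
        a, b = rng.sample(pts, 2)
        t = rng.random()
        # SMOTE-style step: a random point on the segment between a and b.
        synthetic.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
    return synthetic


# Toy data: six majority points near the origin, four minority points far away.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0] * 6 + [1] * 4
new_points = kmeans_smote_sketch(X, y, minority=1, k=2)
print(len(new_points), new_points)
```

Because new points are generated only inside clusters dominated by the minority class, they stay within regions the minority class already occupies, which is one way to read the abstract's claim about avoiding noise. For real use, a maintained library implementation would be preferable to this sketch.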
dc.contributor.none.fl_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
dc.contributor.author.fl_str_mv Douzas, Georgios
Bação, Fernando
Last, Felix
dc.subject.por.fl_str_mv Class-imbalanced learning
Classification
Clustering
Oversampling
Supervised learning
Within-class imbalance
Software
Control and Systems Engineering
Theoretical Computer Science
Computer Science Applications
Information Systems and Management
Artificial Intelligence
dc.date.none.fl_str_mv 2018-10-01
2018-10-01T00:00:00Z
2024-02-17T01:31:53Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv https://doi.org/10.1016/j.ins.2018.06.056
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 0020-0255
PURE: 5469601
http://www.scopus.com/inward/record.url?scp=85049450664&partnerID=8YFLogxK
https://doi.org/10.1016/j.ins.2018.06.056
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv 20
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP