Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE
Main author: | Douzas, Georgios |
---|---|
Publication date: | 2018 |
Other authors: | Bação, Fernando; Last, Felix |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://doi.org/10.1016/j.ins.2018.06.056 |
Summary: | Douzas, G., Bação, F., & Last, F. (2018). Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences, 465, 1-20. DOI: 10.1016/j.ins.2018.06.056 |
id |
RCAP_34363b2b0bb1739e5c9c17e36f9f906d |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/85456 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE; Class-imbalanced learning; Classification; Clustering; Oversampling; Supervised learning; Within-class imbalance; Software; Control and Systems Engineering; Theoretical Computer Science; Computer Science Applications; Information Systems and Management; Artificial Intelligence; Douzas, G., Bação, F., & Last, F. (2018). Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences, 465, 1-20. DOI: 10.1016/j.ins.2018.06.056; Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority oversampling technique), which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 90 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language.
NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN; Douzas, Georgios; Bação, Fernando; Last, Felix; 2024-02-17T01:31:53Z; 2018-10-01; 2018-10-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; 20; application/pdf; https://doi.org/10.1016/j.ins.2018.06.056; eng; 0020-0255; PURE: 5469601; http://www.scopus.com/inward/record.url?scp=85049450664&partnerID=8YFLogxK; https://doi.org/10.1016/j.ins.2018.06.056; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T04:38:24Z; oai:run.unl.pt:10362/85456; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:36:37.297971; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE |
title |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE |
spellingShingle |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE Douzas, Georgios Class-imbalanced learning Classification Clustering Oversampling Supervised learning Within-class imbalance Software Control and Systems Engineering Theoretical Computer Science Computer Science Applications Information Systems and Management Artificial Intelligence |
title_short |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE |
title_full |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE |
title_fullStr |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE |
title_full_unstemmed |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE |
title_sort |
Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE |
author |
Douzas, Georgios |
author_facet |
Douzas, Georgios Bação, Fernando Last, Felix |
author_role |
author |
author2 |
Bação, Fernando Last, Felix |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
NOVA Information Management School (NOVA IMS) Information Management Research Center (MagIC) - NOVA Information Management School RUN |
dc.contributor.author.fl_str_mv |
Douzas, Georgios Bação, Fernando Last, Felix |
dc.subject.por.fl_str_mv |
Class-imbalanced learning Classification Clustering Oversampling Supervised learning Within-class imbalance Software Control and Systems Engineering Theoretical Computer Science Computer Science Applications Information Systems and Management Artificial Intelligence |
topic |
Class-imbalanced learning Classification Clustering Oversampling Supervised learning Within-class imbalance Software Control and Systems Engineering Theoretical Computer Science Computer Science Applications Information Systems and Management Artificial Intelligence |
description |
Douzas, G., Bação, F., & Last, F. (2018). Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Information Sciences, 465, 1-20. DOI: 10.1016/j.ins.2018.06.056 |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-10-01 2018-10-01T00:00:00Z 2024-02-17T01:31:53Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.1016/j.ins.2018.06.056 |
url |
https://doi.org/10.1016/j.ins.2018.06.056 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0020-0255 PURE: 5469601 http://www.scopus.com/inward/record.url?scp=85049450664&partnerID=8YFLogxK https://doi.org/10.1016/j.ins.2018.06.056 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
20 application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137983988760576 |
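
The abstract in this record describes oversampling that combines k-means clustering with SMOTE. The following is a minimal Python sketch of that general idea, not the authors' implementation: it clusters all samples, keeps clusters in which the minority class dominates, and creates synthetic points by interpolating between minority samples inside those clusters. The function names (`kmeans_labels`, `cluster_smote`), the "minority dominates" filter, the even spread of new samples across clusters, the random-pair interpolation (SMOTE proper interpolates toward a nearest neighbour), and the naive center initialization are all assumptions made for illustration.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, iters=20):
    """Plain Lloyd's k-means returning one cluster index per point.

    Assumption: centers start at evenly spaced rows of `points`, a naive
    deterministic initialization chosen only to keep the sketch reproducible.
    """
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_smote(X, y, minority, k=2, n_new=10, seed=0):
    """Generate up to n_new synthetic minority points inside 'safe' clusters."""
    rng = random.Random(seed)
    labels = kmeans_labels(X, k)
    pools = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mino = [X[i] for i in idx if y[i] == minority]
        # Hypothetical filter: keep clusters where minority samples outnumber
        # the rest and at least two exist to interpolate between.
        if len(mino) >= 2 and len(mino) > len(idx) - len(mino):
            pools.append(mino)
    synthetic = []
    if not pools:
        return synthetic  # no cluster qualified, so nothing is generated
    for j in range(n_new):
        pool = pools[j % len(pools)]  # spread new samples evenly over clusters
        a, b = rng.sample(pool, 2)
        t = rng.random()
        # SMOTE-style interpolation: a point on the segment from a to b.
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Toy data: four minority points near the origin, six majority points far away.
minority_pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
majority_pts = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
                (11.0, 11.0), (10.5, 10.5), (12.0, 12.0)]
new_points = cluster_smote(minority_pts + majority_pts, [1] * 4 + [0] * 6, minority=1)
```

Because every synthetic point is interpolated between two minority samples from the same cluster, none can land inside the majority region, which is the noise-avoidance property the abstract attributes to clustering before oversampling.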