Oversampling for imbalanced learning based on k-means and smote

Detalhes bibliográficos
Autor(a) principal: Last, Felix
Data de Publicação: 2018
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: http://hdl.handle.net/10362/31042
Resumo: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
id RCAP_af89d92526d4f1844821176292151f9a
oai_identifier_str oai:run.unl.pt:10362/31042
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Oversampling for imbalanced learning based on k-means and smoteClass-imbalanced learningOversamplingClassificationClusteringSupervised learningDissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced AnalyticsLearning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the python programming language.Bação, Fernando José Ferreira LucasRUNLast, Felix2018-02-22T16:44:00Z2018-02-052018-02-05T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10362/31042TID:201852080enginfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2024-03-11T04:17:09Zoai:run.unl.pt:10362/31042Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T03:29:34.791423Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Oversampling for imbalanced learning based on k-means and smote
title Oversampling for imbalanced learning based on k-means and smote
spellingShingle Oversampling for imbalanced learning based on k-means and smote
Last, Felix
Class-imbalanced learning
Oversampling
Classification
Clustering
Supervised learning
title_short Oversampling for imbalanced learning based on k-means and smote
title_full Oversampling for imbalanced learning based on k-means and smote
title_fullStr Oversampling for imbalanced learning based on k-means and smote
title_full_unstemmed Oversampling for imbalanced learning based on k-means and smote
title_sort Oversampling for imbalanced learning based on k-means and smote
author Last, Felix
author_facet Last, Felix
author_role author
dc.contributor.none.fl_str_mv Bação, Fernando José Ferreira Lucas
RUN
dc.contributor.author.fl_str_mv Last, Felix
dc.subject.por.fl_str_mv Class-imbalanced learning
Oversampling
Classification
Clustering
Supervised learning
topic Class-imbalanced learning
Oversampling
Classification
Clustering
Supervised learning
description Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
publishDate 2018
dc.date.none.fl_str_mv 2018-02-22T16:44:00Z
2018-02-05
2018-02-05T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/31042
TID:201852080
url http://hdl.handle.net/10362/31042
identifier_str_mv TID:201852080
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137921289158656