Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning

Bibliographic details
Main author: Pereira, Mariana Matoso
Publication date: 2019
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/63810
Summary: Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
id RCAP_d7331435b97005d662b3eda1a8493d24
oai_identifier_str oai:run.unl.pt:10362/63810
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning
Imbalanced Learning
Oversampling
Clustering
Supervised Learning
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Imbalanced datasets in supervised learning are an ongoing challenge for standard algorithms, which are designed to handle balanced class distributions and perform poorly when applied to problems of an imbalanced nature. Many methods have been developed to address this problem, but the more general approach to achieving a balanced class distribution is data-level modification rather than algorithm modification. Although class imbalance is responsible for significant losses of performance in standard classifiers across many different types of problems, another important aspect to consider is the small disjuncts problem. It is therefore important to consider and understand solutions that take into account not only the between-class imbalance (the imbalance occurring between the two classes) but also the within-class imbalance (the imbalance occurring between the sub-clusters of each class), and that oversample the dataset by rectifying both types of imbalance simultaneously. It has been shown that cluster-based oversampling is a robust solution that takes both problems into consideration. This work sets out to study the effect and impact of combining different existing oversampling methods with a clustering-based approach. Empirical results of extensive experiments show that combinations of different oversampling techniques with the clustering algorithm k-means (K-Means Oversampling) improve upon the classification results obtained from the oversampling techniques alone, with no prior clustering step.
Bação, Fernando José Ferreira Lucas
RUN
Pereira, Mariana Matoso
2019-03-19T17:54:35Z
2019-03-01
2019-03-01T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/63810
TID:202200000
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T04:30:16Z
oai:run.unl.pt:10362/63810
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:33:59.552111
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning
title Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning
spellingShingle Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning
Pereira, Mariana Matoso
Imbalanced Learning
Oversampling
Clustering
Supervised Learning
author Pereira, Mariana Matoso
author_facet Pereira, Mariana Matoso
author_role author
dc.contributor.none.fl_str_mv Bação, Fernando José Ferreira Lucas
RUN
dc.contributor.author.fl_str_mv Pereira, Mariana Matoso
dc.subject.por.fl_str_mv Imbalanced Learning
Oversampling
Clustering
Supervised Learning
publishDate 2019
dc.date.none.fl_str_mv 2019-03-19T17:54:35Z
2019-03-01
2019-03-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/63810
TID:202200000
url http://hdl.handle.net/10362/63810
identifier_str_mv TID:202200000
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799137961494708224