Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning
Main author: | Pereira, Mariana Matoso |
---|---|
Publication date: | 2019 |
Document type: | Dissertation |
Language: | English (eng) |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/63810 |
Description: | Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence |
Record ID: | RCAP_d7331435b97005d662b3eda1a8493d24 |
---|---|
OAI identifier: | oai:run.unl.pt:10362/63810 |
Network: | RCAP, Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Repository ID: | 7160 |
Abstract: | Imbalanced datasets in supervised learning are considered an ongoing challenge for standard algorithms, since these algorithms are designed to handle balanced class distributions and perform poorly on imbalanced problems. Many methods have been developed to address this specific problem, but the more general approach to achieving a balanced class distribution is to modify the data rather than the algorithm. Although class imbalance is responsible for significant performance losses in standard classifiers across many different types of problems, another important aspect to consider is the small disjuncts problem. It is therefore important to consider and understand solutions that take into account not only the between-class imbalance (the imbalance occurring between the two classes) but also the within-class imbalance (the imbalance occurring between the sub-clusters of each class), and that oversample the dataset by rectifying these two types of imbalance simultaneously. Cluster-based oversampling has been shown to be a robust solution that takes both problems into consideration. This work sets out to study the effect and impact of combining different existing oversampling methods with a clustering-based approach. Empirical results of extensive experiments show that combining different oversampling techniques with the k-means clustering algorithm (K-Means Oversampling) improves upon the classification results obtained from the oversampling techniques alone, with no prior clustering step. |
Author: | Pereira, Mariana Matoso |
Contributor: | Bação, Fernando José Ferreira Lucas; RUN |
Keywords: | Imbalanced Learning; Oversampling; Clustering; Supervised Learning |
Dates: | 2019-03-01; 2019-03-19T17:54:35Z |
Publication status: | info:eu-repo/semantics/publishedVersion |
Document type: | info:eu-repo/semantics/masterThesis |
Identifiers: | http://hdl.handle.net/10362/63810; TID:202200000 |
Rights: | info:eu-repo/semantics/openAccess |
Format: | application/pdf |
Repository: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP) |
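
The abstract describes cluster-based oversampling only at a high level. The following is a minimal, hypothetical sketch of the general idea, not the dissertation's implementation, written under these assumptions. The problem is binary. k-means is hand-written to keep the example dependency-free. The between-class gap is closed by generating (majority count minus minority count) synthetic samples. Those samples are spread across clusters in inverse proportion to each cluster's minority count, as a simple stand-in for correcting within-class imbalance. Each synthetic point is a SMOTE-style interpolation between two random minority samples of the same cluster; SMOTE proper would use k nearest neighbours instead. The names `kmeans` and `kmeans_oversample` and the weighting rule are illustrative. Published k-means oversampling variants also filter out clusters dominated by the majority class, and that step is omitted here.

```python
import math
import random
from collections import Counter


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_oversample(X, y, minority, k=3, seed=0):
    """Hypothetical sketch: k-means clustering, then SMOTE-style
    interpolation of minority samples inside each cluster."""
    rng = random.Random(seed)
    labels = kmeans(X, k, seed=seed)
    counts = Counter(y)
    n_majority = max((n for cls, n in counts.items() if cls != minority), default=0)
    # Between-class step: generate enough samples to match the majority class.
    n_needed = n_majority - counts[minority]

    # Group minority samples by cluster; interpolation needs at least two.
    groups = {}
    for x, target, lab in zip(X, y, labels):
        if target == minority:
            groups.setdefault(lab, []).append(x)
    groups = {c: pts for c, pts in groups.items() if len(pts) >= 2}
    if n_needed <= 0 or not groups:
        return list(X), list(y)

    # Within-class step (assumed rule): sparser clusters get more samples.
    weights = {c: 1.0 / len(pts) for c, pts in groups.items()}
    total = sum(weights.values())
    ordered = sorted(groups)
    X_new, y_new = list(X), list(y)
    allocated = 0
    for i, c in enumerate(ordered):
        if i < len(ordered) - 1:
            n_c = int(n_needed * weights[c] / total)  # floor of the weighted share
        else:
            n_c = n_needed - allocated  # last cluster takes the rounding remainder
        allocated += n_c
        for _ in range(n_c):
            a, b = rng.sample(groups[c], 2)
            gap = rng.random()  # position along the segment from a to b
            X_new.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
            y_new.append(minority)
    return X_new, y_new


# Toy data: 40 majority points near the origin; 9 minority points in two groups.
demo_rng = random.Random(1)
X = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(40)]
X += [(demo_rng.gauss(5, 0.3), demo_rng.gauss(5, 0.3)) for _ in range(6)]
X += [(demo_rng.gauss(5, 0.3), demo_rng.gauss(-5, 0.3)) for _ in range(3)]
y = [0] * 40 + [1] * 9
X_res, y_res = kmeans_oversample(X, y, minority=1, k=3)
print(Counter(y_res))  # both classes now have 40 samples
```

Because every synthetic point lies on a segment between two minority samples of a single cluster, new samples stay inside regions the minority class already occupies. The interpolation loop is the natural place to substitute a different oversampling technique, in the spirit of the combinations the abstract describes.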