Geometric SMOTE for imbalanced datasets with nominal and continuous features
Main author: | Fonseca, Joao |
---|---|
Publication date: | 2023 |
Other authors: | Bacao, Fernando |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/157223 |
Summary: | Fonseca, J., & Bacao, F. (2023). Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications, 234(December), 1-9. [121053]. https://doi.org/10.1016/j.eswa.2023.121053 --- This research was supported by research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”), references SFRH/BD/151473/2021, DSAIPA/DS/0116/2019, and by project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC). |
id |
RCAP_b8e90456f3966e10ada23295faa0ee9b |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/157223 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Geometric SMOTE for imbalanced datasets with nominal and continuous features Imbalanced learning Oversampling SMOTE Data generation Nominal data Engineering(all) Computer Science Applications Artificial Intelligence Fonseca, J., & Bacao, F. (2023). Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications, 234(December), 1-9. [121053]. https://doi.org/10.1016/j.eswa.2023.121053 --- This research was supported by research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”), references SFRH/BD/151473/2021, DSAIPA/DS/0116/2019, and by project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC). Imbalanced learning can be addressed in three different ways: resampling, algorithmic modifications, and cost-sensitive solutions. Resampling, and specifically oversampling, is a more general approach than algorithmic and cost-sensitive methods. Since the proposal of the Synthetic Minority Oversampling TEchnique (SMOTE), various SMOTE variants and neural network-based oversampling methods have been developed. However, the options to oversample datasets with nominal and continuous features are limited. We propose Geometric SMOTE for Nominal and Continuous features (G-SMOTENC), based on a combination of G-SMOTE and SMOTENC. Our method modifies SMOTENC’s encoding and generation mechanism for nominal features while using G-SMOTE’s data selection mechanism to determine the center observation and k-nearest neighbors, and its generation mechanism for continuous features. G-SMOTENC’s performance is compared against SMOTENC’s along with two other baseline methods, a state-of-the-art oversampling method and no oversampling. The experiment was performed over 20 datasets with varying imbalance ratios, numbers of metric and non-metric features, and target classes. We found a significant improvement in classification performance when using G-SMOTENC as the oversampling method. An open-source implementation of G-SMOTENC is made available in the Python programming language. Information Management Research Center (MagIC) - NOVA Information Management School NOVA Information Management School (NOVA IMS) RUN Fonseca, Joao Bacao, Fernando 2023-09-01T22:15:42Z 2023-12-30 2023-12-30T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article 9 application/pdf http://hdl.handle.net/10362/157223 eng 0957-4174 PURE: 67759778 https://doi.org/10.1016/j.eswa.2023.121053 info:eu-repo/semantics/openAccess reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP 2024-10-21T01:36:50Z oai:run.unl.pt:10362/157223 Portal Agregador ONG https://www.rcaap.pt/oai/openaire mluisa.alvim@gmail.com opendoar:7160 2024-10-21T01:36:50 Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false |
dc.title.none.fl_str_mv |
Geometric SMOTE for imbalanced datasets with nominal and continuous features |
spellingShingle |
Geometric SMOTE for imbalanced datasets with nominal and continuous features Fonseca, Joao Imbalanced learning Oversampling SMOTE Data generation Nominal data Engineering(all) Computer Science Applications Artificial Intelligence |
author |
Fonseca, Joao |
author_facet |
Fonseca, Joao Bacao, Fernando |
author_role |
author |
author2 |
Bacao, Fernando |
author2_role |
author |
dc.contributor.none.fl_str_mv |
Information Management Research Center (MagIC) - NOVA Information Management School NOVA Information Management School (NOVA IMS) RUN |
dc.contributor.author.fl_str_mv |
Fonseca, Joao Bacao, Fernando |
dc.subject.por.fl_str_mv |
Imbalanced learning Oversampling SMOTE Data generation Nominal data Engineering(all) Computer Science Applications Artificial Intelligence |
topic |
Imbalanced learning Oversampling SMOTE Data generation Nominal data Engineering(all) Computer Science Applications Artificial Intelligence |
description |
Fonseca, J., & Bacao, F. (2023). Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications, 234(December), 1-9. [121053]. https://doi.org/10.1016/j.eswa.2023.121053 --- This research was supported by research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”), references SFRH/BD/151473/2021, DSAIPA/DS/0116/2019, and by project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC). |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-09-01T22:15:42Z 2023-12-30 2023-12-30T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/157223 |
url |
http://hdl.handle.net/10362/157223 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0957-4174 PURE: 67759778 https://doi.org/10.1016/j.eswa.2023.121053 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
9 application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817545952403128320 |
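
The abstract in this record describes oversampling for data that mixes nominal and continuous features. As a rough illustration of that general idea only, the sketch below follows the classic SMOTE-NC recipe as commonly described: neighbors are found with a Euclidean distance that penalizes nominal mismatches by the median standard deviation of the continuous features, continuous values are interpolated linearly, and nominal values take the most frequent category among the neighbors. It does not reproduce G-SMOTE's geometric generation region or the authors' modified encoding, and it is not their published implementation. The names `mixed_distance`, `oversample`, `penalty`, `k`, and `seed` are hypothetical, chosen for this example.

```python
import math
import random
import statistics
from collections import Counter


def mixed_distance(a, b, nominal_idx, penalty):
    """Euclidean distance where each nominal mismatch adds penalty**2."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in nominal_idx:
            total += penalty ** 2 if x != y else 0.0
        else:
            total += (x - y) ** 2
    return math.sqrt(total)


def oversample(minority, nominal_idx, n_new, k=3, seed=0):
    """Return n_new synthetic rows built from minority-class rows.

    Illustrative assumptions: at least one continuous column, at least
    two rows, and plain SMOTE-style interpolation (not G-SMOTE's
    geometric region).
    """
    rng = random.Random(seed)
    n_cols = len(minority[0])
    continuous_idx = [j for j in range(n_cols) if j not in nominal_idx]
    # Nominal mismatch penalty: median of the continuous columns'
    # (population) standard deviations over the minority rows.
    penalty = statistics.median(
        [statistics.pstdev([row[j] for row in minority]) for j in continuous_idx]
    )
    synthetic = []
    for _ in range(n_new):
        center = rng.choice(minority)
        others = [row for row in minority if row is not center]
        others.sort(key=lambda row: mixed_distance(center, row, nominal_idx, penalty))
        neighbors = others[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        new_row = []
        for j, value in enumerate(center):
            if j in nominal_idx:
                # Nominal: most frequent category among the k neighbors.
                counts = Counter(row[j] for row in neighbors)
                new_row.append(counts.most_common(1)[0][0])
            else:
                # Continuous: point on the segment from center to partner.
                new_row.append(value + gap * (partner[j] - value))
        synthetic.append(new_row)
    return synthetic


# Toy minority class: two continuous columns and one nominal column.
toy_rows = [[1.0, 2.0, "a"], [1.5, 2.5, "a"], [2.0, 3.0, "b"], [1.2, 2.2, "a"]]
for synthetic_row in oversample(toy_rows, nominal_idx={2}, n_new=3):
    print(synthetic_row)
```

Keeping the nominal columns out of the interpolation step avoids producing category values that never occur in the data, which is the main reason mixed-type oversamplers treat the two feature kinds separately.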