Geometric SMOTE for imbalanced datasets with nominal and continuous features

Bibliographic details
Main author: Fonseca, Joao
Publication date: 2023
Other authors: Bacao, Fernando
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/157223
Abstract: Fonseca, J., & Bacao, F. (2023). Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications, 234(December), 1-9. [121053]. https://doi.org/10.1016/j.eswa.2023.121053 --- This research was supported by research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”), references SFRH/BD/151473/2021, DSAIPA/DS/0116/2019, and by project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC).
id RCAP_b8e90456f3966e10ada23295faa0ee9b
oai_identifier_str oai:run.unl.pt:10362/157223
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Geometric SMOTE for imbalanced datasets with nominal and continuous features
Imbalanced learning; Oversampling; SMOTE; Data generation; Nominal data; Engineering(all); Computer Science Applications; Artificial Intelligence
Imbalanced learning can be addressed in 3 different ways: resampling, algorithmic modifications, and cost-sensitive solutions. Resampling, and specifically oversampling, is a more general approach than algorithmic and cost-sensitive methods. Since the proposal of the Synthetic Minority Oversampling TEchnique (SMOTE), various SMOTE variants and neural network-based oversampling methods have been developed. However, the options to oversample datasets with nominal and continuous features are limited. We propose Geometric SMOTE for Nominal and Continuous features (G-SMOTENC), based on a combination of G-SMOTE and SMOTENC. Our method modifies SMOTENC’s encoding and generation mechanism for nominal features, while using G-SMOTE’s data selection mechanism to determine the center observation and k-nearest neighbors, and G-SMOTE’s generation mechanism for continuous features. G-SMOTENC’s performance is compared against SMOTENC’s along with two other baseline methods, a state-of-the-art oversampling method and no oversampling. The experiment was performed over 20 datasets with varying imbalance ratios, numbers of metric and non-metric features, and target classes.
We found a significant improvement in classification performance when using G-SMOTENC as the oversampling method. An open-source implementation of G-SMOTENC is made available in the Python programming language.
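The abstract refers to generating synthetic minority samples for data that mix continuous and nominal features. The following is a minimal, standard-library-only sketch of a simplified SMOTENC-style generation step, not the authors' G-SMOTENC implementation. The function name `smotenc_like_sample`, its parameters, and the unit penalty for nominal mismatches are assumptions made for illustration. Continuous features are interpolated on the line segment between a center observation and one of its neighbors (plain SMOTE-style; the geometric region used by G-SMOTE is not implemented here), and each nominal feature takes the most frequent value among the neighbors.

```python
import random
from collections import Counter


def smotenc_like_sample(X_min, nominal_idx, k=3, rng=None):
    """Generate one synthetic minority sample (simplified, hypothetical sketch)."""
    rng = rng or random.Random(0)
    nominal = set(nominal_idx)
    cont_idx = [j for j in range(len(X_min[0])) if j not in nominal]

    def distance(a, b):
        # Euclidean distance on continuous features plus a unit penalty per
        # nominal mismatch (the unit penalty is an assumption of this sketch).
        d = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d += sum(1.0 for j in nominal if a[j] != b[j])
        return d ** 0.5

    # Pick a center observation and find its k nearest minority neighbors.
    center = rng.choice(X_min)
    others = [x for x in X_min if x is not center]
    neighbors = sorted(others, key=lambda x: distance(center, x))[:k]
    neighbor = rng.choice(neighbors)

    # Continuous features: linear interpolation between center and neighbor.
    gap = rng.random()
    sample = list(center)
    for j in cont_idx:
        sample[j] = center[j] + gap * (neighbor[j] - center[j])

    # Nominal features: most frequent value among the k neighbors.
    for j in nominal:
        sample[j] = Counter(n[j] for n in neighbors).most_common(1)[0][0]
    return sample


# Toy minority class: two continuous features and one nominal feature.
X_min = [[1.0, 2.0, "a"], [1.5, 2.5, "a"], [2.0, 3.0, "b"], [1.2, 2.2, "a"]]
print(smotenc_like_sample(X_min, nominal_idx=[2], k=3, rng=random.Random(42)))
```

Because the continuous part is an interpolation between two existing minority observations, each generated continuous value stays within the range already observed for that feature; the nominal value is always one that occurs among the neighbors.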
dc.title.none.fl_str_mv Geometric SMOTE for imbalanced datasets with nominal and continuous features
author Fonseca, Joao
author_facet Fonseca, Joao
Bacao, Fernando
author_role author
author2 Bacao, Fernando
author2_role author
dc.contributor.none.fl_str_mv Information Management Research Center (MagIC) - NOVA Information Management School
NOVA Information Management School (NOVA IMS)
RUN
dc.contributor.author.fl_str_mv Fonseca, Joao
Bacao, Fernando
dc.subject.por.fl_str_mv Imbalanced learning
Oversampling
SMOTE
Data generation
Nominal data
Engineering(all)
Computer Science Applications
Artificial Intelligence
publishDate 2023
dc.date.none.fl_str_mv 2023-09-01T22:15:42Z
2023-12-01
2023-12-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/157223
url http://hdl.handle.net/10362/157223
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 0957-4174
PURE: 67759778
https://doi.org/10.1016/j.eswa.2023.121053
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 9
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138150868582400