G-SOMO

Bibliographic details
Main author: Douzas, Georgios
Publication date: 2021
Other authors: Rauch, Rene; Bação, Fernando
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/119826
Citation: Douzas, G., Rauch, R., & Bacao, F. (2021). G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE. Expert Systems with Applications, 183, 1-11. [115230]. https://doi.org/10.1016/j.eswa.2021.115230
OAI identifier: oai:run.unl.pt:10362/119826
Abstract: Traditional supervised machine learning classifiers are challenged to learn highly skewed data distributions, as they are designed to expect classes to contribute equally to the minimization of the classifier's cost function. Moreover, the classifier's design expects equal misclassification costs, causing a bias toward overrepresented classes. Different strategies have been proposed to correct this issue. Modifying the data set has become common practice, since the procedure generalizes to all classifiers. Various algorithms that rebalance the data distribution by creating synthetic instances have been proposed in the past. In this paper, we propose a new oversampling algorithm named G-SOMO. The algorithm identifies optimal areas in which to create artificial data instances in an informed manner and utilizes a geometric region during the data generation process to increase their variability. Our empirical results on 69 datasets, validated with different classifiers and metrics against a benchmark of commonly used oversampling methods, show that G-SOMO consistently outperforms competing oversampling methods. Additionally, the statistical significance of our results is established.
Full title: G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE
Keywords: Classification; G-SMOTE; Imbalanced learning; Machine learning; Oversampling; SOM; Engineering (all); Computer Science Applications; Artificial Intelligence
Contributors: Information Management Research Center (MagIC) - NOVA Information Management School; NOVA Information Management School (NOVA IMS); RUN
Dates: 2021-11-30; 2024-02-17T01:31:53Z
Status: published version
Related identifiers: ISSN 0957-4174; PURE: 32101400; https://doi.org/10.1016/j.eswa.2021.115230
Access rights: open access
Format: application/pdf (extent: 11)
Source institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)