Small data oversampling: improving small data prediction accuracy using the geometric SMOTE algorithm

Bibliographic details
Main author: Lechleitner, Maria
Publication date: 2020
Document type: Dissertation (Master's thesis)
Language: English
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/99077
Summary: Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Keywords: Machine Learning; Classification; Small Data Problem; Artificial Data Generation; Oversampling
Abstract: In the age of Big Data, many machine learning tasks in numerous industries are still constrained by small datasets. The limited availability of data often leads to unsatisfactory prediction performance of supervised learning algorithms and, consequently, to poor decision making. This research aims to mitigate the small dataset problem by generating artificial data in the pre-processing phase of the data analysis process. The oversampling technique Geometric SMOTE is applied to generate new training instances and enhance crisp data structures. Experimental results show a significant improvement in prediction accuracy, both relative to the original small datasets and relative to other oversampling techniques such as Random Oversampling, SMOTE and Borderline SMOTE. These findings show that artificial data creation is a promising approach to overcoming the problem of small data in classification tasks.
Contributors: Bação, Fernando José Ferreira Lucas; RUN
Dates: 2020-05-27; 2020-06-09T07:20:54Z
Version: published version
Format: application/pdf
Identifier: TID:202485099
Access rights: open access
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)
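The abstract describes generating new training instances with Geometric SMOTE. As an illustration only, the following pure-Python sketch shows one way such a geometric generation step can work: a synthetic point is drawn inside a hypersphere centred on a training sample whose radius reaches a same-class neighbour, and that region is optionally truncated (cut to one side of the center-neighbour axis) and deformed (flattened towards that axis). The helper names `geometric_sample` and `oversample_small_data`, the parameters `truncation`, `deformation`, `k` and `seed`, and the choice to oversample every class using same-class neighbours are assumptions of this sketch; it is not the dissertation's code and not the API of any Geometric SMOTE library.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def geometric_sample(center, surface, truncation=0.0, deformation=0.0, rng=random):
    """Hypothetical helper: draw one synthetic point inside a hypersphere
    centred on `center` whose radius reaches `surface`."""
    diff = [s - c for s, c in zip(surface, center)]
    radius = _norm(diff)
    if radius == 0.0:
        return list(center)  # degenerate case: the two points coincide
    dim = len(center)
    # Uniform point in the unit hypersphere: Gaussian direction, radius u**(1/dim).
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    while _norm(direction) == 0.0:
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = rng.random() ** (1.0 / dim) / _norm(direction)
    point = [d * scale for d in direction]
    # Unit vector along the center -> surface axis and the point's projection on it.
    axis = [d / radius for d in diff]
    proj = _dot(point, axis)
    # Truncation: reflect points lying in the cut-off cap of the sphere.
    if (truncation > 0 and proj < truncation - 1) or (truncation < 0 and proj > truncation + 1):
        point = [p - 2 * proj * a for p, a in zip(point, axis)]
        proj = -proj
    # Deformation: shrink the component perpendicular to the axis.
    parallel = [proj * a for a in axis]
    perpendicular = [p - q for p, q in zip(point, parallel)]
    point = [q + (1 - deformation) * r for q, r in zip(parallel, perpendicular)]
    # Translation: rescale to the real radius and move to the center.
    return [c + radius * p for c, p in zip(center, point)]


def oversample_small_data(X, y, n_per_class, k=5, truncation=0.0, deformation=0.0, seed=0):
    """Hypothetical helper: add `n_per_class` synthetic rows to every class,
    using same-class nearest neighbours as surface points."""
    rng = random.Random(seed)
    X_new, y_new = [list(row) for row in X], list(y)
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        if len(idx) < 2:
            continue  # a single sample cannot define a radius
        for _ in range(n_per_class):
            c = rng.choice(idx)
            # k nearest same-class neighbours of the chosen center.
            neighbours = sorted(
                (j for j in idx if j != c),
                key=lambda j: _norm([a - b for a, b in zip(X[j], X[c])]),
            )
            s = rng.choice(neighbours[:k])
            X_new.append(geometric_sample(X[c], X[s], truncation, deformation, rng))
            y_new.append(label)
    return X_new, y_new


# Usage: enlarge a toy two-class dataset with 10 synthetic rows per class.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]
X_aug, y_aug = oversample_small_data(X, y, n_per_class=10, truncation=0.5, deformation=0.2)
print(len(X_aug), y_aug.count(0), y_aug.count(1))  # 26 13 13
```

In this sketch, setting `truncation=1.0` and `deformation=1.0` collapses the generation region to the segment between a sample and its neighbour, which is SMOTE-style interpolation; smaller values widen the region into a half-sphere or full sphere. Which parameter values the dissertation actually used is not stated in this record.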