Small data oversampling: improving small data prediction accuracy using the geometric SMOTE algorithm
Main author: | Lechleitner, Maria |
---|---|
Publication date: | 2020 |
Document type: | Dissertation |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/99077 |
Summary: | Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics |
Record ID: | RCAP_14fd031d76b475fbe4b83b78a40ea68b |
OAI identifier: | oai:run.unl.pt:10362/99077 |
Network: | RCAP |
Repository ID: | 7160 |
Abstract: | In the age of Big Data, many machine learning tasks in numerous industries are still constrained by the small size of the available datasets. Limited data availability often results in unsatisfactory prediction performance of supervised learning algorithms and, consequently, in poor decision making. This work aims to mitigate the small dataset problem by generating artificial data in the pre-processing phase of the data analysis process. The oversampling technique Geometric SMOTE is applied to generate new training instances and enhance crisp data structures. Experimental results show a significant improvement in prediction accuracy compared both with using the original small datasets and with other oversampling techniques such as Random Oversampling, SMOTE and Borderline SMOTE. These findings show that artificial data creation is a promising approach to overcoming the problem of small data in classification tasks. |
Contributors: | Bação, Fernando José Ferreira Lucas; RUN |
Author: | Lechleitner, Maria |
Keywords: | Machine Learning; Classification; Small Data Problem; Artificial Data Generation; Oversampling |
Dates: | 2020-06-09T07:20:54Z; 2020-05-27; 2020-05-27T00:00:00Z |
Version: | info:eu-repo/semantics/publishedVersion |
Type: | info:eu-repo/semantics/masterThesis |
Identifiers: | http://hdl.handle.net/10362/99077; TID:202485099 |
Rights: | info:eu-repo/semantics/openAccess |
Format: | application/pdf |
Institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
Aggregator: | RCAAP |
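
The abstract names Geometric SMOTE as the data-generation step, but this record does not describe how it works. As a rough illustration only, the sketch below follows the general G-SMOTE idea: draw a point uniformly inside a hypersphere centred on a selected training instance, with radius equal to the distance to a chosen neighbour (the `surface` point), then optionally truncate the sphere along the axis towards that neighbour and deform it towards that axis. The function name `gsmote_sample` and the `truncation`/`deformation` parameters are hypothetical; this is not the dissertation's code, it omits neighbour selection, and it uses only the Python standard library.

```python
import math
import random


def gsmote_sample(center, surface, truncation=0.0, deformation=0.0, rng=random):
    """Return one synthetic point near `center` (hypothetical helper).

    The point is drawn from a hypersphere centred on `center` whose radius is
    the distance to `surface`; the sphere is then truncated and deformed
    along the axis pointing from `center` to `surface`.
    """
    dim = len(center)
    diff = [s - c for s, c in zip(surface, center)]
    radius = math.sqrt(sum(d * d for d in diff))
    if radius == 0.0:
        return list(center)  # degenerate case: both points coincide

    # Uniform point inside the unit ball: random direction scaled by u ** (1 / dim).
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(v * v for v in direction))
    scale = rng.random() ** (1.0 / dim)
    ball = [scale * v / length for v in direction]

    axis = [d / radius for d in diff]  # unit vector from center towards surface
    proj = sum(b * a for b, a in zip(ball, axis))

    # Truncation: reflect points that fall in the cut-away cap of the sphere.
    if (truncation > 0 and proj < truncation - 1) or (truncation < 0 and proj > truncation + 1):
        ball = [b - 2 * proj * a for b, a in zip(ball, axis)]
        proj = -proj

    # Deformation: shrink the component perpendicular to the axis
    # (deformation=1 collapses the region onto the axis itself).
    parallel = [proj * a for a in axis]
    shaped = [p + (1 - deformation) * (b - p) for p, b in zip(parallel, ball)]

    # Rescale by the radius and translate back to the original coordinates.
    return [c + radius * s for c, s in zip(center, shaped)]


# Example: a few synthetic points between two 2-D training instances.
rng = random.Random(42)
for _ in range(3):
    print(gsmote_sample([1.0, 2.0], [2.0, 3.5], truncation=0.5, deformation=0.3, rng=rng))
```

With `truncation=1.0` and `deformation=1.0` every sample lands on the segment between the two instances, which mirrors SMOTE-style linear interpolation; smaller values spread the synthetic points over a wider region around the selected instance.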