A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting

Bibliographic details
Main author: Silva, Sara
Publication date: 2018
Other authors: Vanneschi, Leonardo; Cabral, Ana I.R.; Vasconcelos, Maria J.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/151417
Citation: Silva, S., Vanneschi, L., Cabral, A. I. R., & Vasconcelos, M. J. (2018). A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting. Swarm and Evolutionary Computation, 39(April), 323-338. DOI: 10.1016/j.swevo.2017.11.003
id RCAP_56ca302a11e3c6561d4d27096977a454
oai_identifier_str oai:run.unl.pt:10362/151417
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Abstract: Data gathered in the real world normally contains noise, either stemming from inaccurate experimental measurements or introduced by human errors. Our work deals with classification data where the attribute values were accurately measured, but the categories may have been mislabeled by the human in several sample points, resulting in unreliable training data. Genetic Programming (GP) compares favorably with the Classification and Regression Trees (CART) method, but it is still highly affected by these errors. Despite consistently achieving high accuracy in both training and test sets, many classification errors are found in a later validation phase, revealing a previously hidden overfitting to the erroneous data. Furthermore, the evolved models frequently output raw values that are far from the expected range. To improve the behavior of the evolved models, we extend the original training set with additional sample points where the class label is unknown, and devise a simple way for GP to use this additional information and learn in a semi-supervised manner. The results are surprisingly good. In the presence of the exact same mislabeling errors, the additional unlabeled data allowed GP to evolve models that achieved high accuracy also in the validation phase. This is a brand new approach to semi-supervised learning that opens an array of possibilities for making the most of the abundance of unlabeled data available today, in a simple and inexpensive way.
dc.contributor.none.fl_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
dc.contributor.author.fl_str_mv Silva, Sara
Vanneschi, Leonardo
Cabral, Ana I.R.
Vasconcelos, Maria J.
dc.subject.por.fl_str_mv Classification
Data errors
Genetic Programming
Hidden overfitting
Noisy labels
Semi-supervised learning
Computer Science(all)
Mathematics(all)
dc.date.none.fl_str_mv 2018-04-01
2018-04-01T00:00:00Z
2024-01-27T01:32:02Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv 2210-6502
PURE: 3788203
https://doi.org/10.1016/j.swevo.2017.11.003
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv 16
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP