Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning
Main author: | Afonso, Bruno Klaus de Aquino [UNIFESP] |
---|---|
Publication date: | 2020 |
Document type: | Undergraduate thesis |
Language: | eng |
Source title: | Repositório Institucional da UNIFESP |
Full text: | https://repositorio.unifesp.br/handle/11600/60704 |
Abstract: | We consider the problem of learning with noisy labels where most of the data is unlabelled. More specifically, we focus on graph-based semi-supervised learning, a setting in which many different approaches have already been proposed, such as $\ell_1$ norm, smooth eigenbasis pursuit and bivariate formulation. We propose our own semi-supervised filter, named Automatic Leave-One-Out Filter based on Local and Global Consistency (LGCLVOAuto), that corrects and redistributes label information in order to minimize leave-one-out error, while remaining consistent with the random walk process imposed by its baseline, the Local and Global Consistency (LGC) algorithm. We explore the problem of diagonal dominance in LGC solutions and its possible relation to overfitting, and how setting the diagonal to zero leads to the leave-one-out cost. We make use of gradient descent optimization on labels to minimize this cost, transferring some of the trust from the labels themselves to the propagation model. In order to eliminate degenerate solutions, some restrictions are put in place: labels cannot change class, and the overall contribution for each class should remain the same. The optimization requires only the relations between labels; consequently, it is suited to moderately large datasets such as MNIST, in particular when labelled data is scarce. It requires a single parameter. In theory, it may be extended trivially to the more general LapRLS classifier. Results show that LGCLVOAuto is capable of outperforming its baseline significantly when there is noise, and is not too harmful in the noiseless scenario. Moreover, it is competitive with other methods that require more parameters. |
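The abstract describes LGC propagation and a leave-one-out variant obtained by zeroing the diagonal of the propagation matrix. The following is a minimal NumPy sketch of that idea only, not the thesis' implementation. It assumes the standard LGC closed form F = (I - alpha S)^{-1} Y with S = D^{-1/2} W D^{-1/2}; the function names, the toy graph and alpha = 0.9 are hypothetical choices made for illustration. The gradient-descent label correction and its constraints are not reproduced here.

```python
import numpy as np


def lgc_propagation_matrix(W, alpha=0.9):
    """Return P = (I - alpha * S)^{-1}, S = D^{-1/2} W D^{-1/2} (standard LGC form, assumed)."""
    d = W.sum(axis=1)                      # node degrees (must be > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.inv(np.eye(W.shape[0]) - alpha * S)


def loo_predictions(P, Y, labelled):
    """Score each labelled node using only the OTHER labels.

    Zeroing the diagonal of the labelled-to-labelled block removes the
    contribution of a node's own label to its own prediction.
    """
    P_ll = P[np.ix_(labelled, labelled)].copy()
    np.fill_diagonal(P_ll, 0.0)
    return P_ll @ Y[labelled]


# Hypothetical toy graph: two triangles (0,1,2) and (3,4,5) joined by a weak edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

# One-hot labels; unlabelled rows stay zero. Node 2 is deliberately mislabelled as class 1.
Y = np.zeros((6, 2))
Y[0, 0] = Y[1, 0] = 1.0
Y[2, 1] = 1.0   # noisy label
Y[4, 1] = 1.0
labelled = [0, 1, 2, 4]

P = lgc_propagation_matrix(W, alpha=0.9)
F = P @ Y                                   # ordinary LGC scores for all nodes
loo = loo_predictions(P, Y, labelled)       # leave-one-out scores for labelled nodes
suspect = loo.argmax(axis=1) != Y[labelled].argmax(axis=1)
```

On this toy graph the mislabelled node 2 receives a leave-one-out prediction that favours class 0, so it is flagged in `suspect`; this kind of disagreement between a label and the propagation model is what a label filter can act on.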
id |
UFSP_e4e55cf97312e69014ffdcb8b863b093 |
---|---|
oai_identifier_str |
oai:repositorio.unifesp.br/:11600/60704 |
network_acronym_str |
UFSP |
network_name_str |
Repositório Institucional da UNIFESP |
repository_id_str |
3465 |
spelling |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning Machine Learning; Label Noise; Graph-Based Semi-Supervised Learning; Filter; Local and Global Consistency. In this work, we consider the problem of machine learning with noisy labels, in the case where most of the data is unlabelled. More specifically, we focus on graph-based semi-supervised learning, for which many different approaches have already been proposed, such as the l1 norm, smooth eigenbasis pursuit and the bivariate formulation. We propose our own semi-supervised filter, named Automatic Leave-One-Out Filter based on Local and Global Consistency (LGCLVOAuto), which corrects and redistributes label information to minimize the leave-one-out error, while remaining consistent with the random walk process imposed by its baseline, the Local and Global Consistency (LGC) algorithm. We explore the problem of diagonal dominance in LGC solutions and its possible relation to overfitting, and how zeroing this diagonal leads to the desired cost. We use gradient descent optimization on the labels to minimize this cost, transferring part of the trust in the labels themselves to the propagation model. To eliminate degenerate solutions, some restrictions are put in place: labels cannot change class, and the overall contribution of each class must remain the same. The optimization requires only the relations between labels; consequently, it is suited to moderately large datasets, in particular when labelled data is scarce. It requires a single parameter. In theory, it can be extended trivially to a generalization of its baseline. The results show that LGCLVOAuto is able to outperform its baseline by a wide margin when there is noise and does little harm in the noiseless scenario, making it a useful tool for detecting noisy labels. Moreover, it is competitive with other methods that require more parameters. No funding received. Universidade Federal de São Paulo Berton, Lilian [UNIFESP] http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4266309U2 http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8584642H3 Afonso, Bruno Klaus de Aquino [UNIFESP] 2021-03-23T21:12:56Z 2021-03-23T21:12:56Z 2020-10-10 info:eu-repo/semantics/bachelorThesis info:eu-repo/semantics/publishedVersion 89 p. application/pdf @misc{Klaus_2020_TCC, author = {Afonso, B.K. de A.}, title = "{Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning}", note = {Undergraduate thesis (BSc. in Computer Science), ICT/UNIFESP (Institute of Science and Technology of the Federal University of São Paulo), São José dos Campos, Brazil}, year = {2020}} https://repositorio.unifesp.br/handle/11600/60704 eng São José dos Campos info:eu-repo/semantics/openAccess reponame:Repositório Institucional da UNIFESP instname:Universidade Federal de São Paulo (UNIFESP) instacron:UNIFESP 2024-08-03T00:35:10Z oai:repositorio.unifesp.br/:11600/60704 Repositório Institucional PUB http://www.repositorio.unifesp.br/oai/request biblioteca.csp@unifesp.br opendoar:3465 2024-08-03T00:35:10 Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP) false |
dc.title.none.fl_str_mv |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning |
title |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning |
spellingShingle |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning Afonso, Bruno Klaus de Aquino [UNIFESP] Machine Learning; Label Noise; Graph-Based Semi-Supervised Learning; Filter; Local and Global Consistency |
title_short |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning |
title_full |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning |
title_fullStr |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning |
title_full_unstemmed |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning |
title_sort |
Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning |
author |
Afonso, Bruno Klaus de Aquino [UNIFESP] |
author_facet |
Afonso, Bruno Klaus de Aquino [UNIFESP] |
author_role |
author |
dc.contributor.none.fl_str_mv |
Berton, Lilian [UNIFESP] http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4266309U2 http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8584642H3 |
dc.contributor.author.fl_str_mv |
Afonso, Bruno Klaus de Aquino [UNIFESP] |
dc.subject.por.fl_str_mv |
Machine Learning; Label Noise; Graph-Based Semi-Supervised Learning; Filter; Local and Global Consistency |
topic |
Machine Learning; Label Noise; Graph-Based Semi-Supervised Learning; Filter; Local and Global Consistency |
description |
We consider the problem of learning with noisy labels where most of the data is unlabelled. More specifically, we focus on graph-based semi-supervised learning, a setting in which many different approaches have already been proposed, such as $\ell_1$ norm, smooth eigenbasis pursuit and bivariate formulation. We propose our own semi-supervised filter, named Automatic Leave-One-Out Filter based on Local and Global Consistency (LGCLVOAuto), that corrects and redistributes label information in order to minimize leave-one-out error, while remaining consistent with the random walk process imposed by its baseline, the Local and Global Consistency (LGC) algorithm. We explore the problem of diagonal dominance in LGC solutions and its possible relation to overfitting, and how setting the diagonal to zero leads to the leave-one-out cost. We make use of gradient descent optimization on labels to minimize this cost, transferring some of the trust from the labels themselves to the propagation model. In order to eliminate degenerate solutions, some restrictions are put in place: labels cannot change class, and the overall contribution for each class should remain the same. The optimization requires only the relations between labels; consequently, it is suited to moderately large datasets such as MNIST, in particular when labelled data is scarce. It requires a single parameter. In theory, it may be extended trivially to the more general LapRLS classifier. Results show that LGCLVOAuto is capable of outperforming its baseline significantly when there is noise, and is not too harmful in the noiseless scenario. Moreover, it is competitive with other methods that require more parameters. |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-10-10 2021-03-23T21:12:56Z 2021-03-23T21:12:56Z |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
bachelorThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
@misc{Klaus_2020_TCC, author = {Afonso, B.K. de A.}, title = "{Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning}", note = {Undergraduate thesis (BSc. in Computer Science), ICT/UNIFESP (Institute of Science and Technology of the Federal University of São Paulo), São José dos Campos, Brazil}, year = {2020}} https://repositorio.unifesp.br/handle/11600/60704 |
identifier_str_mv |
@misc{Klaus_2020_TCC, author = {Afonso, B.K. de A.}, title = "{Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning}", note = {Undergraduate thesis (BSc. in Computer Science), ICT/UNIFESP (Institute of Science and Technology of the Federal University of São Paulo), São José dos Campos, Brazil}, year = {2020}} |
url |
https://repositorio.unifesp.br/handle/11600/60704 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
89 p. application/pdf |
dc.coverage.none.fl_str_mv |
São José dos Campos |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Paulo |
publisher.none.fl_str_mv |
Universidade Federal de São Paulo |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UNIFESP instname:Universidade Federal de São Paulo (UNIFESP) instacron:UNIFESP |
instname_str |
Universidade Federal de São Paulo (UNIFESP) |
instacron_str |
UNIFESP |
institution |
UNIFESP |
reponame_str |
Repositório Institucional da UNIFESP |
collection |
Repositório Institucional da UNIFESP |
repository.name.fl_str_mv |
Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP) |
repository.mail.fl_str_mv |
biblioteca.csp@unifesp.br |
_version_ |
1814268424274051072 |