Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning

Bibliographic details
Main author: Afonso, Bruno Klaus de Aquino [UNIFESP]
Publication date: 2020
Document type: Undergraduate thesis
Language: eng
Source title: Repositório Institucional da UNIFESP
Full text: https://repositorio.unifesp.br/handle/11600/60704
Abstract: We consider the problem of learning with noisy labels when most of the data is unlabelled. More specifically, we focus on graph-based semi-supervised learning, a setting for which many approaches have already been proposed, such as the $\ell_1$ norm, smooth eigenbasis pursuit and the bivariate formulation. We propose our own semi-supervised filter, named Automatic Leave-One-Out Filter based on Local and Global Consistency (LGCLVOAuto), which corrects and redistributes label information in order to minimize leave-one-out error while remaining consistent with the random walk process imposed by its baseline, the Local and Global Consistency (LGC) algorithm. We explore the problem of diagonal dominance in LGC solutions and its possible relation to overfitting, and show how setting the diagonal to zero leads to the leave-one-out cost. We use gradient descent on the labels to minimize this cost, transferring some of the trust from the labels themselves to the propagation model. To eliminate degenerate solutions, two restrictions are put in place: labels cannot change class, and the overall contribution of each class must remain the same. The optimization requires only the relations between labels; consequently, it is suited to moderately large datasets such as MNIST, in particular when labelled data is scarce. It requires a single parameter. In theory, it may be extended trivially to the more general LapRLS classifier. Results show that LGCLVOAuto can significantly outperform its baseline when there is label noise, while doing little harm in the noiseless scenario. Moreover, it is competitive with other methods that require more parameters.
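The abstract's point about diagonal dominance can be illustrated with a small sketch. It assumes the commonly cited LGC closed form, P = (1 - alpha) (I - alpha S)^{-1} with S = D^{-1/2} W D^{-1/2}, and approximates P by fixed-point iteration. The toy graph, the function names, and alpha = 0.5 are assumptions made for illustration. The sketch only compares ordinary predictions with predictions made after zeroing the diagonal of P. It does not implement the thesis's gradient descent on labels or its constraints, and the thesis's exact cost may differ.

```python
def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}, the symmetric normalization used by LGC."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]


def propagation_matrix(S, alpha=0.5, iters=100):
    """Approximate P = (1 - alpha) * (I - alpha * S)^{-1} by iterating
    P <- alpha * S @ P + (1 - alpha) * I (converges because alpha < 1)."""
    n = len(S)
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [[(1 - alpha) * eye[i][j] for j in range(n)] for i in range(n)]
    for _ in range(iters):
        P = [[alpha * sum(S[i][k] * P[k][j] for k in range(n))
              + (1 - alpha) * eye[i][j]
              for j in range(n)] for i in range(n)]
    return P


def predict(P, labels, n_classes, leave_one_out=False):
    """Predicted class per node from scores F = P Y.
    With leave_one_out=True the diagonal of P is treated as zero,
    so a labelled node cannot vote for its own label."""
    preds = []
    for i, row in enumerate(P):
        scores = [0.0] * n_classes
        for j, cls in labels.items():
            if leave_one_out and i == j:
                continue  # zeroed diagonal entry
            scores[cls] += row[j]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


# Hypothetical toy data: two fully connected clusters, nodes 0-4 and 5-7.
# Node 3 sits in the first cluster but carries the other cluster's label.
# Nodes 4 and 7 are unlabelled.
n = 8
W = [[0.0] * n for _ in range(n)]
for cluster in (range(0, 5), range(5, 8)):
    for i in cluster:
        for j in cluster:
            if i != j:
                W[i][j] = 1.0
labels = {0: 0, 1: 0, 2: 0, 3: 1, 5: 1, 6: 1}

P = propagation_matrix(normalized_affinity(W))
standard = predict(P, labels, n_classes=2)
loo = predict(P, labels, n_classes=2, leave_one_out=True)
suspects = [i for i, cls in labels.items() if loo[i] != cls]
print(standard, loo, suspects)
# prints [0, 0, 0, 1, 0, 1, 1, 1] [0, 0, 0, 0, 0, 1, 1, 1] [3]
```

In this toy case the diagonal entry of P (5/9) outweighs the three same-cluster neighbours combined (3/9), so the ordinary prediction for node 3 simply reproduces its given label. With the diagonal zeroed, the neighbours decide, and node 3 is the only labelled node whose leave-one-out prediction disagrees with its label.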
id UFSP_e4e55cf97312e69014ffdcb8b863b093
oai_identifier_str oai:repositorio.unifesp.br/:11600/60704
network_acronym_str UFSP
network_name_str Repositório Institucional da UNIFESP
repository_id_str 3465
spelling Funding: none received. Record timestamp: 2024-08-03T00:35:10Z; OAI request URL: http://www.repositorio.unifesp.br/oai/request; opendoar:3465. The Portuguese abstract parallels the English abstract and additionally describes the method as a useful tool for detecting noisy labels.
dc.title.none.fl_str_mv Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning
author Afonso, Bruno Klaus de Aquino [UNIFESP]
author_facet Afonso, Bruno Klaus de Aquino [UNIFESP]
author_role author
dc.contributor.none.fl_str_mv Berton, Lilian [UNIFESP]
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4266309U2
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8584642H3
dc.contributor.author.fl_str_mv Afonso, Bruno Klaus de Aquino [UNIFESP]
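dc.subject.por.fl_str_mv Machine Learning (Aprendizado de Máquina)
Label noise (Ruído de rótulo)
Graph-based Semi-Supervised Learning (Aprendizado Semissupervisionado baseado em Grafos)
Filter (Filtro)
Local and Global Consistency (Consistência Local e Global)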
publishDate 2020
dc.date.none.fl_str_mv 2020-10-10
2021-03-23T21:12:56Z
2021-03-23T21:12:56Z
dc.type.driver.fl_str_mv info:eu-repo/semantics/bachelorThesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format bachelorThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv @misc{Klaus_2020_TCC, author = {Afonso, B.K. de A.}, title = "{Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning}", note = {Undergraduate thesis (BSc. in Computer Science), ICT/UNIFESP (Institute of Science and Technology of the Federal University of São Paulo), São José dos Campos, Brazil}, year = {2020}}
https://repositorio.unifesp.br/handle/11600/60704
url https://repositorio.unifesp.br/handle/11600/60704
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 89 p.
application/pdf
dc.coverage.none.fl_str_mv São José dos Campos
dc.publisher.none.fl_str_mv Universidade Federal de São Paulo
publisher.none.fl_str_mv Universidade Federal de São Paulo
dc.source.none.fl_str_mv reponame:Repositório Institucional da UNIFESP
instname:Universidade Federal de São Paulo (UNIFESP)
instacron:UNIFESP
instname_str Universidade Federal de São Paulo (UNIFESP)
instacron_str UNIFESP
institution UNIFESP
reponame_str Repositório Institucional da UNIFESP
collection Repositório Institucional da UNIFESP
repository.name.fl_str_mv Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP)
repository.mail.fl_str_mv biblioteca.csp@unifesp.br
_version_ 1814268424274051072