Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning

Afonso, Bruno Klaus de Aquino [UNIFESP]

Bibliographic details
Main author:	Afonso, Bruno Klaus de Aquino [UNIFESP]
Publication date:	2020
Document type:	Undergraduate thesis (Trabalho de conclusão de curso)
Language:	English
Source:	Repositório Institucional da UNIFESP
Full text:	https://repositorio.unifesp.br/handle/11600/60704
Abstract:	We consider the problem of learning with noisy labels when most of the data is unlabelled. More specifically, we focus on graph-based semi-supervised learning, a setting in which many different approaches have already been proposed, such as the $\ell_1$ norm, smooth eigenbasis pursuit and the bivariate formulation. We propose our own semi-supervised filter, named Automatic Leave-One-Out Filter based on Local and Global Consistency (LGCLVOAuto), which corrects and redistributes label information in order to minimize the leave-one-out error while remaining consistent with the random walk process imposed by its baseline, the Local and Global Consistency (LGC) algorithm. We explore the problem of diagonal dominance in LGC solutions and its possible relation to overfitting, and how setting the diagonal to zero leads to the leave-one-out cost. We use gradient descent optimization on the labels to minimize this cost, transferring some of the trust from the labels themselves to the propagation model. To eliminate degenerate solutions, two restrictions are imposed: labels cannot change class, and the overall contribution of each class must remain the same. The optimization requires only the relations between labels; consequently, it is suited to moderately large datasets such as MNIST, in particular when labelled data is scarce. It requires a single parameter. In theory, it can be extended trivially to the more general LapRLS classifier. Results show that LGCLVOAuto can significantly outperform its baseline when labels are noisy, while causing little harm in the noiseless scenario. Moreover, it is competitive with other methods that require more parameters.
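The abstract combines three steps: LGC propagation, a leave-one-out cost obtained by zeroing the diagonal of the propagation matrix (for F = PY, the term P_ii Y_i is each point's own label voting for itself), and gradient descent on the labels under two constraints. The NumPy sketch below illustrates these ideas under stated assumptions; it is not the thesis's implementation. The dense Gaussian graph, the squared leave-one-out cost, the per-label weights `w`, the per-class simplex projection and all function names (`lgc_propagation_matrix`, `loo_cost_and_grad`, `project_simplex`, `correct_labels`) are hypothetical choices made for illustration. In line with the abstract, the optimization only uses the labelled-by-labelled block of the propagation matrix.

```python
import numpy as np


def lgc_propagation_matrix(X, alpha=0.99, sigma=1.0):
    """Return P = (I - alpha * S)^{-1}, the LGC propagation operator.

    S = D^{-1/2} W D^{-1/2}. Assumption: a dense Gaussian affinity W;
    the thesis may build its graph differently (e.g. k-NN).
    """
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                      # no self-loops
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))               # symmetric normalization
    return np.linalg.inv(np.eye(len(X)) - alpha * S)


def loo_cost_and_grad(w, G, E):
    """Hypothetical squared leave-one-out cost J(w) = 0.5 * ||G diag(w) E - E||_F^2.

    G is the labelled-by-labelled block of P with its diagonal zeroed, so each
    labelled point is predicted only from the *other* labels.
    """
    R = G @ (w[:, None] * E) - E                  # residual of LOO predictions
    cost = 0.5 * float(np.sum(R ** 2))
    grad = ((G.T @ R) * E).sum(axis=1)            # dJ/dw_m = (G^T R)[m, y_m]
    return cost, grad


def project_simplex(v, z):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = z} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = ks[u - (css - z) / ks > 0][-1]
    theta = (css[rho - 1] - z) / rho
    return np.maximum(v - theta, 0.0)


def correct_labels(P, labelled, y, n_classes, steps=200):
    """Projected gradient descent on per-label weights w (illustrative only).

    Constraints mirror the abstract: a label never changes class (w only
    rescales its own one-hot row and stays >= 0), and each class keeps its
    total weight (projection onto a scaled simplex per class).
    """
    G = P[np.ix_(labelled, labelled)].copy()
    np.fill_diagonal(G, 0.0)                      # zero the diagonal -> LOO
    E = np.eye(n_classes)[y]                      # one-hot given labels
    w = np.ones(len(labelled))
    totals = E.sum(axis=0)                        # per-class weight to preserve
    lr = 1.0 / np.linalg.norm(G, 2) ** 2          # 1/L step for this quadratic
    history = []
    for _ in range(steps):
        cost, grad = loo_cost_and_grad(w, G, E)
        history.append(cost)
        w = w - lr * grad
        for c in range(n_classes):
            idx = y == c
            if idx.any():
                w[idx] = project_simplex(w[idx], totals[c])
    history.append(loo_cost_and_grad(w, G, E)[0])
    return w, history


# Toy usage: two Gaussian blobs, 10 labelled points, one flipped label.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
truth = np.repeat([0, 1], 50)
labelled = rng.choice(100, size=10, replace=False)
y = truth[labelled].copy()
y[0] = 1 - y[0]                                   # inject label noise
P = lgc_propagation_matrix(X, alpha=0.99)
w, history = correct_labels(P, labelled, y, n_classes=2)
Y = np.zeros((100, 2))
Y[labelled, y] = w                                # corrected label matrix
pred = (P @ Y).argmax(axis=1)                     # LGC prediction
print("weight of the flipped label:", round(float(w[0]), 3))
```

Projecting each class's weights onto a simplex with a fixed total enforces both restrictions at once. The step size 1/‖G‖₂² keeps this convex quadratic cost non-increasing, because its Hessian is bounded by GᵀG. Whether the flipped label's weight actually shrinks depends on the data and on α.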

Item metadata

OAI identifier:	oai:repositorio.unifesp.br/:11600/60704
Funding:	No funding received
Abstract (Portuguese):	A Portuguese version of the abstract is also recorded; it additionally describes LGCLVOAuto as a useful tool for detecting noisy labels.
Contributor:	Berton, Lilian [UNIFESP]
Lattes CVs:	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4266309U2; http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8584642H3
Keywords:	Machine Learning; Label Noise; Graph-based Semi-supervised Learning; Filter; Local and Global Consistency
Dates:	2020-10-10; 2021-03-23T21:12:56Z (added to the repository)
Version:	Published version
Citation (BibTeX):
@misc{Klaus_2020_TCC,
  author = {Afonso, B.K. de A.},
  title = "{Lgclvoauto: correction of labels with gradient descent optimization for graph-based semi-supervised learning}",
  note = {Undergraduate thesis (BSc. in Computer Science), ICT/UNIFESP (Institute of Science and Technology of the Federal University of São Paulo), São José dos Campos, Brazil},
  year = {2020}
}
Access rights:	Open access
Format:	89 pages, PDF (application/pdf)
Location:	São José dos Campos
Publisher:	Universidade Federal de São Paulo
Repository contact:	biblioteca.csp@unifesp.br