Statistical physics analysis of machine learning models

Veiga, Rodrigo Soares

Bibliographic details
Main author:	Veiga, Rodrigo Soares
Publication date:	2022
Document type:	Thesis
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da USP
Full text:	https://www.teses.usp.br/teses/disponiveis/43/43134/tde-17082022-084404/
Abstract:	This thesis presents three main contributions to the understanding of machine learning models using tools from statistical physics. First, we investigate the possible relation between the renormalisation group and restricted Boltzmann machines trained on two-dimensional ferromagnetic Ising data, pointing out problems in preliminary proposals to construct this bridge explicitly. Second, we examine the convergence behaviour of stochastic gradient descent in high-dimensional two-layer neural networks. Building on classic statistical physics approaches and extending them to a broad range of learning rates, time scales, and hidden-layer widths, we construct a phase diagram describing the various learning scenarios that arise in the high-dimensional setting. We also discuss the trade-off between learning rate and hidden-layer width, which has been crucial in recent mean-field theories. Third, we study both the Bayes-optimal and the empirical risk minimisation generalisation errors of the multi-class teacher-student perceptron. We characterise a first-order phase transition arising in the Bayes-optimal performance for Rademacher teacher weights and observe that, for Gaussian teachers, regularised cross-entropy minimisation can yield close-to-optimal performance.

Item metadata

id	USP_c4e7b9c2313574388065cb3646a29ecd
oai_identifier_str	oai:teses.usp.br:tde-17082022-084404
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
dc.title.none.fl_str_mv	Statistical physics analysis of machine learning models Análise de física estatística em modelos de aprendizado de máquina
title	Statistical physics analysis of machine learning models
author	Veiga, Rodrigo Soares
author_facet	Veiga, Rodrigo Soares
author_role	author
dc.contributor.none.fl_str_mv	Vicente, Renato
dc.contributor.author.fl_str_mv	Veiga, Rodrigo Soares
dc.subject.por.fl_str_mv	Computational learning; Bayesian inference; High-dimensional statistics; Machine learning; Statistical mechanics; Statistical methods for learning; Neural networks; Statistical physics; Stochastic gradient descent
publishDate	2022
dc.date.none.fl_str_mv	2022-08-04
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/43/43134/tde-17082022-084404/
url	https://www.teses.usp.br/teses/disponiveis/43/43134/tde-17082022-084404/
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br; atendimento@aguia.usp.br
_version_	1815257046839197696
