Scalable learning of probabilistic circuits

Bibliographic details
Main author: Geh, Renato Lui
Advisor: Mauá, Denis Deratani
Publication date: 2022-04-04
Document type: Master's thesis (Dissertação)
Language: English (eng)
Alternative title: Aprendizado escalável de circuitos probabilísticos
Subjects: Machine learning; Probabilistic circuits; Probabilistic models
Institution: Universidade de São Paulo (USP)
Source: Biblioteca Digital de Teses e Dissertações da USP
Rights: Open access
Format: application/pdf
Full text: https://www.teses.usp.br/teses/disponiveis/45/45134/tde-23052022-122922/
Abstract: The rising popularity of generative models, together with the growing need for flexible and exact inference, has motivated the machine learning community to look for expressive yet tractable probabilistic models. Probabilistic circuits (PCs) are a family of tractable probabilistic models capable of answering a wide range of queries exactly and in polynomial time. Their operational syntax in the form of a computational graph and their principled probabilistic semantics allow their parameters to be estimated with the highly scalable and efficient optimization techniques used in deep learning. Importantly, tractability is tightly linked to constraints on the underlying graph: by enforcing certain structural assumptions, queries such as marginals, maximum a posteriori, or entropy become computable in linear time while the model retains great expressivity. While inference is usually straightforward, learning PCs that both obey the needed structural restrictions and exploit their expressive power has proven a challenge. Current state-of-the-art structure learning algorithms for PCs can be roughly divided into three main categories. Most learning algorithms generate a (usually tree-shaped) circuit from recursive decompositions of the data, often through clustering and costly statistical (in)dependence tests, which can become prohibitive for high-dimensional data. Alternatively, other approaches construct an intricate network by growing an initial circuit through structure-preserving iterative methods; besides depending on a sufficiently expressive initial structure, these can take several minutes per iteration and many iterations before visible improvement. Lastly, other approaches randomly generate a probabilistic circuit according to some criterion; although usually less performant than other methods, random PCs are orders of magnitude more time efficient. With this in mind, this dissertation proposes fast and scalable random structure learning algorithms for PCs from two different standpoints: from a logical point of view, we efficiently and scalably translate certain knowledge, given in the form of logical constraints, into a highly structured binary PC; from the viewpoint of data-guided structure search, we propose hierarchically building PCs from random hyperplanes. We empirically show that each approach is competitive with state-of-the-art methods of the same class, and that their performance can be further boosted by simple ensemble strategies.
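To make the tractability claim concrete, here is a minimal sketch (illustrative code, not from the thesis; all class names and parameters are assumptions) of a smooth and decomposable PC over two binary variables. A marginal query costs a single bottom-up pass: leaves of marginalized variables simply evaluate to 1.

```python
# Minimal, illustrative probabilistic circuit: a mixture (sum node) of two
# fully factorized distributions (product nodes) over binary X0 and X1.
# Not code from the dissertation; names and numbers are made up.

class Leaf:
    """Bernoulli leaf for variable `var` with P(var = 1) = p."""
    def __init__(self, var, p):
        self.var, self.p = var, p

    def value(self, assignment):
        x = assignment.get(self.var)
        if x is None:          # variable marginalized out: sum_x P(x) = 1
            return 1.0
        return self.p if x == 1 else 1.0 - self.p

class Product:
    """Decomposable product node: children range over disjoint variables."""
    def __init__(self, children):
        self.children = children

    def value(self, assignment):
        result = 1.0
        for child in self.children:
            result *= child.value(assignment)
        return result

class Sum:
    """Smooth sum node: a mixture with nonnegative weights summing to 1."""
    def __init__(self, weights, children):
        self.weights, self.children = weights, children

    def value(self, assignment):
        return sum(w * child.value(assignment)
                   for w, child in zip(self.weights, self.children))

pc = Sum([0.3, 0.7], [
    Product([Leaf(0, 0.9), Leaf(1, 0.8)]),
    Product([Leaf(0, 0.1), Leaf(1, 0.4)]),
])

print(pc.value({0: 1, 1: 0}))     # joint query P(X0=1, X1=0)
print(pc.value({0: 1, 1: None}))  # marginal P(X0=1) in the same single pass
```

The single-pass marginal is valid exactly because the circuit is smooth and decomposable; without those structural constraints, the same query would require summing over exponentially many assignments.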
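The data-guided direction can be gestured at with a similarly hypothetical sketch: recursively partition the data with random hyperplanes, turn each split into a sum node weighted by the empirical proportions, and fit fully factorized leaves on small blocks. The function name, the median split, and the `min_size` stopping rule are all assumptions for illustration; the dissertation's actual procedure differs in its details.

```python
# Hypothetical sketch of hierarchical structure learning from random
# hyperplanes; the real algorithm in the dissertation differs in leaf
# models, stopping criteria, and parameter estimation.
import numpy as np

def learn_pc(data, min_size=50, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = data.shape
    if n <= min_size:
        # Leaf: fully factorized product of per-variable Bernoullis,
        # parameterized by empirical frequencies (a crude stand-in).
        return ("product", data.mean(axis=0))
    w = rng.standard_normal(d)       # random hyperplane normal vector
    b = np.median(data @ w)          # split at the median projection
    mask = (data @ w) <= b
    left, right = data[mask], data[~mask]
    if len(left) == 0 or len(right) == 0:   # degenerate split: stop here
        return ("product", data.mean(axis=0))
    # Sum node: mixture weights are the empirical cluster proportions.
    return ("sum", [len(left) / n, len(right) / n],
            [learn_pc(left, min_size, rng), learn_pc(right, min_size, rng)])

# Toy usage on random binary data; real experiments use benchmark datasets.
data = np.random.default_rng(1).integers(0, 2, size=(500, 10)).astype(float)
structure = learn_pc(data)
```

Because the hyperplanes are random, no clustering or independence testing is needed, which is what makes this family of methods orders of magnitude faster, at some cost in per-model quality that ensembles can help recover.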