Exploring Visual Programming Concepts for Probabilistic Programming Languages

Bibliographic details
Main author: Gabriel Cardoso Candal
Publication date: 2016
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/85704
Abstract: Probabilistic programming is a way to create systems that help us make decisions in the face of uncertainty. Many everyday decisions involve judgment in determining relevant factors that we do not directly observe. Historically, one way to help make decisions under uncertainty has been to use a probabilistic reasoning system. Probabilistic reasoning combines our knowledge of a situation with the laws of probability to determine those unobserved factors that are critical to the decision. Typically, the observations are combined using Bayesian statistics, in which existing knowledge (priors) is combined with observations to gather evidence for competing hypotheses. Other machine learning methods (such as random forests, neural networks or linear regression) take homogeneous data as input, requiring the user to separate their domain into different models, whereas probabilistic programming can leverage the data's original structure. In addition, it provides full probability distributions over both the predictions and the parameters of the model, whereas those ML methods can only give the user a certain degree of confidence in the predictions. Until recently, probabilistic reasoning systems have been limited in scope and hard to apply to many real-world situations. Models are communicated using a mix of natural language, pseudocode, and mathematical formulae, and solved using special-purpose, one-off inference methods. Rather than precise specifications suitable for automatic inference, graphical models typically serve as coarse, high-level descriptions, eliding critical aspects such as fine-grained independence, abstraction and recursion. Probabilistic programming is a new approach that makes probabilistic reasoning systems easier to build and more widely applicable.
A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models, in such a way that the program itself is the model, and then to perform inference in those models. PPLs have seen recent interest from the artificial intelligence, programming languages, cognitive science, and natural language communities. By giving users a common dialect in the form of a programming language, rather than requiring each of them to undertake the non-trivial and error-prone task of writing their own models and hand-tailored inference algorithms for the problem at hand, a PPL encourages exploration, since different models take less time to set up and evaluate, and enables sharing knowledge in the form of best practices, patterns and tools such as optimized compilers or interpreters, debuggers, IDEs, optimizers and profilers. PPLs are closely related to graphical models and Bayesian networks, but are more expressive and flexible. This is apparent from the reusable components PPLs offer, one of which is the inference engine, which can be plugged into different models. For instance, it is easy to replace the exact inference traditionally used for Bayesian networks, which requires time exponential in the number of variables, with approximate algorithms such as Markov chain Monte Carlo (MCMC) or Variational Message Passing (VMP), which make it possible to compute large hierarchical models by resorting to sampling and approximation. PPLs are often embedded in a host language such as R, Java or Scala, although some PPLs, such as WinBUGS and Stan, offer a self-contained language with no obvious origin in another language.
Visual programming has been applied successfully in several domains, be it education (MIT's Scratch and Microsoft's VPL), general-purpose programming (NoFlo), 3D modeling (Blender) or data science (RapidMiner and Weka Knowledge Flow). The latter, being popular products, have shown that there is added value in providing a graphical representation for working with data. However, as of today no tool provides a graphical representation for a PPL. DARPA, the main funder of PPL research, considers making models easier to write (reducing development time, encouraging experimentation and reducing the level of expertise required to develop such models) one of the key points of its Probabilistic Programming for Advancing Machine Learning program. Visual programming is well suited to objectives of this kind. Building upon the enormous flexibility of PPLs and the advantages of probabilistic models, we want to take advantage of the graphical intuition given by the data visualization that data scientists are now accustomed to, and attempt to provide model and algorithm visualization by rethinking how to capture the (usually textual) programmatic formalisms in a graphical manner. The goal of this dissertation is thus to explore graphical representations of a probabilistic programming language through node-based programming. The hypothesis under consideration is that graphical representations (not to be confused with Bayesian graphical models) are more intuitive and easier to learn than full-blown PPLs. We intend to validate this hypothesis by ensuring that classical problems solved in the literature by PPLs are also supported by our graphical representation, and then measuring how quickly a group of people trained in statistics can produce a viable model with each alternative.
id RCAP_3a4169af8786b876d742c41b55756f97
oai_identifier_str oai:repositorio-aberto.up.pt:10216/85704
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
author Gabriel Cardoso Candal
author_facet Gabriel Cardoso Candal
author_role author
dc.contributor.author.fl_str_mv Gabriel Cardoso Candal
dc.subject.por.fl_str_mv Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
topic Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
publishDate 2016
dc.date.none.fl_str_mv 2016-07-18
2016-07-18T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/85704
TID:201303965
url https://hdl.handle.net/10216/85704
identifier_str_mv TID:201303965
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799135848829026304