Development of estimation of distribution algorithms for linear genetic programming

Bibliographic details
Main author: Sotto, Leo Francoso Dal Piccol [UNIFESP]
Publication date: 2020
Document type: Thesis
Language: eng
Source title: Repositório Institucional da UNIFESP
Full text: https://sucupira.capes.gov.br/sucupira/public/consultas/coleta/trabalhoConclusao/viewTrabalhoConclusao.jsf?popup=true&id_trabalho=10889466
https://hdl.handle.net/11600/64929
Abstract: Linear Genetic Programming (LGP) is a Genetic Programming (GP) variant that has been successfully applied in various domains, such as regression, classification, and navigation. Unlike traditional GP, which represents programs as trees, LGP uses lists of instructions. As a result, the program's data flow is represented as a Directed Acyclic Graph (DAG), and features such as non-effective code and code reuse arise. As in other Evolutionary Algorithms (EAs), LGP's stochastic search process has no knowledge that would let it produce good-quality solutions, nor can it avoid poor-quality programs, which reduces its efficacy. Furthermore, the recombination operators of these algorithms often ignore the correlation between different positions in the genotype. To deal with these issues in EAs, researchers proposed Estimation of Distribution Algorithms (EDAs), which use a probability model to sample promising solutions instead of applying recombination operators. The first goal of this PhD thesis is to propose EDAs for LGP that exploit the features of the LGP representation and model dependencies between variables. Two approaches are explored: 1) adapting a Stochastic Context-Free Grammar (SCFG) to sample sequences of instructions instead of derivation trees, and integrating it into LGP via hybrid versions that combine the grammar with traditional LGP genetic operators; 2) creating an intermediate integer vector that represents a sequence of instructions and using it to build a Bayesian Network. The resulting techniques are validated on regression and classification problems and can outperform LGP when the hybrid version is used. The thesis also addresses challenges in designing EDAs for the LGP representation.
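The EDA idea summarized above (estimate a probability model from the best programs, then sample new programs from it instead of recombining parents) can be illustrated with a minimal Python sketch. It is a hedged illustration, not the thesis's method: it swaps in the simplest univariate model (UMDA-style, with independent per-position frequencies) in place of the SCFG and Bayesian Network the thesis explores, and only loosely echoes the integer-vector encoding of approach 2. The instruction encoding, function set, register count, program length, and every function name below are hypothetical assumptions.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical LGP setup (assumptions, not taken from the thesis):
NUM_REGISTERS = 4                 # registers r0..r3; r0 is read as the program output
OPS = ["+", "-", "*"]             # function set; an op is encoded by its index
PROGRAM_LENGTH = 5                # fixed number of instructions per program
# Each instruction (dest, op, src1, src2) becomes 4 integer genes; this lists each gene's domain size.
GENE_DOMAINS = [NUM_REGISTERS, len(OPS), NUM_REGISTERS, NUM_REGISTERS] * PROGRAM_LENGTH

Genome = List[int]
Samples = Sequence[Tuple[int, int]]


def execute(genome: Genome, x: int) -> int:
    """Decode and run the flattened program; every register starts with the input x (assumption)."""
    regs = [x] * NUM_REGISTERS
    for i in range(0, len(genome), 4):
        dest, op, a, b = genome[i:i + 4]
        if OPS[op] == "+":
            regs[dest] = regs[a] + regs[b]
        elif OPS[op] == "-":
            regs[dest] = regs[a] - regs[b]
        else:
            regs[dest] = regs[a] * regs[b]
    return regs[0]


def error(genome: Genome, samples: Samples) -> int:
    """Sum of absolute errors over (x, y) samples; lower is better."""
    return sum(abs(execute(genome, x) - y) for x, y in samples)


def estimate_marginals(selected: List[Genome], smoothing: float = 0.1) -> List[List[float]]:
    """Univariate model: value frequencies per gene position, smoothed so no value gets probability 0."""
    model = []
    for pos, domain in enumerate(GENE_DOMAINS):
        counts = Counter(ind[pos] for ind in selected)
        model.append([counts[v] + smoothing for v in range(domain)])
    return model


def sample(model: List[List[float]], rng: random.Random) -> Genome:
    """Draw each gene independently from its estimated distribution."""
    return [rng.choices(range(len(w)), weights=w)[0] for w in model]


def umda(samples: Samples, pop_size: int = 200, n_selected: int = 40,
         generations: int = 30, seed: int = 0) -> Genome:
    """Select the best programs, fit the model, and resample the whole population from it."""
    rng = random.Random(seed)
    pop = [[rng.randrange(d) for d in GENE_DOMAINS] for _ in range(pop_size)]
    best = min(pop, key=lambda g: error(g, samples))
    for _ in range(generations):
        pop.sort(key=lambda g: error(g, samples))            # truncation selection
        if error(pop[0], samples) < error(best, samples):
            best = pop[0]
        model = estimate_marginals(pop[:n_selected])         # model replaces recombination
        pop = [sample(model, rng) for _ in range(pop_size)]
    return best


# Usage: try to rediscover y = x*x + x from a few samples.
SAMPLES = [(x, x * x + x) for x in range(-3, 4)]
best = umda(SAMPLES)
print(best, error(best, SAMPLES))
```

Because each gene is sampled independently, this sketch cannot represent dependencies between positions (for example, that a source register is only useful if an earlier instruction writes to it). Capturing such dependencies is the motivation for the grammar-based and Bayesian-network models described in the abstract.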
Given that the features of the LGP representation play an important role in how new methods should be designed, research was also conducted on the role of non-effective instructions in LGP and on the impact of the DAG representation compared with the standard tree representation, in order to better understand how the technique works and thus improve the design of new methods based on it. The conclusions are that non-effective instructions are an important component of LGP programs, although their benefits depend on how the algorithm is used. It is also shown that DAGs offer a great advantage over trees for certain classes of problems, especially the design of digital circuits.
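To make non-effective instructions and the DAG data flow concrete, the following Python sketch marks effective instructions with a common backward-scan approach and then records, for each effective instruction, which earlier instruction supplies each operand. The instruction format `(dest, op, src1, src2)`, the convention that register `r0` holds the output, the example program, and all names are assumptions for illustration; the thesis may use a different representation.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction format: (dest, op, src1, src2), meaning r[dest] = r[src1] op r[src2].
Instruction = Tuple[int, str, int, int]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def run(program: List[Instruction], registers: List[int]) -> int:
    """Execute the program on a copy of the registers; r0 is assumed to be the output."""
    regs = list(registers)
    for dest, op, src1, src2 in program:
        regs[dest] = OPS[op](regs[src1], regs[src2])
    return regs[0]


def effective_instructions(program: List[Instruction], output_regs=(0,)) -> List[int]:
    """Indices of instructions that can influence the output registers (backward scan)."""
    needed = set(output_regs)  # registers whose current value is still required
    effective = []
    for i in range(len(program) - 1, -1, -1):
        dest, _op, src1, src2 = program[i]
        if dest in needed:
            effective.append(i)
            needed.discard(dest)           # this write satisfies the demand for dest...
            needed.update((src1, src2))    # ...but creates demand for its operands
    return sorted(effective)


def data_flow_edges(program: List[Instruction],
                    effective: List[int]) -> Dict[int, List[Optional[int]]]:
    """For each effective instruction, the index of the last earlier writer of each operand.

    None means the operand reads the register's initial value (an input or constant).
    """
    edges = {}
    for i in effective:
        _dest, _op, src1, src2 = program[i]
        edges[i] = [
            next((j for j in range(i - 1, -1, -1) if program[j][0] == src), None)
            for src in (src1, src2)
        ]
    return edges


program = [
    (1, "+", 1, 2),  # 0: r1 = r1 + r2
    (3, "*", 2, 2),  # 1: r3 = r2 * r2  (r3 is never read afterwards: non-effective)
    (2, "*", 1, 1),  # 2: r2 = r1 * r1  (reuses the result of instruction 0 twice)
    (0, "-", 2, 1),  # 3: r0 = r2 - r1  (reuses the result of instruction 0 again)
]

eff = effective_instructions(program)
edges = data_flow_edges(program, eff)
reuse = Counter(p for parents in edges.values() for p in parents if p is not None)
inputs = [0, 3, 4, 1]  # initial r0..r3 (arbitrary example values)
print(eff, edges, dict(reuse))
print(run(program, inputs), run([program[i] for i in eff], inputs))
```

In this example, instruction 1 writes `r3`, which is never read on the way to `r0`, so it is non-effective and removing it leaves the output unchanged (both runs return 42). Instruction 0's result is read three times; in the DAG it is a single node with several outgoing edges, whereas an equivalent expression tree would have to duplicate the subexpression `r1 + r2`.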
id UFSP_475f13b058f23619864b462f89812cec
oai_identifier_str oai:repositorio.unifesp.br/:11600/64929
network_acronym_str UFSP
network_name_str Repositório Institucional da UNIFESP
repository_id_str 3465
dc.title.none.fl_str_mv Development of estimation of distribution algorithms for linear genetic programming
author Sotto, Leo Francoso Dal Piccol [UNIFESP]
author_facet Sotto, Leo Francoso Dal Piccol [UNIFESP]
author_role author
dc.contributor.none.fl_str_mv Basgalupp, Marcio Porto [UNIFESP]
Universidade Federal de São Paulo
dc.contributor.author.fl_str_mv Sotto, Leo Francoso Dal Piccol [UNIFESP]
dc.subject.por.fl_str_mv Linear Genetic Programming
Estimation Of Distribution Algorithms
Regression
Classification
Digital Circuits
Programação Genética Linear
Estimação De Algoritmos De Distribuição
Regressão
Classificação
Circuitos Digitais
publishDate 2020
dc.date.none.fl_str_mv 2020-08-18
2022-07-25T14:20:52Z
2022-07-25T14:20:52Z
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://sucupira.capes.gov.br/sucupira/public/consultas/coleta/trabalhoConclusao/viewTrabalhoConclusao.jsf?popup=true&id_trabalho=10889466
LEO FRANCOSO DAL PICCOL SOTTO.pdf
https://hdl.handle.net/11600/64929
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 178 p.
application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de São Paulo (UNIFESP)
publisher.none.fl_str_mv Universidade Federal de São Paulo (UNIFESP)
dc.source.none.fl_str_mv reponame:Repositório Institucional da UNIFESP
instname:Universidade Federal de São Paulo (UNIFESP)
instacron:UNIFESP
instname_str Universidade Federal de São Paulo (UNIFESP)
instacron_str UNIFESP
institution UNIFESP
reponame_str Repositório Institucional da UNIFESP
collection Repositório Institucional da UNIFESP
repository.name.fl_str_mv Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP)
repository.mail.fl_str_mv biblioteca.csp@unifesp.br
_version_ 1814268339455787008