Dynamic equilibrium through reinforcement learning
Main author: | Faustino, Paulo Fernando Pinho |
---|---|
Publication date: | 2011 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.21/1144 |
Abstract: | Reinforcement Learning is an area of Machine Learning concerned with how an agent should take actions in an environment so as to maximize some notion of accumulated reward. This type of learning is inspired by the way humans learn and has led to the creation of various reinforcement learning algorithms. These algorithms focus on how an agent’s behaviour can be improved, assuming independence from its surroundings. The current work studies the application of reinforcement learning methods to the inverted pendulum problem. The importance of the variability of the environment (factors external to the agent) to the execution of reinforcement learning agents is studied using a model that seeks to obtain equilibrium (stability) through dynamism: the Cart-Pole system, or inverted pendulum. We sought to improve the behaviour of the autonomous agents by changing the information passed to them while keeping the agents’ internal parameters constant (learning rate, discount factors, decay rate, etc.), instead of following the classical approach of tuning those parameters. We studied how changes to the state set and the action set influence an agent’s ability to solve the Cart-Pole problem. We also studied the typical behaviour of reinforcement learning agents applied to the classic BOXES model and proposed a new way of characterizing the environment based on the notion of convergence towards a reference value. We demonstrate the performance gain of this new method when applied to a Q-Learning agent. |
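
As a rough illustration of the setting the abstract describes (a tabular Q-Learning agent acting on a BOXES-style discretization of the Cart-Pole state), the Python sketch below maps a continuous state to a box index and applies the standard Q-Learning update. The bin edges, the parameter values, and the names `box_index` and `q_update` are assumptions made for illustration and are not taken from the dissertation. The sketch does not implement the reference-value characterization that the work proposes.

```python
from bisect import bisect_right

# Illustrative bin edges (assumed for this sketch, not taken from the dissertation).
X_EDGES = (-0.8, 0.8)                       # cart position, m
X_DOT_EDGES = (-0.5, 0.5)                   # cart velocity, m/s
THETA_EDGES = (-6.0, -1.0, 0.0, 1.0, 6.0)   # pole angle, degrees
THETA_DOT_EDGES = (-50.0, 50.0)             # pole angular velocity, degrees/s
EDGES = (X_EDGES, X_DOT_EDGES, THETA_EDGES, THETA_DOT_EDGES)

N_BOXES = 3 * 3 * 6 * 3   # one box per combination of bins (162 here)
N_ACTIONS = 2             # push the cart left or right


def box_index(state):
    """Map a continuous (x, x_dot, theta, theta_dot) state to a single box index."""
    index = 0
    for value, edges in zip(state, EDGES):
        # bisect_right returns which bin the value falls into (0 .. len(edges)).
        index = index * (len(edges) + 1) + bisect_right(edges, value)
    return index


def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.9, terminal=False):
    """Apply one tabular Q-Learning update to q[s][a] in place."""
    target = reward if terminal else reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Q-table with one row per box and one column per action.
q_table = [[0.0] * N_ACTIONS for _ in range(N_BOXES)]
```

Changing the bin edges or the action count changes only what the agent is told about the environment; `alpha` and `gamma` stay fixed, which mirrors the kind of experiment the abstract outlines.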
id | RCAP_a867ece9480fa2529fc554cf494dfcfc |
---|---|
oai_identifier_str | oai:repositorio.ipl.pt:10400.21/1144 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Dynamic equilibrium through reinforcement learning |
author | Faustino, Paulo Fernando Pinho |
author_facet | Faustino, Paulo Fernando Pinho |
author_role | author |
dc.contributor.none.fl_str_mv | Morgado, Luís Filipe Graça; RCIPL |
dc.contributor.author.fl_str_mv | Faustino, Paulo Fernando Pinho |
dc.subject.por.fl_str_mv | Dynamic equilibrium; Equilíbrio dinâmico; Reinforcement learning; Aprendizagem por reforço; Autonomous agents; Agentes autónomos; Inverted pendulum; Pêndulo invertido |
topic | Dynamic equilibrium; Equilíbrio dinâmico; Reinforcement learning; Aprendizagem por reforço; Autonomous agents; Agentes autónomos; Inverted pendulum; Pêndulo invertido |
publishDate | 2011 |
dc.date.none.fl_str_mv | 2011-09; 2011-09-01T00:00:00Z; 2012-02-24T14:27:36Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10400.21/1144 |
url | http://hdl.handle.net/10400.21/1144 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | Faustino, Paulo Fernando Pinho - Dynamic equilibrium through reinforcement learning. Lisboa: Instituto Superior de Engenharia de Lisboa, 2011. Dissertação de mestrado. |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799133364450492416 |