NXT Mindstorms e aprendizagem por reforço

Coelho, João Paulo Carracha

Bibliographic details
Main author:	Coelho, João Paulo Carracha
Publication date:	2011
Document type:	Dissertation
Language:	Portuguese (por)
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	http://hdl.handle.net/10174/12283
Abstract:	Reinforcement learning is learning by trial and error: an agent interacts with its environment and learns to perform a task from positive and negative rewards. This master's thesis analyzes the behavior of a robot running a reinforcement learning algorithm whose goal is to follow a route. The robot used was the NXT Mindstorms, the educational robot created by Lego, running the Q-learning algorithm with the Softmax and ε-greedy search methods. The robot was programmed using lejOS NXJ. Several experiments were carried out to determine the influence of the Q-learning variables (learning rate and discount factor), of the search-method variables (temperature for Softmax, exploration rate for ε-greedy), of the reward function values, and of the use of several routes. The experiments showed that a robot running a reinforcement learning algorithm can learn the task in few iterations (fewer than 100). The experiment on the influence of the algorithm and search-method variables showed that the robot produces better results as the exploration rate decreases. In all experiments carried out in this work, Softmax produced better results than ε-greedy, and the robot performed best on the straight-line route. As future work, the author intends to implement another reinforcement learning algorithm, such as State-Action-Reward-State-Action (SARSA), and other search methods, in order to determine which is best suited to this task, and also to build a robot for other tasks, such as following a light or sound source.

Item metadata

id	RCAP_e8951a6b61e5e3ba7103840a044c6fc4
oai_identifier_str	oai:dspace.uevora.pt:10174/12283
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	NXT Mindstorms e aprendizagem por reforço
author	Coelho, João Paulo Carracha
author_facet	Coelho, João Paulo Carracha
author_role	author
dc.contributor.author.fl_str_mv	Coelho, João Paulo Carracha
publishDate	2011
dc.date.none.fl_str_mv	2011-01-01T00:00:00Z 2015-01-06T16:20:30Z 2015-01-06
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10174/12283 http://hdl.handle.net/10174/12283
url	http://hdl.handle.net/10174/12283
dc.language.iso.fl_str_mv	por
language	por
dc.relation.none.fl_str_mv	Departamento de Informática teses@bib.uevora.pt 283
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade de Évora
publisher.none.fl_str_mv	Universidade de Évora
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799136539357216768
