Learning from Demonstration using Hierarchical Inverse Reinforcement Learning
Principal author: | Leonor Baptista da Costa Silva Santos |
---|---|
Publication date: | 2021 |
Document type: | Master's dissertation |
Language: | English |
Source: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://hdl.handle.net/10216/137320 |
Abstract: | With the joint evolution of industry and robotics, manufacturing systems are becoming more complex, resilient, and safer. At the same time, Industry 4.0 answers the requirement to adapt to society's demands as quickly, seamlessly, and flexibly as possible. Despite bringing intelligent methodologies to the factory floor, contemporary robotic engineering techniques fail to be flexible or resilient when faced with new product configurations or different production parameters. Moreover, currently used methodologies rarely allow skills to be transferred from other tasks. Combined, these factors make the development and updating of robotic systems a cumbersome task that requires extensive resources. To address these limitations, this thesis explores how Learning from Demonstration can contribute to improving robotic engineering. The first goal was to create a manufacturing simulation scenario that could transfer easily to real situations while being responsive to Reinforcement Learning techniques. The second objective was to study how to discretise a complex task. The last goal assessed the impact of reusing pre-trained models in different tasks. The methodology used the robo-gym framework, which connects OpenAI Gym with the Gazebo physics engine, to create a modified pick-and-place task in which an object had to be fitted into a goal pose. The training involved expert demonstrations, within the scope of Learning from Demonstration. The algorithm employed was Generative Adversarial Imitation Learning (GAIL), which shares characteristics of both Reinforcement Learning and Inverse Reinforcement Learning. The first key finding was that task discretisation can be achieved through reward function modelling: a default smooth gradient error combined with positive rewards for sub-task completion, where the positive rewards are sequentially higher and the increments between them also increase. This discretisation approach reduces the complexity associated with the tasks and boosts performance compared with the sequential modelling approach. Secondly, we showed that retraining models can sometimes be advantageous even when new skills are not required, or when the trade-off between adaptation and exploration is positive; in that case, the learning curve is more stable. This proposal gathers guidelines to make the engineering associated with a manufacturing task, based on a reward function, more flexible and simpler. The work developed in this thesis resulted in a paper already submitted to ICAR and a second paper under preparation. |
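The reward-modelling idea described in the abstract — a smooth gradient error as the default signal, plus positive sub-task bonuses that are sequentially higher with growing increments — can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name, the doubling increment schedule, and the specific bonus values are assumptions made here.

```python
def shaped_reward(distance, subtasks_done, n_subtasks=3):
    """Illustrative sketch of reward shaping for task discretisation.

    Hypothetical choices (not from the thesis): the bonus values and
    the doubling increment schedule.

    - A smooth gradient error term penalises the remaining pose error,
      providing a dense default signal toward the goal.
    - Completing each sub-task adds a positive bonus; the bonuses are
      sequentially higher and the increments between them also grow,
      so later sub-tasks are never dominated by earlier ones.
    """
    gradient_term = -distance  # smooth, dense penalty on pose error

    # Hypothetical schedule: bonuses 1, 5, 13, ... (increments 4, 8, ...)
    bonuses, total = [], 0.0
    for k in range(1, n_subtasks + 1):
        total += 2.0 ** k  # per-step increments 2, 4, 8, ... strictly grow
        bonuses.append(total - 1.0)

    bonus_term = sum(bonuses[:subtasks_done])
    return gradient_term + bonus_term
```

With this schedule the agent always gains more by completing the next sub-task than by optimising the gradient term alone, which is one way to realise the "sequentially higher rewards with increasing increments" guideline from the abstract.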
Record metadata: |  |
---|---|
Author: | Leonor Baptista da Costa Silva Santos |
Subjects: | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
Publication date: | 2021-10-14 |
Identifier: | TID:202822311 |
Access rights: | openAccess |
Format: | application/pdf |
Institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
Repository network: | RCAAP |