Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity
Main Author: | Anquise, Candy Alexandra Huanca |
---|---|
Publication Date: | 2021 |
Format: | Master thesis |
Language: | eng |
Source: | Biblioteca Digital de Teses e Dissertações da UFRGS |
Download full: | http://hdl.handle.net/10183/231836 |
Summary: | Multi-objective decision-making entails planning, based on a model, to find the best policy for such problems. If this model is unknown, learning through interaction provides the means to act in the environment. Multi-objective decision-making in a multi-agent system poses many unsolved challenges. Among them, multiple objectives and non-stationarity, the latter caused by simultaneous learners, have so far been addressed separately. In this work, algorithms that address both issues by combining strengths from different methods are proposed and applied to a route choice scenario formulated as a multi-armed bandit problem; the focus is therefore on action selection. In the route choice problem, drivers must select a route while aiming to minimize both their travel time and toll. The proposed algorithms combine important aspects of works that each tackle only one issue, non-stationarity or multiple objectives, making it possible to handle both problems together. The methods drawn from these works are a set of Upper-Confidence Bound (UCB) algorithms and the Pareto Q-learning (PQL) algorithm. The UCB-based algorithms are Pareto UCB1 (PUCB1), discounted UCB (DUCB) and sliding-window UCB (SWUCB). PUCB1 deals with multiple objectives, while DUCB and SWUCB address non-stationarity in different ways. PUCB1 was extended with characteristics from DUCB and SWUCB. As PQL is a state-based method that handles more than one objective, it was modified to tackle a problem focused on action selection. Results from a comparison in a route choice scenario show that the proposed algorithms handle non-stationarity and multiple objectives, with the discount-factor approach performing best. Advantages, limitations and differences of these algorithms are discussed. |
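The combination the abstract describes — PUCB1's Pareto-based arm selection extended with DUCB-style discounting — can be sketched roughly as follows. This is a minimal illustration under assumed update rules, not the thesis's actual implementation: the class and parameter names are hypothetical, and rewards are taken as values to maximize (for route choice, negated travel time and toll).

```python
import math
import random


def pareto_front(vectors):
    """Indices of vectors not dominated by any other vector (maximization)."""
    front = []
    for i, v in enumerate(vectors):
        dominated = any(
            all(w[k] >= v[k] for k in range(len(v)))
            and any(w[k] > v[k] for k in range(len(v)))
            for j, w in enumerate(vectors) if j != i
        )
        if not dominated:
            front.append(i)
    return front


class DiscountedParetoUCB:
    """Sketch: Pareto arm selection (as in PUCB1) over per-objective
    estimates that are exponentially discounted (as in DUCB), so that
    older observations gradually lose weight under non-stationarity."""

    def __init__(self, n_arms, n_objectives, gamma=0.99):
        self.gamma = gamma
        self.counts = [0.0] * n_arms             # discounted pull counts
        self.sums = [[0.0] * n_objectives for _ in range(n_arms)]

    def select(self):
        # Play every arm once before using confidence bounds.
        for i, c in enumerate(self.counts):
            if c == 0.0:
                return i
        total = sum(self.counts)
        ucbs = []
        for i in range(len(self.counts)):
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[i])
            ucbs.append([s / self.counts[i] + bonus for s in self.sums[i]])
        # All Pareto-optimal arms are equally good candidates; pick one at random.
        return random.choice(pareto_front(ucbs))

    def update(self, arm, reward_vector):
        # Discount all past statistics, then add the new observation.
        for i in range(len(self.counts)):
            self.counts[i] *= self.gamma
            self.sums[i] = [s * self.gamma for s in self.sums[i]]
        self.counts[arm] += 1.0
        self.sums[arm] = [s + r for s, r in zip(self.sums[arm], reward_vector)]
```

With the discount factor gamma close to 1 the learner behaves like standard UCB1; smaller values forget faster, which is what makes the approach suitable when other drivers are learning simultaneously.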
id |
URGS_9f3bd8543ee328327e1cfbba8be22578 |
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/231836 |
network_acronym_str |
URGS |
network_name_str |
Biblioteca Digital de Teses e Dissertações da UFRGS |
repository_id_str |
1853 |
spelling |
Anquise, Candy Alexandra Huanca | Bazzan, Ana Lucia Cetertich | 2021-11-17T04:24:22Z | 2021 | http://hdl.handle.net/10183/231836 | 001133526 | application/pdf | eng | Sistemas multiagentes | Aprendizagem | Multi-objective | Decision-making | Multi-objective route choice | Reinforcement learning | Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/masterThesis | Universidade Federal do Rio Grande do Sul | Instituto de Informática | Programa de Pós-Graduação em Computação | Porto Alegre, BR-RS | 2021 | mestrado | info:eu-repo/semantics/openAccess | reponame:Biblioteca Digital de Teses e Dissertações da UFRGS | instname:Universidade Federal do Rio Grande do Sul (UFRGS) | instacron:UFRGS |
dc.title.pt_BR.fl_str_mv |
Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity |
title |
Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity |
author |
Anquise, Candy Alexandra Huanca |
author_facet |
Anquise, Candy Alexandra Huanca |
author_role |
author |
dc.contributor.author.fl_str_mv |
Anquise, Candy Alexandra Huanca |
dc.contributor.advisor1.fl_str_mv |
Bazzan, Ana Lucia Cetertich |
contributor_str_mv |
Bazzan, Ana Lucia Cetertich |
dc.subject.por.fl_str_mv |
Sistemas multiagentes | Aprendizagem |
dc.subject.eng.fl_str_mv |
Multi-objective | Decision-making | Multi-objective route choice | Reinforcement learning |
description |
Multi-objective decision-making entails planning, based on a model, to find the best policy for such problems. If this model is unknown, learning through interaction provides the means to act in the environment. Multi-objective decision-making in a multi-agent system poses many unsolved challenges. Among them, multiple objectives and non-stationarity, the latter caused by simultaneous learners, have so far been addressed separately. In this work, algorithms that address both issues by combining strengths from different methods are proposed and applied to a route choice scenario formulated as a multi-armed bandit problem; the focus is therefore on action selection. In the route choice problem, drivers must select a route while aiming to minimize both their travel time and toll. The proposed algorithms combine important aspects of works that each tackle only one issue, non-stationarity or multiple objectives, making it possible to handle both problems together. The methods drawn from these works are a set of Upper-Confidence Bound (UCB) algorithms and the Pareto Q-learning (PQL) algorithm. The UCB-based algorithms are Pareto UCB1 (PUCB1), discounted UCB (DUCB) and sliding-window UCB (SWUCB). PUCB1 deals with multiple objectives, while DUCB and SWUCB address non-stationarity in different ways. PUCB1 was extended with characteristics from DUCB and SWUCB. As PQL is a state-based method that handles more than one objective, it was modified to tackle a problem focused on action selection. Results from a comparison in a route choice scenario show that the proposed algorithms handle non-stationarity and multiple objectives, with the discount-factor approach performing best. Advantages, limitations and differences of these algorithms are discussed. |
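The abstract's other mechanism for non-stationarity, SWUCB, forgets by truncation rather than by discounting: estimates are computed only over the most recent plays. A minimal single-objective sketch under assumed update rules (names are illustrative, not from the thesis):

```python
import math
from collections import deque


class SlidingWindowUCB:
    """Sketch of sliding-window UCB: only the last `window` plays
    contribute to the estimates, so arms whose payoffs changed (e.g. a
    route whose travel time worsened) are re-evaluated quickly."""

    def __init__(self, n_arms, window=50):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)  # (arm, reward) pairs

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # An arm absent from the window is played to refresh its estimate.
        for i, c in enumerate(counts):
            if c == 0:
                return i
        t = len(self.history)
        ucb = [sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
               for i in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda i: ucb[i])

    def update(self, arm, reward):
        self.history.append((arm, reward))  # deque drops the oldest entry
```

Where DUCB fades old evidence gradually, SWUCB discards it outright once it leaves the window; the thesis compares both treatments (and their combination with PUCB1) in the route choice scenario.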
publishDate |
2021 |
dc.date.accessioned.fl_str_mv |
2021-11-17T04:24:22Z |
dc.date.issued.fl_str_mv |
2021 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/231836 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001133526 |
url |
http://hdl.handle.net/10183/231836 |
identifier_str_mv |
001133526 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da UFRGS | instname:Universidade Federal do Rio Grande do Sul (UFRGS) | instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Biblioteca Digital de Teses e Dissertações da UFRGS |
collection |
Biblioteca Digital de Teses e Dissertações da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/231836/2/001133526.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/231836/1/001133526.pdf |
bitstream.checksum.fl_str_mv |
e315b3dcd3ed4e087c015975acb736bd c47b2d66b3e7f3aa50fe776779b9e99f |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
lume@ufrgs.br |
_version_ |
1797064743340474368 |