Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity
Main Author: | Anquise, Candy Alexandra Huanca |
---|---|
Publication Date: | 2021 |
Format: | Master thesis |
Language: | eng |
Source: | Biblioteca Digital de Teses e Dissertações da UFRGS |
Download full: | http://hdl.handle.net/10183/231836 |
Summary: | Multi-objective decision-making entails planning, based on a model, to find the best policy for such problems. If this model is unknown, learning through interaction provides the means to act in the environment. Multi-objective decision-making in a multi-agent system poses many unsolved challenges. Among them, multiple objectives and non-stationarity, the latter caused by simultaneous learners, have so far been addressed separately. In this work, algorithms that address both issues by combining strengths from different methods are proposed and applied to a route choice scenario formulated as a multi-armed bandit problem; the focus is therefore on action selection. In the route choice problem, drivers must select a route while aiming to minimize both their travel time and toll. The proposed algorithms combine important aspects of works that each tackle only one issue, non-stationarity or multiple objectives, making it possible to handle both problems together. The methods drawn from these works are a set of Upper-Confidence Bound (UCB) algorithms and the Pareto Q-learning (PQL) algorithm. The UCB-based algorithms are Pareto UCB1 (PUCB1), discounted UCB (DUCB) and sliding-window UCB (SWUCB). PUCB1 deals with multiple objectives, while DUCB and SWUCB address non-stationarity in different ways. PUCB1 was extended with characteristics from DUCB and SWUCB. As PQL is a state-based method that handles more than one objective, it was modified to tackle a problem focused on action selection. Results from a comparison in a route choice scenario show that the proposed algorithms handle non-stationarity and multiple objectives, with the discount-factor approach performing best. Advantages, limitations and differences of these algorithms are discussed. |
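The combination the abstract describes — PUCB1's Pareto-based arm selection extended with DUCB-style discounting — can be sketched roughly as follows. This is a minimal illustration under assumed update rules, not the thesis's actual implementation: the class and parameter names are hypothetical, and rewards are taken as values to maximize (for route choice, negated travel time and toll).

```python
import math
import random


def pareto_front(vectors):
    """Indices of vectors not dominated by any other vector (maximization)."""
    front = []
    for i, v in enumerate(vectors):
        dominated = any(
            all(w[k] >= v[k] for k in range(len(v)))
            and any(w[k] > v[k] for k in range(len(v)))
            for j, w in enumerate(vectors) if j != i
        )
        if not dominated:
            front.append(i)
    return front


class DiscountedParetoUCB:
    """Sketch: Pareto arm selection (as in PUCB1) over per-objective
    estimates that are exponentially discounted (as in DUCB), so that
    older observations gradually lose weight under non-stationarity."""

    def __init__(self, n_arms, n_objectives, gamma=0.99):
        self.gamma = gamma
        self.counts = [0.0] * n_arms             # discounted pull counts
        self.sums = [[0.0] * n_objectives for _ in range(n_arms)]

    def select(self):
        # Play every arm once before using confidence bounds.
        for i, c in enumerate(self.counts):
            if c == 0.0:
                return i
        total = sum(self.counts)
        ucbs = []
        for i in range(len(self.counts)):
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[i])
            ucbs.append([s / self.counts[i] + bonus for s in self.sums[i]])
        # All Pareto-optimal arms are equally good candidates; pick one at random.
        return random.choice(pareto_front(ucbs))

    def update(self, arm, reward_vector):
        # Discount all past statistics, then add the new observation.
        for i in range(len(self.counts)):
            self.counts[i] *= self.gamma
            self.sums[i] = [s * self.gamma for s in self.sums[i]]
        self.counts[arm] += 1.0
        self.sums[arm] = [s + r for s, r in zip(self.sums[arm], reward_vector)]
```

With the discount factor gamma close to 1 the learner behaves like standard UCB1; smaller values forget faster, which is what makes the approach suitable when other drivers are learning simultaneously.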
id |
URGS_9f3bd8543ee328327e1cfbba8be22578 |
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/231836 |
network_acronym_str |
URGS |
network_name_str |
Biblioteca Digital de Teses e Dissertações da UFRGS |
repository_id_str |
1853 |
spelling |
Anquise, Candy Alexandra Huanca | Bazzan, Ana Lucia Cetertich | 2021-11-17T04:24:22Z | 2021 | http://hdl.handle.net/10183/231836 | 001133526 | application/pdf | eng | Sistemas multiagentes | Aprendizagem | Multi-objective | Decision-making | Multi-objective route choice | Reinforcement learning | Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/masterThesis | Universidade Federal do Rio Grande do Sul | Instituto de Informática | Programa de Pós-Graduação em Computação | Porto Alegre, BR-RS | 2021 | mestrado | info:eu-repo/semantics/openAccess | reponame:Biblioteca Digital de Teses e Dissertações da UFRGS | instname:Universidade Federal do Rio Grande do Sul (UFRGS) | instacron:UFRGS |
dc.title.pt_BR.fl_str_mv |
Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity |
title |
Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity |
author |
Anquise, Candy Alexandra Huanca |
author_facet |
Anquise, Candy Alexandra Huanca |
author_role |
author |
dc.contributor.author.fl_str_mv |
Anquise, Candy Alexandra Huanca |
dc.contributor.advisor1.fl_str_mv |
Bazzan, Ana Lucia Cetertich |
contributor_str_mv |
Bazzan, Ana Lucia Cetertich |
dc.subject.por.fl_str_mv |
Sistemas multiagentes | Aprendizagem |
dc.subject.eng.fl_str_mv |
Multi-objective | Decision-making | Multi-objective route choice | Reinforcement learning |
description |
Multi-objective decision-making entails planning, based on a model, to find the best policy for such problems. If this model is unknown, learning through interaction provides the means to act in the environment. Multi-objective decision-making in a multi-agent system poses many unsolved challenges. Among them, multiple objectives and non-stationarity, the latter caused by simultaneous learners, have so far been addressed separately. In this work, algorithms that address both issues by combining strengths from different methods are proposed and applied to a route choice scenario formulated as a multi-armed bandit problem; the focus is therefore on action selection. In the route choice problem, drivers must select a route while aiming to minimize both their travel time and toll. The proposed algorithms combine important aspects of works that each tackle only one issue, non-stationarity or multiple objectives, making it possible to handle both problems together. The methods drawn from these works are a set of Upper-Confidence Bound (UCB) algorithms and the Pareto Q-learning (PQL) algorithm. The UCB-based algorithms are Pareto UCB1 (PUCB1), discounted UCB (DUCB) and sliding-window UCB (SWUCB). PUCB1 deals with multiple objectives, while DUCB and SWUCB address non-stationarity in different ways. PUCB1 was extended with characteristics from DUCB and SWUCB. As PQL is a state-based method that handles more than one objective, it was modified to tackle a problem focused on action selection. Results from a comparison in a route choice scenario show that the proposed algorithms handle non-stationarity and multiple objectives, with the discount-factor approach performing best. Advantages, limitations and differences of these algorithms are discussed. |
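The abstract's other mechanism for non-stationarity, SWUCB, forgets by truncation rather than by discounting: estimates are computed only over the most recent plays. A minimal single-objective sketch under assumed update rules (names are illustrative, not from the thesis):

```python
import math
from collections import deque


class SlidingWindowUCB:
    """Sketch of sliding-window UCB: only the last `window` plays
    contribute to the estimates, so arms whose payoffs changed (e.g. a
    route whose travel time worsened) are re-evaluated quickly."""

    def __init__(self, n_arms, window=50):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)  # (arm, reward) pairs

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # An arm absent from the window is played to refresh its estimate.
        for i, c in enumerate(counts):
            if c == 0:
                return i
        t = len(self.history)
        ucb = [sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
               for i in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda i: ucb[i])

    def update(self, arm, reward):
        self.history.append((arm, reward))  # deque drops the oldest entry
```

Where DUCB fades old evidence gradually, SWUCB discards it outright once it leaves the window; the thesis compares both treatments (and their combination with PUCB1) in the route choice scenario.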
publishDate |
2021 |
dc.date.accessioned.fl_str_mv |
2021-11-17T04:24:22Z |
dc.date.issued.fl_str_mv |
2021 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/231836 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001133526 |
url |
http://hdl.handle.net/10183/231836 |
identifier_str_mv |
001133526 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da UFRGS | instname:Universidade Federal do Rio Grande do Sul (UFRGS) | instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Biblioteca Digital de Teses e Dissertações da UFRGS |
collection |
Biblioteca Digital de Teses e Dissertações da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/231836/2/001133526.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/231836/1/001133526.pdf |
bitstream.checksum.fl_str_mv |
e315b3dcd3ed4e087c015975acb736bd c47b2d66b3e7f3aa50fe776779b9e99f |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
lume@ufrgs.br |
_version_ |
1797064743340474368 |