Experience generalization for multi-agent reinforcement learning

Bibliographic details
Main author: Pegoraro, Renê [UNESP]
Publication date: 2001
Other authors: Costa, AHR; Ribeiro, CHC
Document type: Conference paper
Language: English
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1109/SCCC.2001.972652
http://hdl.handle.net/11449/8273
Citation: SCCC 2001: XXI International Conference of the Chilean Computer Science Society, Proceedings. Los Alamitos: IEEE Computer Society, p. 233-239, 2001.
DOI: 10.1109/SCCC.2001.972652
Web of Science: WOS:000172674500027
ORCID: 0000-0003-0314-8660
Affiliation: Univ Estadual Paulista, Dept Computacao, BR-17033360 Bauru, SP, Brazil
Publisher: Institute of Electrical and Electronics Engineers (IEEE), Computer Society
Access rights: Open access
Abstract: On-line learning methods have been applied successfully in multi-agent systems to achieve coordination among agents. Learning in multi-agent systems implies a non-stationary scenario as perceived by the agents, since the behavior of other agents may change as they simultaneously learn how to improve their actions. Non-stationary scenarios can be modeled as Markov Games, which can be solved using the Minimax-Q algorithm, a combination of Q-learning (a Reinforcement Learning (RL) algorithm which directly learns an optimal control policy) and the Minimax algorithm. However, finding optimal control policies with any RL algorithm (Q-learning and Minimax-Q included) can be very time consuming. To improve the learning time of Q-learning, we consider the QS-algorithm, in which a single experience can update more than one action value by means of a spreading function. In this paper, we contribute the Minimax-QS algorithm, which combines the Minimax-Q algorithm and the QS-algorithm. We conduct a series of empirical evaluations of the algorithm in a simplified simulator of the soccer domain. We show that, even with a very simple domain-dependent spreading function, the performance of the learning algorithm can be improved.
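
To make the idea concrete, here is a minimal Python sketch of one Minimax-QS update step. It is not the paper's implementation: the table layout, the function names (minimax_qs_update, maximin_value), the sigma spreading function, and the use of a pure-strategy maximin to value the next state (Minimax-Q proper solves a small linear program over mixed strategies) are all simplifying assumptions made for illustration.

import numpy as np

def maximin_value(q_s):
    # q_s has shape (n_actions, n_opponent_actions).
    # Simplification: pure-strategy maximin instead of the linear program
    # over mixed strategies used by Minimax-Q.
    return np.max(np.min(q_s, axis=1))

def minimax_qs_update(Q, s, a, o, r, s_next, sigma, alpha=0.1, gamma=0.9):
    # One hypothetical Minimax-QS step for the experience (s, a, o, r, s_next).
    # Q has shape (n_states, n_actions, n_opponent_actions); sigma(s, a, s2, a2)
    # is a domain-dependent spreading weight in [0, 1] saying how similar
    # (s2, a2) is to the state-action pair actually experienced.
    target = r + gamma * maximin_value(Q[s_next])
    n_states, n_actions, _ = Q.shape
    for s2 in range(n_states):
        for a2 in range(n_actions):
            w = sigma(s, a, s2, a2)
            if w > 0.0:
                # Spread the temporal-difference update to similar pairs.
                Q[s2, a2, o] += alpha * w * (target - Q[s2, a2, o])

# Tiny demo on a 5-state game with 3 actions per player (illustrative only).
Q = np.zeros((5, 3, 3))
no_spread = lambda s, a, s2, a2: 1.0 if (s2, a2) == (s, a) else 0.0
minimax_qs_update(Q, s=0, a=1, o=2, r=1.0, s_next=3, sigma=no_spread)

With a spreading function that is 1 only at the experienced pair (as in the demo), the step reduces to an ordinary single-pair Minimax-Q-style update; a soccer-domain spreading function would instead assign nonzero weight to neighbouring field configurations, so one experience updates several action values at once.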