Experience generalization for multi-agent reinforcement learning
Main author: | Pegoraro, Renê [UNESP] |
---|---|
Publication date: | 2001 |
Other authors: | Costa, AHR; Ribeiro, CHC |
Document type: | Conference paper |
Language: | English (eng) |
Source: | Repositório Institucional da UNESP |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE), Computer Society |
Citation: | SCCC 2001: XXI International Conference of the Chilean Computer Science Society, Proceedings. Los Alamitos: IEEE Computer Society, p. 233-239, 2001. |
DOI: | 10.1109/SCCC.2001.972652 |
Web of Science: | WOS:000172674500027 |
Full text: | http://dx.doi.org/10.1109/SCCC.2001.972652 http://hdl.handle.net/11449/8273 |
Rights: | Open access |
Abstract: | On-line learning methods have been applied successfully in multi-agent systems to achieve coordination among agents. Learning in multi-agent systems implies a non-stationary scenario as perceived by the agents, since the behavior of other agents may change as they simultaneously learn how to improve their actions. Non-stationary scenarios can be modeled as Markov Games, which can be solved using the Minimax-Q algorithm, a combination of Q-learning (a Reinforcement Learning (RL) algorithm that directly learns an optimal control policy) and the Minimax algorithm. However, finding optimal control policies using any RL algorithm (Q-learning and Minimax-Q included) can be very time consuming. To improve the learning time of Q-learning, we considered the QS-algorithm, in which a single experience can update more than a single action value by using a spreading function. In this paper, we contribute a Minimax-QS algorithm, which combines the Minimax-Q algorithm and the QS-algorithm. We conduct a series of empirical evaluations of the algorithm in a simplified simulator of the soccer domain. We show that even using a very simple domain-dependent spreading function, the performance of the learning algorithm can be improved. |
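The key mechanism described in the abstract is the QS-style spreading update: a single experience tuple updates the action values of several related states at once, each weighted by a spreading function. The Python sketch below illustrates that idea under stated assumptions; the toy state space, the `sigma` spreading function, and the deterministic approximation of the minimax value are all illustrative inventions, not the paper's actual code (Minimax-Q proper computes the stage-game value with a linear program over mixed policies).

```python
import numpy as np

# Toy dimensions, not taken from the paper.
N_STATES, N_ACTIONS, N_OPP = 20, 4, 4
ALPHA, GAMMA = 0.1, 0.9

# Q[s, a, o]: value of agent action a against opponent action o in state s.
Q = np.zeros((N_STATES, N_ACTIONS, N_OPP))

def sigma(s, t):
    """Hypothetical domain-dependent spreading function: full credit to the
    visited state, a small fraction to 'similar' states (here, neighbouring
    indices stand in for spatial similarity in the soccer grid)."""
    if t == s:
        return 1.0
    if abs(t - s) == 1:
        return 0.2
    return 0.0

def minimax_value(s):
    # Deterministic-policy approximation max_a min_o Q(s, a, o).
    # The real Minimax-Q solves a small linear program to allow mixed
    # policies; that step is omitted here for brevity.
    return Q[s].min(axis=1).max()

def minimax_qs_update(s, a, o, r, s_next):
    """One experience (s, a, o, r, s') updates every state t with
    sigma(s, t) > 0, not just s itself -- the QS generalization."""
    target = r + GAMMA * minimax_value(s_next)
    for t in range(N_STATES):
        w = sigma(s, t)
        if w > 0.0:
            Q[t, a, o] += ALPHA * w * (target - Q[t, a, o])

# Example: one transition from state 5 also nudges the values of the
# neighbouring states 4 and 6, so fewer visits are needed per state.
minimax_qs_update(s=5, a=2, o=1, r=1.0, s_next=6)
```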