Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis

Bibliographic details
Main author: Eduardo Lopes Pereira Neto
Publication date: 2023
Document type: Master's dissertation
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://doi.org/10.11606/D.45.2023.tde-06122023-173644
Abstract: Reinforcement Learning has proven to be highly successful in addressing sequential decision problems in complex environments, with a focus on maximizing the expected accumulated reward. Although Reinforcement Learning has shown its value, real-world scenarios often involve inherent risks that go beyond expected outcomes: in the same situation, different agents may be willing to take different levels of risk. In such cases, Risk-Sensitive Reinforcement Learning emerges as a solution, incorporating risk criteria into the decision-making process. Among these criteria, exponential-based methods have been extensively studied and applied. However, the behavior of exponential criteria when integrated with learning parameters and approximations, particularly in combination with Deep Reinforcement Learning, remains relatively unexplored. This gap can directly limit the practical applicability of these methods in real-world scenarios. In this dissertation, we present a comprehensive framework that facilitates the comparison of exponential risk criteria, such as Exponential Expected Utility, Exponential Temporal Difference Transformation, and Soft Indicator Temporal Difference Transformation, with Reinforcement Learning algorithms such as Q-Learning and Deep Q-Learning. We formally demonstrate that Exponential Expected Utility and Exponential Temporal Difference Transformation converge to the same value. We also perform experiments to explore the relationship of each exponential risk criterion with the learning rate parameter, the risk factor, and sampling algorithms. The results reveal that Exponential Expected Utility exhibits superior stability. Additionally, this dissertation empirically analyzes numerical overflow issues and a truncation technique for handling them. Furthermore, we propose applying the LogSumExp technique to mitigate this problem in algorithms that use Exponential Expected Utility.
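As an informal illustration of how an exponential risk criterion changes the standard temporal-difference backup in tabular Q-Learning, the sketch below keeps the Q-table in utility space, roughly E[exp(beta * return)], and uses a multiplicative target. The undiscounted setting, the positive (risk-seeking) beta, and all names are assumptions made for illustration; this is not the dissertation's exact formulation.

    import numpy as np

    def eeu_q_update(Q, s, a, r, s_next, alpha=0.1, beta=0.5):
        """One tabular Q-Learning step under an exponential-utility (EEU-style) criterion.

        Q[s, a] approximates E[exp(beta * return)] for the undiscounted return, so the
        Bellman backup becomes multiplicative: exp(beta * r) * max_a' Q[s', a'].
        beta > 0 is assumed here; for beta < 0 the greedy operator in utility space
        flips to a min, which is omitted for brevity.
        """
        target = np.exp(beta * r) * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    def certainty_equivalent(Q, s, beta=0.5):
        """Map utility-space values back to reward units via (1/beta) * log Q[s, a]."""
        return np.log(Q[s]) / beta

    # Illustrative usage on a tiny, hypothetical 4-state / 2-action problem with
    # random rewards and transitions standing in for a real environment.
    rng = np.random.default_rng(0)
    Q = np.ones((4, 2))                               # utility-space values start at exp(0) = 1
    for _ in range(1000):
        s = rng.integers(4)
        a = int(np.argmax(certainty_equivalent(Q, s)))  # greedy on the certainty equivalent; exploration omitted
        r = rng.normal()                                # stand-in for an environment reward
        s_next = rng.integers(4)                        # stand-in for an environment transition
        eeu_q_update(Q, s, a, r, s_next)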
id USP_43fe02802f8acff009dc0c7402a6cd88
oai_identifier_str oai:teses.usp.br:tde-06122023-173644
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.en.fl_str_mv Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
dc.title.alternative.pt.fl_str_mv Sensibilidade ao risco com funções exponenciais em aprendizado por reforço: uma análise empírica
title Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
spellingShingle Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
Eduardo Lopes Pereira Neto
title_short Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
title_full Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
title_fullStr Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
title_full_unstemmed Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
title_sort Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis
author Eduardo Lopes Pereira Neto
author_facet Eduardo Lopes Pereira Neto
author_role author
dc.contributor.advisor1.fl_str_mv Karina Valdivia Delgado
dc.contributor.referee1.fl_str_mv Reinaldo Augusto da Costa Bianchi
dc.contributor.referee2.fl_str_mv Esther Luna Colombini
dc.contributor.referee3.fl_str_mv Karina Valdivia Delgado
dc.contributor.author.fl_str_mv Eduardo Lopes Pereira Neto
contributor_str_mv Karina Valdivia Delgado
Reinaldo Augusto da Costa Bianchi
Esther Luna Colombini
Karina Valdivia Delgado
description Reinforcement Learning has proven to be highly successful in addressing sequential decision problems in complex environments, with a focus on maximizing the expected accumulated reward. Although Reinforcement Learning has shown its value, real-world scenarios often involve inherent risks that go beyond expected outcomes: in the same situation, different agents may be willing to take different levels of risk. In such cases, Risk-Sensitive Reinforcement Learning emerges as a solution, incorporating risk criteria into the decision-making process. Among these criteria, exponential-based methods have been extensively studied and applied. However, the behavior of exponential criteria when integrated with learning parameters and approximations, particularly in combination with Deep Reinforcement Learning, remains relatively unexplored. This gap can directly limit the practical applicability of these methods in real-world scenarios. In this dissertation, we present a comprehensive framework that facilitates the comparison of exponential risk criteria, such as Exponential Expected Utility, Exponential Temporal Difference Transformation, and Soft Indicator Temporal Difference Transformation, with Reinforcement Learning algorithms such as Q-Learning and Deep Q-Learning. We formally demonstrate that Exponential Expected Utility and Exponential Temporal Difference Transformation converge to the same value. We also perform experiments to explore the relationship of each exponential risk criterion with the learning rate parameter, the risk factor, and sampling algorithms. The results reveal that Exponential Expected Utility exhibits superior stability. Additionally, this dissertation empirically analyzes numerical overflow issues and a truncation technique for handling them. Furthermore, we propose applying the LogSumExp technique to mitigate this problem in algorithms that use Exponential Expected Utility.
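The description above mentions numerical overflow when exponentiating accumulated rewards and proposes the LogSumExp technique as a mitigation for Exponential Expected Utility. As a generic illustration of that technique only (not the dissertation's algorithm), the sketch below estimates the risk-adjusted value (1/beta) * log E[exp(beta * G)] from sampled returns entirely in log space; the sample-average estimator, the example values, and the parameter names are illustrative assumptions.

    import numpy as np

    def logsumexp(x):
        """Numerically stable log(sum(exp(x))): shift by the maximum before exponentiating."""
        x = np.asarray(x, dtype=float)
        m = np.max(x)
        return m + np.log(np.sum(np.exp(x - m)))

    def log_expected_exp_utility(returns, beta):
        """Estimate log E[exp(beta * G)] from sampled returns G without forming exp(beta * G)."""
        returns = np.asarray(returns, dtype=float)
        # log( (1/N) * sum_i exp(beta * G_i) ) = logsumexp(beta * G) - log N
        return logsumexp(beta * returns) - np.log(len(returns))

    def certainty_equivalent(returns, beta):
        """Risk-adjusted value (1/beta) * log E[exp(beta * G)], computed in log space."""
        return log_expected_exp_utility(returns, beta) / beta

    # With beta = 2 and returns near 500, a naive np.mean(np.exp(beta * returns)) overflows
    # float64 (whose largest finite exponent is about exp(709)), while the log-space
    # computation stays finite.
    sample_returns = [498.0, 501.5, 503.2, 499.9]
    print(certainty_equivalent(sample_returns, beta=2.0))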
publishDate 2023
dc.date.issued.fl_str_mv 2023-10-05
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi.org/10.11606/D.45.2023.tde-06122023-173644
url https://doi.org/10.11606/D.45.2023.tde-06122023-173644
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade de São Paulo
dc.publisher.program.fl_str_mv Ciência da Computação
dc.publisher.initials.fl_str_mv USP
dc.publisher.country.fl_str_mv BR
publisher.none.fl_str_mv Universidade de São Paulo
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1794502336577536000