Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management
Main author: | Moor, Bram J. de |
---|---|
Publication date: | 2022 |
Other authors: | Gijsbrechts, Joren; Boute, Robert N. |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.14/43565 |
Abstract: | Deep reinforcement learning (DRL) has proven to be an effective, general-purpose technology to develop ‘good’ replenishment policies in inventory management. We show how transfer learning from existing, well-performing heuristics may stabilize the training process and improve the performance of DRL in inventory control. While the idea is general, we specifically implement potential-based reward shaping to a deep Q-network algorithm to manage inventory of perishable goods that, cursed by dimensionality, has proven to be notoriously complex. The application of our approach may not only improve inventory cost performance and reduce computational effort, the increased training stability may also help to gain trust in the policies obtained by black box DRL algorithms. |
id |
RCAP_5c04762c3d33c9a4fb1510e7c132a829 |
---|---|
oai_identifier_str |
oai:repositorio.ucp.pt:10400.14/43565 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management; Deep reinforcement learning; Inventory; Perishable inventory management; Reward shaping; Transfer learning; Deep reinforcement learning (DRL) has proven to be an effective, general-purpose technology to develop ‘good’ replenishment policies in inventory management. We show how transfer learning from existing, well-performing heuristics may stabilize the training process and improve the performance of DRL in inventory control. While the idea is general, we specifically implement potential-based reward shaping to a deep Q-network algorithm to manage inventory of perishable goods that, cursed by dimensionality, has proven to be notoriously complex. The application of our approach may not only improve inventory cost performance and reduce computational effort, the increased training stability may also help to gain trust in the policies obtained by black box DRL algorithms.; Veritati - Repositório Institucional da Universidade Católica Portuguesa; Moor, Bram J. de; Gijsbrechts, Joren; Boute, Robert N.; 2022-09-01; 2024-09-01T00:00:00Z; 2022-09-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://hdl.handle.net/10400.14/43565; eng; 0377-2217; 10.1016/j.ejor.2021.10.045; 85119188665; 000793723100010; info:eu-repo/semantics/embargoedAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-01-16T01:46:22Z; oai:repositorio.ucp.pt:10400.14/43565; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T01:44:40.068345; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management |
author |
Moor, Bram J. de |
author_facet |
Moor, Bram J. de; Gijsbrechts, Joren; Boute, Robert N. |
author_role |
author |
author2 |
Gijsbrechts, Joren; Boute, Robert N. |
author2_role |
author; author |
dc.contributor.none.fl_str_mv |
Veritati - Repositório Institucional da Universidade Católica Portuguesa |
dc.contributor.author.fl_str_mv |
Moor, Bram J. de; Gijsbrechts, Joren; Boute, Robert N. |
dc.subject.por.fl_str_mv |
Deep reinforcement learning; Inventory; Perishable inventory management; Reward shaping; Transfer learning |
topic |
Deep reinforcement learning; Inventory; Perishable inventory management; Reward shaping; Transfer learning |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-09-01; 2022-09-01T00:00:00Z; 2024-09-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.14/43565 |
url |
http://hdl.handle.net/10400.14/43565 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0377-2217; 10.1016/j.ejor.2021.10.045; 85119188665; 000793723100010 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/embargoedAccess |
eu_rights_str_mv |
embargoedAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799136942300856320 |