
Solving an Optimization Problem for Product Delivery with Reinforcement Learning and Deep Neural Networks

Master's thesis about Reinforcement Learning (RL), Deep RL and Optimization.

  • Author: Daniel Salgado Rojo
  • Tutor: Toni Lozano Bagén
  • University: Autonomous University of Barcelona
  • Master: Master's Degree in Modelling for Science and Engineering
  • Specialization: Data Science

Abstract

This master's thesis is motivated by a real-world Product Delivery (PD) problem, an optimization problem that combines inventory control and vehicle routing, inspired by a project at the company Grupo AIA, where I did an internship from March to June 2018. The solution proposed to the client uses classical constraint optimization techniques, which tend to be slow to find optimal solutions and do not scale well as the number of shops and trucks used to deliver a product grows.

Machine Learning (ML) has become a very popular field in Data Science due to the increase in computational power in recent years. ML techniques are usually said to divide into two types, Supervised Learning and Unsupervised Learning, but this classification is incomplete. Apart from these two, we must distinguish a third type that is very different from both: Reinforcement Learning (RL). RL comprises techniques that have been used for decades in Artificial Intelligence, with applications in fields such as robotics and industrial automation [1], health and medicine [2, 3], media and advertising [4, 5, 6], finance [7], text, speech and dialogue systems [8, 9], and so forth.

RL provides a convenient framework for modelling a large variety of stochastic optimization problems [10]. Nevertheless, classical approaches to large RL problems suffer from three curses of dimensionality: explosions of the state and action spaces, and a large number of possible next states for each action due to stochasticity [11, 12]. There is very little literature on the application of Reinforcement Learning to a PD optimization framework. The only paper we have found that focuses on the practical application of RL to the PD domain is by S. Proper and P. Tadepalli (2006) [12]. They propose a variant of a classical RL technique called ASH-learning, which uses "tabular linear functions" (TLF) to learn the so-called H-values, which in turn are used to decide how to control the product delivery system of interest. Proper and Tadepalli report results for controlling a simplified, discretised system of 5 shops and 4 trucks with ASH-learning, in which the three curses of dimensionality are present; the results are successful for that particular example with a small number of trucks and shops. In practical situations, however, the number of shops and trucks may be so large (for instance, 30 shops and about 7 to 10 trucks) that the explosion of the state and action spaces would make those classical RL techniques impractical.

In this thesis [13] we present a novel approach to solving product delivery problems by means of Reinforcement Learning and Deep Neural Networks (DNN), a combination also referred to as Deep Reinforcement Learning (DRL). The idea is that the nonlinearity and expressiveness of DNNs should make them better suited to learning to solve complex optimization problems than TLFs, and than the tabular functions used so far in classical RL in general. Moreover, we expect DNNs to be key to overcoming some of the curses of dimensionality, such as the explosion of the state-action spaces; in the PD framework, we expect them to scale better than classical approaches to systems with a large number of shops and trucks. In addition, we have developed an OpenAI gym environment for our PD problem, which is available in a GitHub repository here.
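As a quick illustration, here is a hypothetical usage sketch of such an environment, assuming it follows the standard OpenAI gym API of the time (a `reset()` and a `step()` returning a 4-tuple); the environment id string below is illustrative, not necessarily the one registered by the package.

```python
# Hypothetical usage of the PD gym environment. The id "pdsystem-v0"
# is illustrative; check gym-pdsystem for the actual registered id.
import gym
import gym_pdsystem  # importing registers the custom PD environments

env = gym.make("pdsystem-v0")  # hypothetical environment id
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```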

Contents

The following subsections are the main parts of the work. The first one covers classical reinforcement learning (the Q-learning algorithm). The second one focuses on classical supervised Machine Learning with Neural Networks in the context of multiclass classification. The third part focuses on a more recent reinforcement learning algorithm called Policy Gradient, which, by using neural networks, scales much better than the classical Q-learning approach.

1. Q-learning (Classical Reinforcement Learning)

In this first part (from Chapter 4) we introduce Q-learning, one of the most popular value-based algorithms, which aims to learn the optimal Q-values and, from them, define an optimal policy. Although the Q-learning algorithm is somewhat dated, it serves as a starting point for learning classical reinforcement learning by applying it to our product delivery problem; a minimal sketch of the update rule is shown below.
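The following is a minimal sketch of tabular Q-learning, assuming a small discrete environment with a gym-style `reset()`/`step()` interface and integer-indexed states; all names are illustrative rather than taken from the thesis code.

```python
# Tabular Q-learning sketch (illustrative; not the thesis implementation).
# Assumes integer states in [0, n_states) and a gym-style env interface.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=1000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done, _ = env.step(a)
            # move Q(s, a) toward the bootstrapped target
            # r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```

A greedy policy is then read off the learned table as pi(s) = argmax_a Q(s, a).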

2. Imitation-Learning (Supervised Machine Learning: Classification)

In Chapter 5 we introduce the basic concepts of Artificial Neural Networks, and more concretely Deep Neural Networks (DNN). We start with an introduction to what a NN model is, and then focus on how to train one. Finally we present an application of DNNs to classification, for a toy example related to the product delivery problem addressed in this thesis; a minimal sketch of the setup is given below. The notebooks and simulation folders for this part can be found in this folder.
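As an illustration of the supervised (imitation-learning) setup, here is a minimal sketch of a DNN multiclass classifier built with tf.keras; the state dimension, number of actions, and the training arrays X (system states) and y (integer labels of the actions chosen by an expert policy) are illustrative placeholders, not the thesis code.

```python
# Sketch of a DNN classifier for imitation learning with tf.keras.
# X: array of system states; y: integer labels of the expert's actions.
import tensorflow as tf

def build_classifier(state_dim, n_actions):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        # softmax output: one probability per possible action
        tf.keras.layers.Dense(n_actions, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_classifier(state_dim=10, n_actions=5)
# model.fit(X, y, epochs=20, batch_size=32)
```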

3. Policy-Gradient (Deep Reinforcement Learning)

In Chapter 6 we use DNNs to play the role of a parametrized policy $\pi_\theta$, and introduce a particular family of algorithms, Policy Gradient (PG), that allows us to train the network to improve the policy by means of simulated episodes; a sketch of the basic update follows below. The field in which Deep Neural Networks are used to solve Reinforcement Learning problems is called Deep Reinforcement Learning (DRL). All simulations and notebooks related to this part are in this folder.
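For concreteness, here is a minimal sketch of the REINFORCE policy-gradient update. For brevity it uses a linear softmax policy instead of a DNN, together with a gym-style environment; all names and dimensions are illustrative.

```python
# REINFORCE sketch with a linear softmax policy (illustrative only;
# the thesis uses a DNN policy, this keeps the update rule readable).
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_episode(env, theta, alpha=0.01, gamma=0.99, max_steps=1000):
    """theta: (state_dim, n_actions) weights of a linear softmax policy."""
    states, actions, rewards = [], [], []
    s = env.reset()
    for _ in range(max_steps):
        probs = softmax(s @ theta)               # pi_theta(a | s)
        a = np.random.choice(len(probs), p=probs)
        s_next, r, done, _ = env.step(a)
        states.append(s); actions.append(a); rewards.append(r)
        s = s_next
        if done:
            break
    # discounted return G_t for every time step of the episode
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    # gradient ascent on expected return; for a softmax policy,
    # grad log pi(a|s) = outer(s, one_hot(a) - probs)
    for s, a, G in zip(states, actions, returns):
        probs = softmax(s @ theta)
        theta += alpha * G * np.outer(s, np.eye(len(probs))[a] - probs)
    return theta
```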

References

  • [1] Jens Kober, J. Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. Int. J. Rob. Res., 32(11):1238–1274, September 2013.

  • [2] Michael J. Frank, Lauren C. Seeberger, and Randall C. O’Reilly. By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306(5703):1940–1943, 2004.

  • [3] Yufan Zhao, Michael R. Kosorok, and Donglin Zeng. Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26):3294–3315, 2009.

  • [4] Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, and Alex Slivkins. A multiworld testing decision service. CoRR, abs/1606.03966, 2016.

  • [5] Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, and Defeng Guo. Real-time bidding by reinforcement learning in display advertising. CoRR, abs/1701.02490, 2017.

  • [6] Naoki Abe, Naval Verma, Chid Apte, and Robert Schroko. Cross channel optimized marketing by reinforcement learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pages 767–772, New York, NY, USA, 2004. ACM.

  • [7] Francesco Bertoluzzo and Marco Corazza. Reinforcement learning for automated financial trading: Basics and applications. In Simone Bassis, Anna Esposito, and Francesco Carlo Morabito, editors, Recent Advances of Neural Network Models and Applications, pages 197–213, Cham, 2014. Springer International Publishing.

  • [8] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive summarization. CoRR, abs/1705.04304, 2017.

  • [9] Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng. Towards end-to-end reinforcement learning of dialogue agents for information access. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. ACL – Association for Computational Linguistics, July 2017.

  • [10] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.

  • [11] W. B. Powell, A. George, B. Bouzaiene-Ayari, and H. P. Simao. Approximate dynamic programming for high dimensional resource allocation problems. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, volume 5, pages 2989–2994, July 2005.

  • [12] Scott Proper and Prasad Tadepalli. Scaling model-based average-reward reinforcement learning for product delivery. In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Machine Learning: ECML 2006, pages 735–742, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.

For the rest of the references, see the full document here.

Prerequisites

  • Python (tested with Python 3.5 and 3.6)
  • The gym-pdsystem Python package, which provides some of the Python libraries used here. Just clone the repository from here
  • The TensorFlow library

License

This work is released under a GNU license.