Master's thesis about Reinforcement Learning (RL), Deep RL and Optimization for Product Delivery
This master's thesis is motivated by a real-world problem in Product Delivery (PD) - an optimization
problem which combines inventory control and vehicle routing - inspired by a project from the
company Grupo AIA, where I did an internship from March to June 2018. The
solution proposed to the client uses classical constraint optimization techniques, which tend to be
slow when finding optimal solutions and do not scale properly as the number of shops and trucks
used to deliver a product increases.
Machine Learning (ML) has become a very popular field in Data Science due to the increase in
computational power in recent years. It is usually said that ML techniques divide into two types,
Supervised Learning and Unsupervised Learning. However, this is not a complete classification.
Apart from these two types, we must distinguish a third one which is very different from the other
two: Reinforcement Learning (RL). RL consists of techniques that have been used for decades in
the field of Artificial Intelligence, with many applications in areas such as robotics and industrial
automation [1], health and medicine [2, 3], media and advertising [4, 5, 6], finance [7], text, speech
and dialog systems [8, 9], and so forth.
RL provides a nice framework to model a large variety of stochastic optimization problems [10].
Nevertheless, classical approaches to large RL problems suffer from three curses of dimensionality:
explosions in the state and action spaces, and a large number of possible next states for each action
due to stochasticity [11, 12]. There is very little literature on the application of Reinforcement
Learning to a PD optimization framework. The only paper we have found that focuses on the
practical application of RL to the domain of PD is by S. Proper and P. Tadepalli (2006) [12].
In that paper they propose a variant of a classical RL technique called ASH-learning, where they
use "tabular linear functions" (TLF) to learn the so-called H-values, which are then used to decide
how to control the product delivery system of interest. Proper and Tadepalli show the results of
controlling a simplistic, discretised system of 5 shops and 4 trucks with ASH-learning, where
the three curses of dimensionality are present, and the results are successful for that particular
example with a small number of trucks and shops. However, in practical situations the number
of shops and trucks may be so large (say, 30 shops and about 7 to 10 trucks) that
the explosion in the dimensionality of the state and action spaces makes those classical RL
techniques impractical.
In this thesis [13] we present a novel approach for solving product delivery problems by means
of Reinforcement Learning and Deep Neural Networks (DNN), a combination also referred to as Deep
Reinforcement Learning (DRL). The idea is that the nonlinearity and complexity of DNN should
make them better at learning to solve complex optimization problems than TLF, and than the tabular
functions in general that have been used so far in classical RL. Moreover, we expect that DNN could be the
key to overcoming some of the curses of dimensionality, such as the explosion of the state-action spaces;
in the framework of PD, we expect them to scale better than classical approaches to systems with
a large number of shops and trucks. In addition, we have developed an OpenAI gym environment
for our PD problem, which is available in a GitHub repository here.
The following subsections cover the main parts of the work. The first is about classical reinforcement learning (the Q-learning algorithm). The second focuses on classical supervised Machine Learning with Neural Networks, in the context of multiclass classification. The third focuses on a more recent reinforcement learning algorithm called Policy Gradient which, by using Neural Networks, scales much better than Q-learning with the classical approach.
In this first part (from Chapter 4) we introduce Q-learning, one of the most popular value-based algorithms, which aims to learn the optimal Q-values and, from them, define an optimal policy. Although the Q-learning algorithm is somewhat dated, it serves as a starting point to learn classical reinforcement learning by applying it to our product delivery problem.
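To make the idea concrete, below is a minimal tabular Q-learning sketch in Python. It is illustrative only, not the thesis code: it assumes states and actions are encoded as integers and that the environment follows the gym-style `reset`/`step` interface of the PD environment mentioned above.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=5000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch (illustrative; assumes integer states/actions)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done, _ = env.step(a)
            # Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q  # greedy policy: pi(s) = argmax_a Q[s, a]
```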
In Chapter 5 we introduce the basic concepts of Artificial Neural Networks, more concretely
Deep Neural Networks (DNN). We start with an introduction to what an NN model is, and then focus
on how to train it. Finally, we present an application of DNN to classification, for a toy example
related to the product delivery problem we are working with in this thesis. In this folder you can find the
notebooks and simulation folders for that part.
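As an illustration of the kind of model Chapter 5 builds, here is a minimal DNN multiclass classifier sketch in Keras; the layer sizes and the input/output dimensions are hypothetical placeholders, not the ones used in the thesis notebooks.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_classes = 10, 4  # assumed dimensions, not from the thesis

# Small feed-forward network ending in a softmax over the classes
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=20, validation_split=0.1)
```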
In Chapter 6 we use DNN to play the role of a parametrized policy $\pi_\theta$, and introduce
a particular family of algorithms, Policy Gradient (PG), that allows us to train the network to improve
the policy by means of simulated episodes. As mentioned above, the field in which Deep Neural Networks are used
to solve Reinforcement Learning problems is called Deep Reinforcement Learning (DRL). In this folder we have put all simulations and notebooks related to that part.
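The gradient logic behind PG can be sketched with the simplest member of the family, REINFORCE. The snippet below uses a linear softmax policy instead of a DNN to keep it self-contained; the thesis replaces this linear $\pi_\theta$ with a neural network, but the episode sampling and the $\nabla_\theta \log \pi_\theta$ update follow the same pattern. The gym-style environment interface is again an assumption for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce(env, n_features, n_actions, episodes=2000, lr=0.01, gamma=0.99):
    """REINFORCE sketch with a linear softmax policy (illustrative only)."""
    theta = np.zeros((n_features, n_actions))
    for _ in range(episodes):
        # sample one episode with the current policy pi_theta
        states, actions, rewards = [], [], []
        s, done = env.reset(), False
        while not done:
            probs = softmax(s @ theta)
            a = np.random.choice(n_actions, p=probs)
            s_next, r, done, _ = env.step(a)
            states.append(s); actions.append(a); rewards.append(r)
            s = s_next
        # discounted returns G_t, computed backwards over the episode
        G, returns = 0.0, []
        for r in reversed(rewards):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        # gradient ascent on G_t * log pi_theta(a_t | s_t);
        # for a linear softmax, grad log pi(a|s) = outer(s, onehot(a) - probs)
        for s, a, G in zip(states, actions, returns):
            probs = softmax(s @ theta)
            grad_log = -np.outer(s, probs)
            grad_log[:, a] += s
            theta += lr * G * grad_log
    return theta
```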
[11] W. B. Powell, A. George, B. Bouzaiene-Ayari, and H. P. Simao. Approximate dynamic programming for high dimensional resource allocation problems. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 5, pages 2989–2994, July 2005.
[12] Scott Proper and Prasad Tadepalli. Scaling model-based average-reward reinforcement learning for product delivery. In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Machine Learning: ECML 2006, pages 735–742, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
For the rest of the references, see the full document here.
This work is under a GNU license.