Master's thesis about Reinforcement Learning (RL), Deep RL and Optimization for Product Delivery
dsalgador/master-thesis
Solving an optimization problem for product delivery with reinforcement learning and deep neural networks.
- Author: Daniel Salgado Rojo
- Tutor: Toni Lozano Bagén
- University: Autonomous University of Barcelona
- Master: Master's Degree in Modelling for Science and Engineering
- Specialization: Data Science
This master's thesis is motivated by a real-world problem in Product Delivery (PD), an optimization problem that combines inventory control and vehicle routing, inspired by a project from the company Grupo AIA, where I did an internship from March to June 2018. The solution proposed to the client uses classical constrained optimization techniques, which tend to be slow at finding optimal solutions and do not scale well as the number of shops and trucks used to deliver a product increases.
Machine Learning (ML) has become a very popular field in Data Science due to the increase in computational power in recent years. ML techniques are usually said to divide into two types, Supervised Learning and Unsupervised Learning. However, this classification is incomplete: besides these two, we must distinguish a third type that is very different from the other two, Reinforcement Learning (RL). RL comprises techniques that have been used for decades in Artificial Intelligence, with applications in fields such as robotics and industrial automation [1], health and medicine [2, 3], media and advertising [4, 5, 6], finance [7], text, speech and dialog systems [8, 9], and so forth.
RL provides a convenient framework for modelling a large variety of stochastic optimization problems [10]. Nevertheless, classical approaches to large RL problems suffer from three curses of dimensionality: explosions of the state and action spaces, and the large number of possible next states of an action due to stochasticity [11, 12]. There is very little literature on the application of Reinforcement Learning to a PD optimization framework. The only paper we have found that focuses on the practical application of RL to the PD domain is by S. Proper and P. Tadepalli (2006) [12]. They propose a variant of a classical RL technique called ASH-learning, in which "tabular linear functions" (TLFs) are used to learn the so-called H-values, which are then used to decide how to control the product delivery system of interest. Proper and Tadepalli report results for controlling a simplified, discretised system of 5 shops and 4 trucks with ASH-learning, where the three curses of dimensionality are present, and the results are successful for that particular example with a small number of trucks and shops. In practical situations, however, the number of shops and trucks may be so large (say, 30 shops and about 7 to 10 trucks) that the explosion of the state and action spaces would make those classical RL techniques impractical.
In this thesis [13] we present a novel approach to solving product delivery problems by means of Reinforcement Learning and Deep Neural Networks (DNNs), a combination also referred to as Deep Reinforcement Learning (DRL). The idea is that the nonlinearity and capacity of DNNs should make them better at learning to solve complex optimization problems than TLFs, and than the tabular functions used so far in classical RL in general. Moreover, we expect DNNs to be key to overcoming some of the curses of dimensionality, such as the explosion of the state-action spaces; in the PD framework, we expect them to scale better than classical approaches to systems with a large number of shops and trucks. In addition, we have developed an OpenAI Gym environment for our PD problem, which is available in a GitHub repository here.
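The Gym-style interface such an environment follows can be sketched with a minimal, self-contained toy (hypothetical class name and dynamics, not the actual gym-pdsystem API): one truck either refills a chosen shop to capacity or stays idle, demand is stochastic, and the reward penalises stock-outs.

```python
import random

class ToyDeliveryEnv:
    """Minimal Gym-style environment sketch (illustrative only, not the real
    gym-pdsystem API): one truck serves n_shops, each with a stock level
    consumed by random demand at every step."""

    def __init__(self, n_shops=3, capacity=10, seed=0):
        self.n_shops = n_shops
        self.capacity = capacity
        self.rng = random.Random(seed)
        # actions 0..n_shops-1 refill that shop; action n_shops does nothing
        self.action_space = list(range(n_shops + 1))

    def reset(self):
        self.stocks = [self.capacity] * self.n_shops
        return tuple(self.stocks)

    def step(self, action):
        if action < self.n_shops:            # refill the chosen shop
            self.stocks[action] = self.capacity
        for i in range(self.n_shops):        # stochastic demand at every shop
            self.stocks[i] = max(0, self.stocks[i] - self.rng.randint(0, 2))
        reward = -sum(1 for s in self.stocks if s == 0)   # penalise stock-outs
        return tuple(self.stocks), reward, False, {}

# Standard Gym interaction loop: reset, then repeatedly act and observe
env = ToyDeliveryEnv()
state = env.reset()
for _ in range(20):
    action = env.rng.choice(env.action_space)
    state, reward, done, info = env.step(action)
```

The `reset`/`step` signature mirrors the classic Gym convention, so any agent written against it can later be pointed at the real environment.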
The following subsections describe the main parts of the work. The first covers classical reinforcement learning (the Q-learning algorithm). The second focuses on classical supervised Machine Learning with Neural Networks, in the context of multiclass classification. The third covers a more recent reinforcement learning algorithm, Policy Gradient, which, thanks to the use of Neural Networks, scales much better than the classical Q-learning approach.
1. Q-learning (Classical Reinforcement Learning)
In this first part (from Chapter 4) we introduce Q-learning, one of the most popular value-based algorithms, aimed at learning the optimal Q-values and defining an optimal policy from them. Although the Q-learning algorithm is somewhat dated, it serves as a starting point for learning classical reinforcement learning by applying it to our product delivery problem.
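As an illustration of the update rule at the heart of this part, here is a minimal tabular Q-learning sketch on a toy 5-state chain (the environment and hyperparameters are illustrative, not the PD system of the thesis):

```python
import random
from collections import defaultdict

class ChainEnv:
    """Toy 5-state chain: actions move left/right, reward 1 at the rightmost state."""
    action_space = [0, 1]              # 0 = left, 1 = right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(4, self.s + 1)
        reward = 1.0 if self.s == 4 else 0.0
        return self.s, reward, self.s == 4, {}

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=50):
    Q = defaultdict(float)
    rng = random.Random(0)

    def greedy(s):                     # argmax over actions, ties broken at random
        best = max(Q[(s, a)] for a in env.action_space)
        return rng.choice([a for a in env.action_space if Q[(s, a)] == best])

    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # epsilon-greedy action selection
            a = rng.choice(env.action_space) if rng.random() < epsilon else greedy(s)
            s2, r, done, _ = env.step(a)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + gamma * max(Q[(s2, a2)] for a2 in env.action_space)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

Q = q_learning(ChainEnv())
```

After training, the greedy policy derived from Q moves right everywhere, which is optimal for this chain.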
2. Imitation-Learning (Supervised Machine Learning: Classification)
In Chapter 5 we introduce the basic concepts of Artificial Neural Networks, more concretely Deep Neural Networks (DNNs). We start with an introduction to what a NN model is, and then focus on how to train it. Finally, we present an application of DNNs to classification on a toy example related to the product delivery problem of this thesis. In this folder we can find the notebooks and simulation folders for that part.
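To make the classification setting concrete, here is a minimal softmax-classifier sketch in plain Python (kept dependency-free rather than using TensorFlow) on a hypothetical toy task in the spirit of this part: given the stock levels of three shops, predict which shop most urgently needs a delivery (the one with the lowest stock).

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Toy dataset: features = stock levels of 3 shops, label = shop with lowest stock
rng = random.Random(0)
data = []
for _ in range(500):
    x = [rng.random() for _ in range(3)]
    y = min(range(3), key=lambda i: x[i])
    data.append((x, y))

# Softmax regression trained by SGD on the cross-entropy loss
W = [[0.0] * 3 for _ in range(3)]      # W[class][feature]
b = [0.0] * 3
lr = 0.5
for epoch in range(30):
    for x, y in data:
        p = softmax([sum(W[c][j] * x[j] for j in range(3)) + b[c] for c in range(3)])
        for c in range(3):
            g = p[c] - (1.0 if c == y else 0.0)   # dLoss/dlogit_c = p_c - 1{c==y}
            for j in range(3):
                W[c][j] -= lr * g * x[j]
            b[c] -= lr * g

def predict(x):
    return max(range(3), key=lambda c: sum(W[c][j] * x[j] for j in range(3)) + b[c])

accuracy = sum(predict(x) == y for x, y in data) / len(data)
```

A deep network replaces the linear scores with stacked nonlinear layers, but the softmax output and cross-entropy gradient are exactly the ones above.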
3. Policy-Gradient (Deep Reinforcement Learning)
In Chapter 6 we use a DNN to play the role of a parametrized policy $\pi_\theta$, and introduce a particular family of algorithms, Policy Gradient (PG), that allows us to train the network to improve the policy by means of simulated episodes. The field in which Deep Neural Networks are used to solve Reinforcement Learning problems is called Deep Reinforcement Learning (DRL). In this folder we have put all simulations and notebooks related to that part.
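The core policy-gradient update can be sketched on a two-armed bandit, a deliberately tiny stand-in for the PD environment (the reward values and hyperparameters are illustrative): a softmax policy $\pi_\theta$ is improved by ascending $\nabla_\theta \log \pi_\theta(a)$ weighted by the advantage.

```python
import math
import random

rng = random.Random(0)
theta = [0.0, 0.0]      # one logit per action
lr = 0.1
baseline = 0.0          # running-average reward baseline to reduce variance

def policy():
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [v / s for v in e]

for episode in range(2000):
    p = policy()
    a = 0 if rng.random() < p[0] else 1               # sample an action from pi_theta
    reward = rng.gauss(0.2 if a == 0 else 0.8, 0.1)   # arm 1 pays more on average
    baseline += 0.01 * (reward - baseline)
    adv = reward - baseline
    for i in range(2):
        # REINFORCE: theta_i += lr * adv * d/dtheta_i log pi(a) = lr * adv * (1{i==a} - p_i)
        theta[i] += lr * adv * ((1.0 if i == a else 0.0) - p[i])

p = policy()   # after training, the policy concentrates on the better arm
```

With a DNN, `theta` becomes the network weights and the same log-probability gradient is obtained by backpropagation, but the update rule is unchanged.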
[1] Jens Kober, J. Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. Int. J. Rob. Res., 32(11):1238–1274, September 2013.
[2] Michael J. Frank, Lauren C. Seeberger, and Randall C. O’Reilly. By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306(5703):1940–1943, 2004.
[3] Yufan Zhao, Michael R. Kosorok, and Donglin Zeng. Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28(26):3294–3315, 2009.
[4] Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, and Alex Slivkins. A multiworld testing decision service. CoRR, abs/1606.03966, 2016.
[5] Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, and Defeng Guo. Real-time bidding by reinforcement learning in display advertising. CoRR, abs/1701.02490, 2017.
[6] Naoki Abe, Naval Verma, Chid Apte, and Robert Schroko. Cross channel optimized marketing by reinforcement learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pages 767–772, New York, NY, USA, 2004. ACM.
[7] Francesco Bertoluzzo and Marco Corazza. Reinforcement learning for automated financial trading: Basics and applications. In Simone Bassis, Anna Esposito, and Francesco Carlo Morabito, editors, Recent Advances of Neural Network Models and Applications, pages 197–213, Cham, 2014. Springer International Publishing.
[8] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive summarization. CoRR, abs/1705.04304, 2017.
[9] Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng. Towards end-to-end reinforcement learning of dialogue agents for information access. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. ACL – Association for Computational Linguistics, July 2017.
[10] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[11] W. B. Powell, A. George, B. Bouzaiene-Ayari, and H. P. Simao. Approximate dynamic programming for high dimensional resource allocation problems. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, volume 5, pages 2989–2994, July 2005.
[12] Scott Proper and Prasad Tadepalli. Scaling model-based average-reward reinforcement learning for product delivery. In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Machine Learning: ECML 2006, pages 735–742, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
For the rest of the references, see the full document here.
Prerequisites
- Python (tested for Python 3.5 and 3.6)
- The gym-pdsystem Python package is needed, since some of the Python libraries used here are found there. Just clone the repository from here.
- TensorFlow library
This work is under a GNU GPL-3.0 license.