Correct-by-synthesis reinforcement learning with temporal logic constraints (CoRL)
This is a repo for the deep learning course project: Correct-by-synthesis reinforcement learning with temporal logic constraints [1].
More information regarding the course can be found here.
The main aim of this project is to implement and evaluate the examples discussed by the authors in the paper.
Follow these three steps to successfully install and run the project.
Install Anaconda on your OS and then run the shell commands below with the conda command on your path.
Use the terminal or an Anaconda Prompt for the following steps:
Create the environment from the CoRL_env.yml file under the conda/ directory:
conda env create -f CoRL_env.yml
The first line of the yml file sets the new environment's name; in this case it is dl_project_env. You can modify it before creating the environment.
Activate the new environment (replace myenv with dl_project_env):
conda activate myenv
Verify that the new environment was installed correctly. You should see a star ('*') in front of the currently active environment:
conda env list
You can also use
conda info --envs
Create a frames folder within the src folder. The code uses this folder to save the frames from which it creates a gif.
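As a rough sketch of what that gif creation might look like (the imageio dependency, file pattern, and output name here are assumptions for illustration, not necessarily what this repo does):

```python
# Illustrative only: stitch the PNG frames saved under src/frames/ into a
# gif. The repo's actual implementation and file names may differ.
import glob
import imageio

frame_paths = sorted(glob.glob("frames/*.png"))   # assumed frame files
frames = [imageio.imread(p) for p in frame_paths]
imageio.mimsave("run.gif", frames, duration=0.2)  # 0.2 s per frame
```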
Build the slugs executable: cd into the src folder under lib/slugs and run make. Use ls to verify that the executable exists.
To run the project, cd into the src/ folder and type
python3 main.py <save_flag> <grid_size>
For convenience, I recommend using save_flag = True and a grid_size (N) between 4 and 7.
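For example, a run with saving enabled on a 5 x 5 grid (the exact argument format is defined in main.py) would look like:
python3 main.py True 5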
├── conda                        # contains the environment file necessary to reproduce the python env
│   └── CoRL_env.yml
├── lib                          # contains the source code for the slugs tool
│   ├── README.md
│   └── slugs
├── README.md
└── src                          # contains all the code relevant to the project
    ├── figures_and_gifs         # directory containing all the saved gifs for Example 1 and Example 2 (refer to the project report)
    ├── frames                   # required to store frames; create one (using mkdir) before running the main.py module
    ├── learning_reward_function # source code containing the various players, the learning algorithm, and the rl env
    ├── main.py                  # main file
    ├── results                  # directory to dump the run time stats
    ├── saved_players            # binary files of learned players
    └── two_player_game          # source code relevant to construction of the two-player game
        └── plots                # directory containing all the high resolution plots of the policy, Q values, valid transitions, and max V change while learning
Autonomous systems are gradually becoming ubiquitous. Beyond simply demonstrating these systems, it is becoming increasingly important to guarantee that they behave safely and reliably. We can leverage methods developed by the formal methods community to synthesize controllers/strategies that achieve a given task with critical safety and performance guarantees. Traditional open-loop synthesis may work well in static environments, but it may fail to find strategies that guarantee task completion under uncertain or adversarial behavior of the environment. Reactive synthesis is the field that deals with systems that continuously interact with the environment they operate in. These interactions involve constraints such as real-time requirements, concurrency, and parallelism that make them difficult to model correctly. We can model the interaction between the system (a robot in our case) and the environment as a two-player game and synthesize a winning strategy that satisfies a specification formulated within a fragment of temporal logic. Thus we can synthesize controllers that guarantee completion of the task using formal methods. We can then employ reinforcement learning techniques to learn to achieve the given task optimally by learning the underlying unknown reward function. In this way we establish both correctness (with respect to the temporal logic specification) and optimality (with respect to the a priori unknown performance criterion) of the task in a stochastic environment, for a fragment of temporal logic specifications. Hence, we can guarantee both qualitative (encoded as the winning condition) and quantitative (optimal reactive controllers) performance for a system operating in an unknown environment.
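To make the learning step concrete, here is a minimal, illustrative sketch of tabular Q-learning restricted to the actions a synthesized permissive strategy allows; the env interface and the allowed_actions mapping are assumptions for illustration, not this repo's actual API:

```python
import random
from collections import defaultdict

def q_learning_within_strategy(env, allowed_actions, episodes=500,
                               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning that only ever explores actions permitted by the
    synthesized winning strategy, so the specification is never violated."""
    Q = defaultdict(float)  # (state, action) -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = allowed_actions[state]  # spec-safe actions only
            if random.random() < epsilon:
                action = random.choice(actions)                     # explore
            else:
                action = max(actions, key=lambda a: Q[(state, a)])  # exploit
            next_state, reward, done = env.step(action)
            # The greedy backup is also taken over allowed actions only.
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in allowed_actions[next_state])
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```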
This project can be decoupled into two major sub-problems: (1) synthesizing a permissive winning strategy for the two-player game that satisfies the temporal logic specification, and (2) applying reinforcement learning within that strategy to learn an optimal policy for the a priori unknown reward function.
Task: the system robot (green) should always be in a cell diagonally opposite to the env robot.
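Purely as an illustration of how such a safety condition could be checked as a state predicate (one possible reading of "diagonally opposite"; the repo and paper may encode it differently):

```python
def diagonally_opposite(sys_pos, env_pos):
    """Illustrative predicate: the two robots occupy diagonally adjacent
    cells, i.e. their positions differ by exactly one in both coordinates."""
    (xs, ys), (xe, ye) = sys_pos, env_pos
    return abs(xs - xe) == 1 and abs(ys - ye) == 1
```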
Please refer to the project_report directory for the presentation and the paper report for this project. Please contact me if you have questions at karan.muvvala@colorado.edu.
[1]: Min Wen, Rüdiger Ehlers, and Ufuk Topcu. “Correct-by-synthesis reinforcement learning with temporal logic constraints”. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2015, pp. 4983–4990.