Project author: MuvvalaKaran

Project description:
Correct-by-synthesis reinforcement learning with temporal logic constraints (CoRL)
Primary language: C
Project URL: git://github.com/MuvvalaKaran/CoRL.git
Created: 2020-03-20T02:00:19Z
Project community: https://github.com/MuvvalaKaran/CoRL

License:


Project Description

This is the repo for the deep learning course project: Correct-by-synthesis reinforcement learning with temporal logic constraints [1].

More information regarding the course can be found here

The main aim of this project is to implement and evaluate the examples discussed by the authors in the paper.

Instructions to run

Follow these three steps to successfully install and run the project.

Installing the env

Install Anaconda on your OS and then run the shell commands below from a terminal where the conda command is on your path.

  • Install dependencies - install the conda env

Use the terminal or an Anaconda Prompt for the following steps:

  • Create the environment from the environment file under conda/:

        conda env create -f CoRL_env.yml

  • The first line of the yml file sets the new environment's name. In my case it is dl_project_env. You can modify it before creating the environment from this file.

  • Activate the new environment (replace myenv with dl_project_env):

        conda activate myenv

  • Verify that the new environment was installed correctly. You should see a star ('*') in front of the currently active env:

        conda env list

  • You can also use:

        conda info --envs

  • Make a frames folder within the src folder. This folder is used to save the frames from which the code creates a gif (see the command sketch after this list).
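
Putting the steps together, a minimal command sketch (assuming the default environment name dl_project_env and that you run it from the repository root):

    conda env create -f conda/CoRL_env.yml   # create the env from the yml file
    conda activate dl_project_env            # activate it
    conda env list                           # the active env is marked with '*'
    mkdir -p src/frames                      # frames folder used when saving gifs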

Make Slugs executable

  • Finally, you need to create the slugs executable. cd into the src folder in slugs and run make. Use ls to ensure that the executable exists; a command sketch follows.
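
A minimal sketch, assuming the slugs sources live under lib/slugs/ as shown in the directory hierarchy below:

    cd lib/slugs/src   # slugs sources (path assumed from the hierarchy below)
    make               # build the slugs executable
    ls                 # confirm the executable is present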

Running the code

cd into the src/ folder and type

python3 main.py <save_flag> <grid_size>

For convenience I would recommend using save_flag = True and a value of grid_size (N) between 4 and 7.
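
For example, a minimal run (assuming the two positional arguments are parsed as shown above):

    python3 main.py True 4   # save frames/gifs, grid size N = 4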

Directory Hierarchy

    .
    ├── conda                        # contains the environment file necessary to reproduce the python env
    │   └── CoRL_env.yml
    ├── lib                          # contains the source code for the slugs tool
    │   ├── README.md
    │   └── slugs
    ├── README.md
    └── src                          # contains all the code relevant to the project
        ├── figures_and_gifs         # all the saved gifs for Example 1 and Example 2 (refer to the project report)
        ├── frames                   # stores frames; make one (using mkdir) before running the main.py module
        ├── learning_reward_function # source code for the players, the learning algorithm and the rl env
        ├── main.py                  # main file
        ├── results                  # directory to dump the run time stats
        ├── saved_players            # binary files of learned players
        └── two_player_game          # source code relevant to construction of the two-player game
            └── plots                # high resolution plots of policy, Q plot, valid transitions, max V change while learning

About the project

Introduction

Autonomous systems are gradually becoming ubiquitous. Beyond simply demonstrating these systems, it is becoming increasingly important to provide guarantees that they behave safely and reliably. We can leverage methods developed by the formal methods community to synthesize controllers/strategies that achieve a given task with task-critical safety and performance guarantees. Traditional open-loop synthesis may work well in static environments, but it may fail to find strategies that guarantee task completion under uncertain or adversarial behavior of the environment. Reactive synthesis is the field that deals with systems that continuously interact with the environment they operate in. These interactions have novel constraints, such as real-time constraints, concurrency, and parallelism, that make them difficult to model correctly. We can model the interaction between the system (a robot in our case) and the environment as a two-player game and synthesize a winning strategy that satisfies a specification formulated within a fragment of temporal logic. Thus we can synthesize controllers that guarantee completion of the task using formal methods. We can then employ reinforcement learning techniques to learn to achieve the given task optimally by learning the underlying unknown reward function. In this way we establish both correctness (with respect to the temporal logic specification) and optimality (with respect to the a priori unknown performance criterion) for the task in a stochastic environment, for a fragment of temporal logic specifications. Hence, we can guarantee both qualitative (encoded as the winning condition) and quantitative (optimal reactive controllers) performance for a system operating in an unknown environment.

Proposed approach

This project can be decoupled into two major sub-problems:

  • Compute a set of permissive (winning) strategies that are realizable for the given game.
  • Choose a strategy that maximizes the underlying unknown reward function using the maximin-Q learning algorithm (a rough sketch follows this list).
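
As a rough illustration of the second step, below is a minimal sketch of a maximin-Q update for a turn-based two-player game. The state/action encoding, alpha, and gamma are illustrative assumptions; in the actual approach exploration is additionally restricted to the permissive strategies computed in the first step, and the real implementation lives under src/learning_reward_function/.

    from collections import defaultdict

    # Q[s][(a, o)]: value of the system playing action a while the environment
    # plays action o in state s (the state/action encodings are illustrative).
    Q = defaultdict(lambda: defaultdict(float))
    alpha, gamma = 0.1, 0.9  # illustrative learning rate and discount factor

    def maximin_value(s, sys_actions, env_actions):
        # V(s) = best system action against the worst-case environment response.
        return max(min(Q[s][(a, o)] for o in env_actions) for a in sys_actions)

    def maximin_q_update(s, a, o, reward, s_next, sys_actions, env_actions):
        # Temporal-difference update whose bootstrap target uses the maximin
        # value, keeping the learned strategy robust to an adversarial env.
        target = reward + gamma * maximin_value(s_next, sys_actions, env_actions)
        Q[s][(a, o)] += alpha * (target - Q[s][(a, o)])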

Results

Task: The system robot (green) should always be in a cell diagonally opposite to the env robot. The resulting gifs are saved under src/figures_and_gifs/ (refer to the project report).
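
In temporal-logic terms this is a safety specification. Writing diag(sys, env) for the (hypothetical) predicate that the two robots occupy diagonally opposite cells, the requirement is roughly:

    G diag(sys, env)   # G is the LTL "globally" (always) operator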

" class="reference-link">

" class="reference-link">

" class="reference-link">

" class="reference-link">

Conclusion

Please refer to the project_report directory for the presentation and the paper report for this project. Please contact me if you have questions at: karan.muvvala@colorado.edu

Reference

[1]: Min Wen, Rüdiger Ehlers, and Ufuk Topcu. “Correct-by-synthesis reinforcement learning with temporal logic constraints”. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2015, pp. 4983–4990.
