Safe Option-Critic: Learning Safety in the Option-Critic Architecture
We introduce a novel framework, “Safe Option-Critic” (SOC), which adds safety to the options framework. Here, safety is defined as “prevention from accidents in ML systems due to poor design of AI systems”. Options provide a method for incorporating temporal abstraction in the RL setting; for background, see “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning”. The Option-Critic Architecture provides a method for end-to-end learning of options, including the option policies, the termination conditions, and the policy over options.
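To make the pieces of an option concrete, here is a minimal tabular sketch, not the code in this repo. It assumes a softmax (Boltzmann) intra-option policy scaled by a temperature and a sigmoid termination function, which is a common parameterization in option-critic implementations. The names `TabularOption`, `policy_prefs`, and `term_prefs` are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field


def softmax(prefs, temperature=1.0):
    """Boltzmann distribution over a list of action preferences."""
    scaled = [p / temperature for p in prefs]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class TabularOption:
    """Hypothetical container for one option in a tabular environment."""
    n_states: int
    n_actions: int
    temperature: float = 1.0  # assumed to scale the softmax policy
    policy_prefs: list = field(init=False)  # per-state action preferences
    term_prefs: list = field(init=False)    # per-state termination logits

    def __post_init__(self):
        self.policy_prefs = [[0.0] * self.n_actions for _ in range(self.n_states)]
        self.term_prefs = [0.0] * self.n_states

    def action_probs(self, state):
        """Intra-option policy: distribution over actions in `state`."""
        return softmax(self.policy_prefs[state], self.temperature)

    def sample_action(self, state, rng=random):
        probs = self.action_probs(state)
        return rng.choices(range(self.n_actions), weights=probs)[0]

    def termination_prob(self, state):
        """Probability that the option terminates in `state`."""
        return sigmoid(self.term_prefs[state])


option = TabularOption(n_states=4, n_actions=3, temperature=0.001)
option.policy_prefs[0][1] = 1.0  # prefer action 1 in state 0
print(option.action_probs(0), option.termination_prob(0))
```

With a very small temperature, even a small preference gap makes the policy nearly deterministic; a larger temperature keeps it closer to uniform. A policy over options would sit on top of several such objects and choose which one to run whenever the current option terminates.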
This repo contains code for running the SOC framework on tabular and continuous state-space environments. For experiments in ALE, see repo
* Numpy
* OpenAI Gym
* Matplotlib
* Seaborn
The following command is used for training in the tabular setting (FrozenFourRoom environment):
python --nruns 50 --nepisodes 500 --beta 0.1 --temperature 0.001 --lr_critic 0.5 --lr_intra 0.01 --lr_term 0.1
Use the default parameters in the code for the best setting.
The following command is used for training in the continuous state-space setting (Puddle-World environment):
python --nruns 50 --nepisodes 200 --beta 0.015 --temperature 0.1 --lr_critic 0.5 --lr_intra 0.05 --lr_term 0.05
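Both training commands pass the same set of flags. As a rough illustration of how such flags map to values, here is a hypothetical `argparse` declaration. The flag names come from the commands above; the types and help strings are assumptions, the defaults used by the actual code are not reproduced, and this is not the repo's parser.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the training commands."""
    parser = argparse.ArgumentParser(description="SOC training flags (sketch)")
    parser.add_argument("--nruns", type=int, help="number of independent runs")
    parser.add_argument("--nepisodes", type=int, help="episodes per run")
    parser.add_argument("--beta", type=float, help="beta hyperparameter")
    parser.add_argument("--temperature", type=float,
                        help="temperature (assumed: softmax policy)")
    parser.add_argument("--lr_critic", type=float,
                        help="learning rate (assumed: critic)")
    parser.add_argument("--lr_intra", type=float,
                        help="learning rate (assumed: intra-option policies)")
    parser.add_argument("--lr_term", type=float,
                        help="learning rate (assumed: termination functions)")
    return parser


# Parse the Puddle-World flags from an explicit list instead of sys.argv.
puddle_flags = ("--nruns 50 --nepisodes 200 --beta 0.015 --temperature 0.1 "
                "--lr_critic 0.5 --lr_intra 0.05 --lr_term 0.05")
args = build_parser().parse_args(puddle_flags.split())
print(vars(args))
```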
Use the PolttingPolicies.ipynb notebook (IPython Notebook) to visualize option policies and option terminations in the Frozen FourRoom Env.
Use the ReturnPlots.ipynb notebook (IPython Notebook) to visualize return plots for the Frozen FourRoom Env and the Puddle-World Env.