A TF2.0 implementation of RL baselines.
A Deep Reinforcement Learning codebase in TensorFlow 2.0 with a unified, flexible and highly customizable structure for fast prototyping.
| Features | Unstable Baselines | Stable-Baselines3 | OpenAI Baselines |
|---|---|---|---|
| State of the art RL methods | ![]() | ![]() | ![]() |
| Documentation | ![]() | ![]() | ![]() |
| Custom callback (2) | ![]() | ![]() | ![]() |
| TensorFlow 2.0 support | ![]() | ![]() | ![]() |
| Clean, elegant code | ![]() | ![]() | ![]() |
| Easy to trace, customize | ![]() | ![]() | ![]() |
| Standalone implementations | ![]() | ![]() | ![]() |
(1) Currently only supports DQN, C51, PPO, TD3, etc. We are still working on other algorithms.
(2) For example, in Stable-Baselines you need to write this disgusting custom callback just to save the best-performing model (see the sketch below), while in Unstable Baselines it is saved automatically.
(3) If you have traced Stable-Baselines or OpenAI/baselines once, you'll never do that again.
(4) The many cross-dependencies across all of the algorithms make the code very hard to trace, for example baselines/common/policies.py, baselines/a2c/a2c.py, and so on. Great job, OpenAI!
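To make (2) concrete, here is a rough, illustrative sketch of the kind of best-model-saving callback you have to write yourself. It is written against Stable-Baselines3's `BaseCallback` API (not Unstable Baselines), and the class name, check frequency, and reward bookkeeping are assumptions for illustration only:

```python
# Illustrative sketch only: the kind of hand-written callback Stable-Baselines
# expects for saving the best-performing model (written against SB3's
# BaseCallback API; names and bookkeeping here are assumptions).
import numpy as np
from stable_baselines3.common.callbacks import BaseCallback

class SaveBestModelCallback(BaseCallback):
    def __init__(self, check_freq: int, save_path: str):
        super().__init__()
        self.check_freq = check_freq
        self.save_path = save_path
        self.best_mean_reward = -np.inf

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            # mean reward over the most recently finished episodes
            rewards = [info['r'] for info in self.model.ep_info_buffer]
            if rewards and np.mean(rewards) > self.best_mean_reward:
                self.best_mean_reward = float(np.mean(rewards))
                self.model.save(self.save_path)
        return True  # returning False would stop training early
```

In Unstable Baselines none of this boilerplate is needed; the best-performing model is saved for you.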
We don’t have any documentation yet.
Basic requirements:
You can install from PyPI:

```sh
$ pip install unstable_baselines
```
Or you can install the latest version from this repository:

```sh
$ pip install git+https://github.com/Ending2015a/unstable_baselines.git@master
```
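Either way, a quick way to confirm the package is importable (just an import check, nothing more):

```sh
$ python -c "import unstable_baselines"
```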
Done! Now you can start training your own agents.
| Algorithm | Box | Discrete | MultiDiscrete | MultiBinary |
|---|---|---|---|---|
| DQN | ![]() | ![]() | ![]() | ![]() |
| PPO | ![]() | ![]() | ![]() | ![]() |
| TD3 | ![]() | ![]() | ![]() | ![]() |
| SD3 | ![]() | ![]() | ![]() | ![]() |
| Algorithm | Box | Discrete | MultiDiscrete | MultiBinary |
|---|---|---|---|---|
| C51 | ![]() | ![]() | ![]() | ![]() |
| QRDQN | ![]() | ![]() | ![]() | ![]() |
| IQN | ![]() | ![]() | ![]() | ![]() |
This example shows how to train a PPO agent to play CartPole-v0. You can find the full script in example/cartpole/train_ppo.py.
First, import the dependencies:

```python
import gym

import unstable_baselines as ub
from unstable_baselines.algo.ppo import PPO
```
Create environments for training and evaluation:

```python
# create a vectorized training environment (10 parallel CartPole-v0 instances)
# and a single evaluation environment
env = ub.envs.VecEnv([gym.make('CartPole-v0') for _ in range(10)])
eval_env = gym.make('CartPole-v0')
```
Create a PPO model and train it
model = PPO(
env,
learning_rate=1e-3,
gamma=0.8,
batch_size=128,
n_steps=500
).learn( # train for 20000 steps
20000,
verbose=1
)
Save and load the trained model:

```python
model.save('./my_ppo_model')
model = PPO.load('./my_ppo_model')
```
Evaluate the training results:

```python
model.eval(eval_env, 20, 200, render=True)
# don't forget to close the environments!
env.close()
eval_env.close()
```
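The other algorithms listed in the tables above presumably follow a similar train/save/eval workflow. As a rough sketch only (the `unstable_baselines.algo.td3` module path, the constructor arguments, and the hyperparameters below are assumptions modeled on the PPO example, not verified against the codebase), training TD3 on a continuous-control task might look like this:

```python
import gym
import unstable_baselines as ub
# assumption: TD3 is exposed under unstable_baselines.algo.td3,
# mirroring the unstable_baselines.algo.ppo layout used above
from unstable_baselines.algo.td3 import TD3

# Pendulum-v0 has a Box action space, which the table lists as supported by TD3
env = ub.envs.VecEnv([gym.make('Pendulum-v0') for _ in range(4)])
eval_env = gym.make('Pendulum-v0')

# constructor and learn() arguments are assumed to mirror the PPO example
model = TD3(env, learning_rate=1e-3).learn(20000, verbose=1)

model.save('./my_td3_model')
model.eval(eval_env, 20, 200, render=True)

env.close()
eval_env.close()
```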
More examples: