Project author: keiohta

Project description:
TensorFlow2 Reinforcement Learning
Language: Python
Repository: git://github.com/keiohta/tf2rl.git
Created: 2019-04-10T13:27:05Z
Project community: https://github.com/keiohta/tf2rl

License: MIT License


TF2RL

TF2RL is a deep reinforcement learning library that implements various algorithms using TensorFlow 2.x.

1. Algorithms

The following algorithms are supported (a sketch of the corresponding imports follows the table):

Algorithm | Discrete action | Continuous action | Support | Category
VPG, PPO | ✓ | ✓ | GAE | Model-free On-policy RL
DQN (including DDQN, Prior. DQN, Duel. DQN, Distrib. DQN, Noisy DQN) | ✓ | - | ApeX | Model-free Off-policy RL
DDPG (including TD3, BiResDDPG) | - | ✓ | ApeX | Model-free Off-policy RL
SAC | ✓ | ✓ | ApeX | Model-free Off-policy RL
CURL, SAC-AE | - | ✓ | - | Model-free Off-policy RL
MPC, ME-TRPO | - | ✓ | - | Model-based RL
GAIL, GAIfO, VAIL (including Spectral Normalization) | ✓ | ✓ | - | Imitation Learning
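
Each supported algorithm is implemented in its own module under tf2rl.algos. A minimal import sketch (the dqn and ddpg paths appear later in this README; the sac and ppo module names are assumed to follow the same tf2rl.algos.<name> pattern):

  # Import a few of the algorithm classes listed above.
  from tf2rl.algos.dqn import DQN    # DQN family
  from tf2rl.algos.ddpg import DDPG  # DDPG / TD3 family
  from tf2rl.algos.sac import SAC    # Soft Actor-Critic (assumed module name)
  from tf2rl.algos.ppo import PPO    # PPO (assumed module name)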

The following papers have been implemented in tf2rl:

Also, some useful techniques are implemented:

2. Installation

There are several ways to install tf2rl.
The recommended way is “2.1 Install from PyPI”.

If TensorFlow is already installed, the installer tries to pick the
TensorFlow Probability version that best matches it.
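
For example, you can check which TensorFlow version is already present before installing (a plain-Python sketch, not a tf2rl API):

  # See which TensorFlow version (if any) is already installed; the tf2rl
  # install matches its tensorflow-probability dependency against it.
  try:
      import tensorflow as tf
      print("Installed TensorFlow version:", tf.__version__)
  except ImportError:
      print("TensorFlow is not installed yet")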

2.1 Install from PyPI

You can install tf2rl from PyPI:

  $ pip install tf2rl

2.2 Install from Source Code

You can also install from source:

  $ git clone https://github.com/keiohta/tf2rl.git tf2rl
  $ cd tf2rl
  $ pip install .

2.3 Preinstalled Docker Container

Instead of installing tf2rl on your (virtual) system, you can use
preinstalled Docker containers.

Only the first execution requires time to download the container image.

In the following commands, replace <version> with the version tag
you want to use.

2.3.1 CPU Only

The following command starts the preinstalled container:

  $ docker run -it ghcr.io/keiohta/tf2rl/cpu:<version> bash

If you also want to mount your local directory /local/dir/path at
/mount/point inside the container:

  $ docker run -it -v /local/dir/path:/mount/point ghcr.io/keiohta/tf2rl/cpu:<version> bash

2.3.2 GPU Support (Linux Only, Experimental)

WARNING: We encountered unsolved errors when running ApeX multiprocess learning.

Requirements

  • Linux
  • NVIDIA GPU
    • TF2.2 compatible driver
  • Docker 19.03 or later

The following command starts the preinstalled container:

  $ docker run --gpus all -it ghcr.io/keiohta/tf2rl/nvidia:<version> bash

If you also want to mount your local directory /local/dir/path at
/mount/point inside the container:

  $ docker run --gpus all -it -v /local/dir/path:/mount/point ghcr.io/keiohta/tf2rl/nvidia:<version> bash

You can check whether the container sees the GPU correctly by running
the following command inside the container:

  $ nvidia-smi

3. Getting started

Here is a quick example of how to train a DDPG agent on the Pendulum environment:

  import gym

  from tf2rl.algos.ddpg import DDPG
  from tf2rl.experiments.trainer import Trainer

  parser = Trainer.get_argument()
  parser = DDPG.get_argument(parser)
  args = parser.parse_args()

  env = gym.make("Pendulum-v1")
  test_env = gym.make("Pendulum-v1")
  policy = DDPG(
      state_shape=env.observation_space.shape,
      action_dim=env.action_space.high.size,
      gpu=-1,  # Run on CPU. If you want to run on GPU, specify GPU number
      memory_capacity=10000,
      max_action=env.action_space.high[0],
      batch_size=32,
      n_warmup=500)
  trainer = Trainer(policy, env, args, test_env=test_env)
  trainer()

You can find the implemented algorithms in the examples directory.
For example, to train a DDPG agent:

  # You must change directory to avoid importing local files
  $ cd examples
  # For available options, specify --help or read the code
  $ python run_ddpg.py [options]

You can monitor training progress and results in TensorBoard as follows:

  # When executing `run_**.py`, its logs are automatically generated under `./results`
  $ tensorboard --logdir results

4. Usage

For basic usage, all you need is to initialize one of the policy
classes and the Trainer class.

As an option, tf2rl also supports a command line program style, so you
can pass configuration parameters as command line arguments.

4.1 Command Line Program Style

The Trainer class and the policy classes have a class method
get_argument, which creates or updates an ArgumentParser object.

You can parse the command line arguments with the parser's
parse_args method, which returns a Namespace object.

The policy's constructor options can be extracted from the Namespace
object explicitly, while the Trainer constructor accepts the Namespace
object directly.

  from tf2rl.algos.dqn import DQN
  from tf2rl.experiments.trainer import Trainer

  env = ...  # Create gym.env like environment.

  parser = DQN.get_argument(Trainer.get_argument())
  args = parser.parse_args()

  policy = DQN(enable_double_dqn=args.enable_double_dqn,
               enable_dueling_dqn=args.enable_dueling_dqn,
               enable_noisy_dqn=args.enable_noisy_dqn)
  trainer = Trainer(policy, env, args)
  trainer()

4.2 Non Command Line Program Style (e.g. on Jupyter Notebook)

Argument parsing does not fit a Jupyter Notebook-like environment
well. Instead of a Namespace object, the Trainer constructor can
accept a dict as the args argument.

  from tf2rl.algos.dqn import DQN
  from tf2rl.experiments.trainer import Trainer

  env = ...  # Create gym.env like environment.

  policy = DQN( ... )
  trainer = Trainer(policy, env, {"max_steps": int(1e+6), ... })
  trainer()

4.3 Results

The Trainer class saves logs and models under
<logdir>/%Y%m%dT%H%M%S.%f. The default logdir is "results"; it can be
changed with the --logdir command line argument or the "logdir" key
in the constructor args.
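
For example, combining the dict-style args from Section 4.2 with the "logdir" key (a minimal sketch reusing the Getting Started setup; the directory name and step count here are illustrative):

  import gym

  from tf2rl.algos.ddpg import DDPG
  from tf2rl.experiments.trainer import Trainer

  env = gym.make("Pendulum-v1")
  policy = DDPG(
      state_shape=env.observation_space.shape,
      action_dim=env.action_space.high.size,
      gpu=-1,
      max_action=env.action_space.high[0])
  # Logs and models go under ./my_results/<timestamp> instead of ./results/<timestamp>.
  trainer = Trainer(policy, env, {"max_steps": int(1e4), "logdir": "my_results"})
  trainer()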

5. Citation

  @misc{ota2020tf2rl,
    author = {Kei Ota},
    title = {TF2RL},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/keiohta/tf2rl/}}
  }