Author: kngwyu

Description:
:umbrella: Deep RL agents with PyTorch :umbrella:
Language: Python
Repository: git://github.com/kngwyu/Rainy.git
Created: 2018-06-22T09:48:45Z
Project community: https://github.com/kngwyu/Rainy

License: Apache License 2.0


Rainy

Reinforcement learning utilities and algorithm implementations using PyTorch.

Example

Rainy has a main decorator which converts a function that returns rainy.Config
to a CLI app.
All function arguments are re-interpreted as command line arguments.

    import os

    from torch.optim import RMSprop

    import rainy
    from rainy import Config, net
    from rainy.agents import DQNAgent
    from rainy.envs import Atari
    from rainy.lib.explore import EpsGreedy, LinearCooler


    # rainy.main turns this Config-returning function into a CLI app.
    @rainy.main(DQNAgent, script_path=os.path.realpath(__file__))
    def main(
        # Each argument below is re-interpreted as a command line argument.
        envname: str = "Breakout",
        max_steps: int = int(2e7),
        replay_size: int = int(1e6),
        replay_batch_size: int = 32,
    ) -> Config:
        c = Config()
        c.set_env(lambda: Atari(envname))
        c.set_optimizer(
            lambda params: RMSprop(params, lr=0.00025, alpha=0.95, eps=0.01, centered=True)
        )
        c.set_explorer(lambda: EpsGreedy(1.0, LinearCooler(1.0, 0.1, int(1e6))))
        c.set_net_fn("dqn", net.value.dqn_conv())
        c.replay_size = replay_size
        c.replay_batch_size = replay_batch_size
        c.train_start = 50000
        c.sync_freq = 10000
        c.max_steps = max_steps
        c.eval_env = Atari(envname)
        c.eval_freq = None
        return c


    if __name__ == "__main__":
        main()

You can then run this script like this:

    python dqn.py --replay-batch-size=64 train --eval-render
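To illustrate how a decorator can turn function arguments into command line options, here is a minimal sketch using only the standard library. It is not Rainy's implementation: `cli_main` and `make_config` are hypothetical names, a plain dict stands in for `rainy.Config`, and the sketch assumes that every parameter has a default value of type `str`, `int`, or `float`. Subcommands such as `train` and flags such as `--eval-render` are not modeled.

```python
import argparse
import inspect
from typing import Any, Callable, List, Optional


def cli_main(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: expose fn's keyword arguments as CLI options."""
    parser = argparse.ArgumentParser(description=fn.__doc__)
    for name, param in inspect.signature(fn).parameters.items():
        # replay_batch_size -> --replay-batch-size
        flag = "--" + name.replace("_", "-")
        # Assumption: the default value's type is also the option's parser.
        parser.add_argument(
            flag, dest=name, type=type(param.default), default=param.default
        )

    def wrapper(argv: Optional[List[str]] = None) -> Any:
        # argv=None makes argparse read sys.argv[1:].
        args = parser.parse_args(argv)
        return fn(**vars(args))

    return wrapper


@cli_main
def make_config(envname: str = "Breakout", replay_batch_size: int = 32) -> dict:
    # A plain dict stands in for rainy.Config in this sketch.
    return {"envname": envname, "replay_batch_size": replay_batch_size}


print(make_config(["--replay-batch-size=64"]))
# → {'envname': 'Breakout', 'replay_batch_size': 64}
```

Deriving the options from the function signature keeps the defaults in one place, so the Python defaults and the CLI defaults cannot drift apart.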

See the examples directory for more.
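The explorer in the DQN example is built from `EpsGreedy` and `LinearCooler`. As a rough illustration of that kind of exploration, the sketch below implements epsilon-greedy action selection with a linearly annealed epsilon in plain Python. It assumes that the arguments `1.0, 0.1, int(1e6)` mean "anneal from 1.0 to 0.1 over one million steps", which the README does not state. `LinearSchedule` and `eps_greedy` are hypothetical names and are not Rainy's API.

```python
import random
from typing import Sequence


class LinearSchedule:
    """Hypothetical helper: anneal a value linearly from start down to end."""

    def __init__(self, start: float, end: float, steps: int) -> None:
        self.value = start
        self.end = end
        self.delta = (start - end) / steps

    def __call__(self) -> float:
        # Return the current value, then step toward `end` without passing it.
        current = self.value
        self.value = max(self.end, self.value - self.delta)
        return current


def eps_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
epsilon = LinearSchedule(1.0, 0.1, 1_000_000)
# One environment step: query the schedule, then choose an action index.
action = eps_greedy([0.2, 0.5, 0.1], epsilon(), rng)
```

Starting at a high epsilon means early actions are mostly random; the floor at the end value keeps a small amount of exploration once annealing has finished.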

API documentation

COMING SOON

Supported Python version

Python >= 3.7

Implementation Status

| Algorithm | Multi Worker (Sync) | Recurrent | Discrete Action | Continuous Action | MPI support |
| --- | --- | --- | --- | --- | --- |
| DQN/Double DQN | :heavy_check_mark: | :x: | :heavy_check_mark: | :x: | :x: |
| BootDQN/RPF | :x: | :x: | :heavy_check_mark: | :x: | :x: |
| DDPG | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: | :x: |
| TD3 | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: | :x: |
| SAC | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: | :x: |
| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| A2C | :heavy_check_mark: | :small_red_triangle:(1) | :heavy_check_mark: | :heavy_check_mark: | :x: |
| ACKTR | :heavy_check_mark: | :x:(2) | :heavy_check_mark: | :heavy_check_mark: | :x: |
| AOC | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| PPOC | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| ACTC(3) | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: |

(1): Very unstable

(2): Needs https://openreview.net/forum?id=HyMTkQZAb implemented

(3): Incomplete implementation. β is often too high.

Sub packages

References

DQN (Deep Q Network)

DDQN (Double DQN)

Bootstrapped DQN

RPF (Randomized Prior Functions)

DDPG (Deep Deterministic Policy Gradient)

TD3 (Twin Delayed Deep Deterministic Policy Gradient)

SAC (Soft Actor Critic)

A2C (Advantage Actor Critic)

ACKTR (Actor Critic using Kronecker-Factored Trust Region)

PPO (Proximal Policy Optimization)

AOC (Advantage Option Critic)

PPOC (Proximal Option Critic)

ACTC (Actor Critic Termination Critic)

Implementations I referenced

Thank you!

https://github.com/openai/baselines

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

https://github.com/ShangtongZhang/DeepRL

https://github.com/chainer/chainerrl

https://github.com/Thrandis/EKFAC-pytorch (for ACKTR)

https://github.com/jeanharb/a2oc_delib (for AOC)

https://github.com/mklissa/PPOC (for PPOC)

https://github.com/sfujim/TD3 (for DDPG and TD3)

https://github.com/vitchyr/rlkit (for SAC)

License

This project is licensed under the Apache License, Version 2.0
(LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).