Project author: fiberleif

Project description:
PyTorch implementation of popular deep reinforcement learning algorithms, aiming for state-of-the-art performance.
Primary language: Python
Repository: git://github.com/fiberleif/Pytorch-RL.git
Created: 2018-12-04T13:29:20Z
Project page: https://github.com/fiberleif/Pytorch-RL



Pytorch-RL

PyTorch implementation of popular deep reinforcement learning algorithms, aiming for state-of-the-art performance.

Implemented algorithms:

  • Proximal Policy Optimization (PPO)
  • Deep Deterministic Policy Gradient (DDPG)
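For orientation, the core update rules of the two implemented algorithms can be sketched in NumPy: the PPO clipped surrogate loss and the DDPG soft (Polyak) target-network update. This is a minimal sketch of the formulations in the original PPO and DDPG papers, not this repository's PyTorch code; the function names and the defaults `clip_eps=0.2` and `tau=0.001` (the values reported in those papers) are illustrative assumptions.

```python
import numpy as np


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss, returned as a quantity to minimize.

    new_logp, old_logp: log-probabilities of the sampled actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    clip_eps=0.2 is an assumed default (the value reported in the PPO paper).
    """
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the mean of the elementwise minimum; negate it for a loss.
    return -np.mean(np.minimum(unclipped, clipped))


def soft_update(target_params, source_params, tau=0.001):
    """DDPG soft target update: target <- tau * source + (1 - tau) * target.

    Parameters are lists of arrays; tau=0.001 is an assumed default
    (the value used in the DDPG paper).
    """
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_params, source_params)]


# Usage: the second sample's ratio (0.9 / 0.5 = 1.8) is clipped to 1.2.
new_logp = np.log(np.array([0.5, 0.9]))
old_logp = np.log(np.array([0.5, 0.5]))
advantages = np.array([1.0, 1.0])
print(ppo_clip_loss(new_logp, old_logp, advantages))  # approximately -1.1
```

Taking the minimum of the clipped and unclipped terms means the policy gains nothing from pushing the probability ratio outside `[1 - clip_eps, 1 + clip_eps]` when the advantage is positive, which keeps each update close to the data-collecting policy.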

Algorithms to be implemented:

  • Trust Region Policy Optimization (TRPO)
  • Generative Adversarial Imitation Learning (GAIL)
  • (Double/Dueling) Deep Q-Learning (DQN)

Dependencies

  • Python 3.6
  • NumPy 1.15
  • SciPy 1.1.0
  • mujoco-py 0.5.7
  • Gym 0.9.0
  • sklearn 0.0 (the PyPI package that installs scikit-learn)
  • PyTorch v0.4.0

Code Usage

Run the PPO algorithm on the MuJoCo suite

  1. cd ppo
  2. Run the training script with the settings for the desired environment:
     • python ppo_train.py --e Reacher-v1 -n 60000 -b 50
     • python ppo_train.py --e InvertedPendulum-v1
     • python ppo_train.py --e InvertedDoublePendulum-v1 -n 12000
     • python ppo_train.py --e Swimmer-v1 -n 2500 -b 5
     • python ppo_train.py --e Hopper-v1 -n 30000
     • python ppo_train.py --e HalfCheetah-v1 -n 3000 -b 5
     • python ppo_train.py --e Walker2d-v1 -n 25000
     • python ppo_train.py --e Ant-v1 -n 100000
     • python ppo_train.py --e Humanoid-v1 -n 200000
     • python ppo_train.py --e HumanoidStandup-v1 -n 200000 -b 5
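The PPO commands above can also be launched as one sweep. A minimal sketch, assuming a hypothetical launcher script (say `run_ppo_suite.py`) saved at the repository root next to the `ppo/` directory. The per-environment arguments are copied verbatim from the list and passed through unchanged; the launcher, its function names, and its `--run` switch are illustrative and not part of the repository. By default it only prints the commands.

```python
import subprocess
import sys

# Per-environment arguments, copied from the PPO command list in this README.
# They are passed through unchanged; their meaning is defined by ppo_train.py.
PPO_RUNS = {
    "Reacher-v1": ["-n", "60000", "-b", "50"],
    "InvertedPendulum-v1": [],
    "InvertedDoublePendulum-v1": ["-n", "12000"],
    "Swimmer-v1": ["-n", "2500", "-b", "5"],
    "Hopper-v1": ["-n", "30000"],
    "HalfCheetah-v1": ["-n", "3000", "-b", "5"],
    "Walker2d-v1": ["-n", "25000"],
    "Ant-v1": ["-n", "100000"],
    "Humanoid-v1": ["-n", "200000"],
    "HumanoidStandup-v1": ["-n", "200000", "-b", "5"],
}


def build_ppo_command(env_name):
    """Return the argv list for one PPO run, using the current Python interpreter."""
    return [sys.executable, "ppo_train.py", "--e", env_name] + PPO_RUNS[env_name]


def run_ppo_suite(env_names=None, dry_run=True, workdir="ppo"):
    """Print each command and, unless dry_run is True, run it inside `workdir`.

    Runs are sequential; check=True stops the sweep at the first failing run.
    """
    for env_name in env_names or PPO_RUNS:
        cmd = build_ppo_command(env_name)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True, cwd=workdir)


if __name__ == "__main__":
    # Hypothetical switch: only actually run the commands when --run is given.
    run_ppo_suite(dry_run="--run" not in sys.argv)
```

Running the script without arguments prints the ten commands; with `--run` it executes them one after another. Sequential execution keeps the sketch simple; for the larger settings you may prefer to launch environments in parallel.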

Run the DDPG algorithm on the MuJoCo suite

  1. cd ddpg
  2. Run the training script with the settings for the desired environment:
     • python ddpg_train.py --e Reacher-v1 --start_timesteps 1000
     • python ddpg_train.py --e InvertedPendulum-v1 --start_timesteps 1000
     • python ddpg_train.py --e InvertedDoublePendulum-v1 --start_timesteps 1000
     • python ddpg_train.py --e Swimmer-v1 --start_timesteps 1000
     • python ddpg_train.py --e Hopper-v1 --start_timesteps 1000
     • python ddpg_train.py --e HalfCheetah-v1 --start_timesteps 10000
     • python ddpg_train.py --e Walker2d-v1 --start_timesteps 1000
     • python ddpg_train.py --e Ant-v1 --start_timesteps 10000
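The DDPG commands differ only in `--start_timesteps` (1000 for most tasks, 10000 for HalfCheetah-v1 and Ant-v1). In DDPG/TD3 training scripts such as the TD3 authors' reference code, a flag with this name sets how many initial environment steps use uniformly random actions before the actor takes over; this has not been checked against `ddpg_train.py`. A minimal NumPy sketch under that assumption, with a hypothetical `select_action` helper. It uses Gaussian exploration noise as the TD3 code does; the original DDPG paper used Ornstein-Uhlenbeck noise instead.

```python
import numpy as np


def select_action(t, state, actor, start_timesteps, action_low, action_high,
                  rng, expl_noise=0.1):
    """Pick the action for environment step t (hypothetical helper).

    Assumption: --start_timesteps means "take uniformly random actions for the
    first start_timesteps steps"; after that the deterministic actor is used
    with added Gaussian exploration noise.
    """
    low = np.asarray(action_low, dtype=float)
    high = np.asarray(action_high, dtype=float)
    if t < start_timesteps:
        # Warm-up: sample uniformly from the action box to seed the replay buffer.
        return rng.uniform(low, high)
    action = np.asarray(actor(state), dtype=float)
    # Noise std is expl_noise times the half-width of the action range.
    noise = rng.normal(0.0, expl_noise * (high - low) / 2.0)
    return np.clip(action + noise, low, high)


def stand_in_actor(state):
    """Placeholder for a trained policy network (illustration only)."""
    return np.tanh(state)


# Usage with a 2-D action space in [-1, 1]; RandomState also works on NumPy 1.15.
rng = np.random.RandomState(0)
state = np.array([0.5, -0.5])
print(select_action(0, state, stand_in_actor, 1000, [-1.0, -1.0], [1.0, 1.0], rng))
print(select_action(5000, state, stand_in_actor, 1000, [-1.0, -1.0], [1.0, 1.0], rng))
```

Under this reading, a larger `--start_timesteps` fills the replay buffer with more random transitions before learning from the actor's own behavior, which is one plausible reason the larger environments use 10000.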

References