Project author: ac-93

Project description:
Modified versions of the SAC algorithm from spinningup for discrete action spaces and image observations.
Language: Python
Project URL: git://github.com/ac-93/soft-actor-critic.git
Created: 2019-09-23T12:50:17Z
Project community: https://github.com/ac-93/soft-actor-critic

License: MIT License

soft-actor-critic

This repo consists of modifications to the Spinningup implementation of the Soft Actor-Critic algorithm to allow for both image observations and discrete action spaces.

Trained Atari agents (courtesy of https://github.com/yining043):

BeamRider
Enduro
Breakout
SpaceInvaders
Qbert

Dependencies:

  1. tensorflow 1.15.0
  2. gym[atari] 0.15.7
  3. cv2 (opencv-python)
  4. mpi4py
  5. numpy
  6. matplotlib

Implementations of Soft Actor-Critic (SAC) algorithms from:

  1. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018, https://arxiv.org/abs/1801.01290

  2. Soft Actor-Critic Algorithms and Applications, Haarnoja et al., 2019, https://arxiv.org/abs/1812.05905

  3. Soft Actor Critic for Discrete Action Settings, Petros Christodoulou, 2019, https://arxiv.org/abs/1910.07207
    (author's implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch)

Based on the implementations given in Spinningup:

https://spinningup.openai.com/en/latest/algorithms/sac.html

Different approaches for the discrete setting

Two different methods are given for using SAC with discrete action spaces:

  • sac_discrete_gb uses the Gumbel-Softmax distribution to reparameterize the discrete action space. This keeps the algorithm similar to the original SAC implementation for continuous action spaces.

  • sac_discrete avoids reparameterisation and instead computes the entropy and KL divergence directly from the discrete action probabilities given by the policy network. This is based on the method described in [3], stays closest to the original SAC papers, and is the variant I find gives the best results. (Both approaches are sketched after this list.)
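As a minimal sketch of the core difference between the two variants (not the repo's exact code), the snippet below is written against TensorFlow 1.x, since that is the listed dependency; the tensor names (`logits`, `q_min`, `alpha`) and helper functions are illustrative assumptions, not this repo's API:

```python
import tensorflow as tf

def gumbel_softmax_sample(logits, temperature=1.0):
    """Relaxed (differentiable) sample from a categorical distribution,
    in the spirit of the sac_discrete_gb variant (sketch only)."""
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    u = tf.random_uniform(tf.shape(logits), minval=1e-8, maxval=1.0)
    gumbel = -tf.log(-tf.log(u))
    # Softmax over the perturbed logits gives a "soft" one-hot action,
    # so gradients can flow through the sampling step (reparameterisation).
    return tf.nn.softmax((logits + gumbel) / temperature)

def discrete_policy_loss(logits, q_min, alpha):
    """Policy loss in the spirit of sac_discrete / Christodoulou (2019):
    the expectation over actions is taken exactly using the action
    probabilities, so no reparameterisation is needed (sketch only).
    `q_min` is the element-wise minimum of the two Q-networks' outputs,
    shape [batch, n_actions]."""
    probs = tf.nn.softmax(logits)          # pi(a|s) for every discrete action
    log_probs = tf.nn.log_softmax(logits)  # log pi(a|s)
    # E_s[ sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a)) ]
    return tf.reduce_mean(tf.reduce_sum(probs * (alpha * log_probs - q_min), axis=-1))
```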

Versions of the algorithms that work with image observations, such as the Atari gym environments, are in the image observation directory.
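As a rough illustration of how Atari image observations are usually handled (an assumption about the common pipeline, not necessarily the exact preprocessing used in this repo), frames are typically converted to grayscale, resized, and stacked before being passed to a convolutional network:

```python
import cv2
import numpy as np
from collections import deque

def preprocess_frame(frame, size=84):
    """Typical Atari preprocessing: grayscale + resize + scale to [0, 1].
    (Sketch only; the repo's image-observation code may differ.)"""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0

class FrameStack:
    """Keeps the last k preprocessed frames as a (size, size, k) observation."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        for _ in range(self.frames.maxlen):
            self.frames.append(preprocess_frame(first_frame))
        return np.stack(self.frames, axis=-1)

    def step(self, next_frame):
        self.frames.append(preprocess_frame(next_frame))
        return np.stack(self.frames, axis=-1)
```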