Project author: ac-93

Project description:
Modified versions of the SAC algorithm from spinningup for discrete action spaces and image observations.
Language: Python
Project URL: git://github.com/ac-93/soft-actor-critic.git
Created: 2019-09-23T12:50:17Z
Project community: https://github.com/ac-93/soft-actor-critic

License: MIT License

soft-actor-critic

This repo consists of modifications to the Spinningup implementation of the Soft Actor-Critic algorithm to allow for both image observations and discrete action spaces.

Trained Atari agents (courtesy of https://github.com/yining043):

BeamRider
Enduro
Breakout
SpaceInvaders
Qbert

Dependencies:

  1. tensorflow 1.15.0
  2. gym[atari] 0.15.7
  3. cv2 (opencv-python)
  4. mpi4py
  5. numpy
  6. matplotlib

Implementations of Soft Actor-Critic (SAC) algorithms from:

  1. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018, https://arxiv.org/abs/1801.01290

  2. Soft Actor-Critic Algorithms and Applications, Haarnoja et al., 2019, https://arxiv.org/abs/1812.05905

  3. Soft Actor Critic for Discrete Action Settings, Petros Christodoulou, 2019, https://arxiv.org/abs/1910.07207
    (author's implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch)

Based on the implementations given in Spinningup:

https://spinningup.openai.com/en/latest/algorithms/sac.html

Different approaches for the discrete setting

Two different methods are given for using SAC with discrete action spaces:

  • sac_discrete_gb uses the Gumbel-Softmax distribution to reparameterize the discrete action space. This keeps the algorithm similar to the original SAC implementation for continuous action spaces.

  • sac_discrete avoids reparameterisation and instead computes the entropy and KL divergence directly from the discrete action probabilities given by the policy network. This is based on the method described in [3], stays closest to the original SAC papers, and is the variant I find gives the best results. (Both approaches are sketched after this list.)
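As a minimal sketch of the core difference between the two variants (not the repo's exact code), the snippet below is written against TensorFlow 1.x, since that is the listed dependency; the tensor names (`logits`, `q_min`, `alpha`) and helper functions are illustrative assumptions, not this repo's API:

```python
import tensorflow as tf

def gumbel_softmax_sample(logits, temperature=1.0):
    """Relaxed (differentiable) sample from a categorical distribution,
    in the spirit of the sac_discrete_gb variant (sketch only)."""
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    u = tf.random_uniform(tf.shape(logits), minval=1e-8, maxval=1.0)
    gumbel = -tf.log(-tf.log(u))
    # Softmax over the perturbed logits gives a "soft" one-hot action,
    # so gradients can flow through the sampling step (reparameterisation).
    return tf.nn.softmax((logits + gumbel) / temperature)

def discrete_policy_loss(logits, q_min, alpha):
    """Policy loss in the spirit of sac_discrete / Christodoulou (2019):
    the expectation over actions is taken exactly using the action
    probabilities, so no reparameterisation is needed (sketch only).
    `q_min` is the element-wise minimum of the two Q-networks' outputs,
    shape [batch, n_actions]."""
    probs = tf.nn.softmax(logits)          # pi(a|s) for every discrete action
    log_probs = tf.nn.log_softmax(logits)  # log pi(a|s)
    # E_s[ sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a)) ]
    return tf.reduce_mean(tf.reduce_sum(probs * (alpha * log_probs - q_min), axis=-1))
```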

Versions of the algorithms that work with image observations, such as the Atari gym environments, are in the image observation directory.
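As a rough illustration of how Atari image observations are usually handled (an assumption about the common pipeline, not necessarily the exact preprocessing used in this repo), frames are typically converted to grayscale, resized, and stacked before being passed to a convolutional network:

```python
import cv2
import numpy as np
from collections import deque

def preprocess_frame(frame, size=84):
    """Typical Atari preprocessing: grayscale + resize + scale to [0, 1].
    (Sketch only; the repo's image-observation code may differ.)"""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0

class FrameStack:
    """Keeps the last k preprocessed frames as a (size, size, k) observation."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        for _ in range(self.frames.maxlen):
            self.frames.append(preprocess_frame(first_frame))
        return np.stack(self.frames, axis=-1)

    def step(self, next_frame):
        self.frames.append(preprocess_frame(next_frame))
        return np.stack(self.frames, axis=-1)
```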