Basic versions of agents from Spinning Up in Deep RL, written in PyTorch. Designed to run quickly on CPU on Pendulum-v0 from OpenAI Gym.
To see the differences between algorithms, try running `diff -y <file1> <file2>`, e.g., `diff -y ddpg.py td3.py`.
For MPI versions of the on-policy algorithms, see the `mpi` branch.
- `vpg.py`
- `trpo.py`
- `ppo.py`
- `ddpg.py`
- `td3.py`
- `sac.py`
- `dqn.py`
Note that implementation details can have a significant effect on performance, as discussed in What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study. This codebase aims to be as simple as possible, but a few such details are kept: the on-policy algorithms use separate actor and critic networks, a state-independent policy standard deviation, per-minibatch advantage normalisation, and several critic updates per minibatch, while the deterministic off-policy algorithms use layer normalisation. Likewise, soft actor-critic uses a transformed Normal distribution by default; this can also help the on-policy algorithms.
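To illustrate two of the on-policy details mentioned above, here is a minimal sketch (not the repo's actual code; class and function names are illustrative) of a Gaussian actor with a state-independent policy standard deviation, plus per-minibatch advantage normalisation:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mu_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # State-independent: the log-std is a free parameter,
        # not an output of the network.
        self.log_std = nn.Parameter(-0.5 * torch.ones(act_dim))

    def forward(self, obs):
        # Returns an action distribution for the given observations.
        return Normal(self.mu_net(obs), self.log_std.exp())

def normalise_advantages(adv, eps=1e-8):
    # Per-minibatch advantage normalisation: zero mean, unit std.
    return (adv - adv.mean()) / (adv.std() + eps)
```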
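And a minimal sketch of a transformed (tanh-squashed) Normal distribution of the kind soft actor-critic uses by default, which can equally be dropped into the on-policy actors; the shapes here are placeholders:

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

mu, std = torch.zeros(2), torch.ones(2)
base = Normal(mu, std)
# Samples lie in (-1, 1); log_prob includes the change-of-variables term.
squashed = TransformedDistribution(base, [TanhTransform(cache_size=1)])

a = squashed.rsample()               # reparameterised sample, SAC-style
logp = squashed.log_prob(a).sum(-1)  # sum log-probs over action dimensions
```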