Project author: BY571

Project description:
PyTorch implementation of the Q-learning algorithm Normalized Advantage Function (NAF) for continuous control problems, plus PER and the N-step method
Primary language: Jupyter Notebook
Project address: git://github.com/BY571/Normalized-Advantage-Function-NAF-.git
Created: 2020-06-30T14:10:16Z
Project community: https://github.com/BY571/Normalized-Advantage-Function-NAF-

License: MIT License


Normalized Advantage Function (NAF)

PyTorch implementation of the NAF algorithm based on the paper: Continuous Deep Q-Learning with Model-based Acceleration.
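
The key idea of NAF is to write the Q-function as Q(s, a) = V(s) + A(s, a), with an advantage A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) that is quadratic in the action, so the greedy action is simply mu(s) and the maximum of Q is available in closed form. Below is a minimal PyTorch sketch of such a NAF head; it illustrates the idea only and is not the repository's exact code (the class name, layer sizes, and the tanh action squashing are assumptions).

```python
import torch
import torch.nn as nn


class NAFHead(nn.Module):
    """Sketch of a NAF network: Q(s, a) = V(s) + A(s, a) with quadratic A."""

    def __init__(self, state_size, action_size, layer_size=256):
        super().__init__()
        self.action_size = action_size
        self.body = nn.Sequential(
            nn.Linear(state_size, layer_size), nn.ReLU(),
            nn.Linear(layer_size, layer_size), nn.ReLU(),
        )
        self.mu = nn.Linear(layer_size, action_size)      # greedy action mu(s)
        self.value = nn.Linear(layer_size, 1)             # state value V(s)
        # entries of the lower-triangular matrix L(s), with P(s) = L(s) L(s)^T
        self.l_entries = nn.Linear(layer_size, action_size * (action_size + 1) // 2)

    def forward(self, state, action=None):
        x = self.body(state)
        mu = torch.tanh(self.mu(x))                       # assumes actions in [-1, 1]
        v = self.value(x)
        if action is None:                                # acting: greedy action is mu(s)
            return mu, v
        # Build L(s) with a positive diagonal, then P(s) = L L^T (positive definite)
        batch = state.size(0)
        L = torch.zeros(batch, self.action_size, self.action_size, device=state.device)
        rows, cols = torch.tril_indices(self.action_size, self.action_size, device=state.device)
        L[:, rows, cols] = self.l_entries(x)
        diag = torch.arange(self.action_size, device=state.device)
        L[:, diag, diag] = L[:, diag, diag].exp()
        P = L @ L.transpose(1, 2)
        # Quadratic advantage and resulting Q-value
        delta = (action - mu).unsqueeze(-1)
        advantage = -0.5 * (delta.transpose(1, 2) @ P @ delta).squeeze(-1)
        return v + advantage, v
```
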

Two versions are implemented:

  1. Jupyter notebook version
  2. Script version (results tracking with wandb)

Recently added: prioritized experience replay (PER) and the n-step method (see the sketch below).
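
The n-step method replaces the one-step TD target with a reward accumulated over several consecutive transitions before bootstrapping from the last reached state. Below is a minimal sketch of how such n-step transitions can be assembled before they are written to the replay buffer; it illustrates the general technique and is not the repository's exact code (class and argument names are assumptions).

```python
from collections import deque


class NStepCollector:
    """Turns single-step transitions into n-step transitions via a sliding window."""

    def __init__(self, n_step=4, gamma=0.99):
        self.n_step = n_step
        self.gamma = gamma
        self.window = deque(maxlen=n_step)

    def append(self, state, action, reward, next_state, done):
        self.window.append((state, action, reward, next_state, done))
        if len(self.window) < self.n_step and not done:
            return None                      # not enough transitions collected yet
        # Discounted reward over the window, bootstrapping from the last next_state.
        n_reward, n_next_state, n_done = 0.0, next_state, done
        for i, (_, _, r, s_next, d) in enumerate(self.window):
            n_reward += (self.gamma ** i) * r
            n_next_state, n_done = s_next, d
            if d:                            # episode ended inside the window
                break
        first_state, first_action = self.window[0][:2]
        if done:                             # flush so windows never mix episodes
            self.window.clear()              # (remaining partial windows are dropped for brevity)
        return first_state, first_action, n_reward, n_next_state, n_done
```

A full implementation would also track how many rewards were actually accumulated, so the bootstrap term can use the matching power of gamma (and drop the bootstrap entirely when n_done is true).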

To run the script version: python naf.py

with the following arguments (an example invocation is shown after the list):

  1. '-env' : Name of the environment (default: Pendulum-v0)
  2. '-info' : Name of the Experiment (default: Experiment-1)
  3. '-f', --frames : Number of training frames (default: 40000)
  4. '-mem' : Replay buffer size (default: 100000)
  5. '-b', --batch_size : Batch size (default: 128)
  6. '-l', --layer_size : Neural Network layer size (default: 256)
  7. '-g', --gamma : Discount factor gamma (default: 0.99)
  8. '-t', --tau : Soft update factor tau (default: 1e-3)
  9. '-lr', --learning_rate : Learning rate (default: 1e-3)
  10. '-u', --update_every : Update the network every x steps (default: 1)
  11. '-n_up', --n_updates : Number of network updates performed per update step (default: 1)
  12. '-s', --seed : Random seed (default: 0)
  13. '-per', choices=[0,1] : Use prioritized experience replay (default: 0)
  14. '-nstep' : N-step bootstrapping (default: 1)
  15. '-d2rl' : Use a deep dense (D2RL) network if set to 1 (default: 0)
  16. '--eval_every' : Evaluate the current policy every X frames (default: 1000)
  17. '--eval_runs' : Number of evaluation runs; performance is averaged over all runs (default: 3)
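
An example invocation that enables both recently added features, PER and n-step bootstrapping (the environment name and flag values are chosen purely for illustration):

python naf.py -env LunarLanderContinuous-v2 -info NAF_per_nstep -per 1 -nstep 4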

In the paper, the authors compared NAF with DDPG and reported faster and more stable learning: "We show that, in comparison to recently proposed deep actor-critic algorithms, our method tends to learn faster and acquires more accurate policies."

To verify their statement, I tested NAF on Pendulum-v0 and LunarLanderContinuous-v2 and compared the results with my implementation of DDPG.

Note that the results shown do not include the model-based acceleration; only the base NAF algorithm was tested.

[Training curves: NAF vs. DDPG on Pendulum-v0 and LunarLanderContinuous-v2]

Indeed, the results show faster and more stable learning!

TODO:

  • Test with Double Q-nets like SAC
  • Test with entropy regularization (like SAC)
  • Test with REDQ Q-Net ensemble

Feel free to use this code for your own projects or research:

    @misc{Normalized_Advantage_Function,
      author = {Dittert, Sebastian},
      title = {PyTorch Implementation of Normalized Advantage Function},
      year = {2020},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/BY571/NAF}},
    }