Project author: sritee

Project description: Monte-Carlo Policy Gradient, Stochastic Policy Gradient and Numerical Gradient Policy Gradient
Language: Python
Repository: git://github.com/sritee/Stochastic-Policy-Gradient-Methods.git
Created: 2017-02-19T18:24:42Z
Project home: https://github.com/sritee/Stochastic-Policy-Gradient-Methods

License: MIT License


Stochastic Policy Gradient Methods

For a detailed discussion, visit: https://sridhartee.blogspot.in/2016/11/policy-gradient-methods.html

(Figure: cartpole actor-critic demo)

We design and test three policy gradient methods in this repository:

1) Monte Carlo Policy Gradient: the baseline used is the average of the rewards obtained; omitting the baseline results in high variance (a sketch follows after this list).

2) Actor-Critic Method: a softmax policy for the actor and a Q-learning critic for value-function estimation (see the second sketch below).

3) Numerical Gradient Estimation: perturb the parameters and estimate the gradient by least-squares regression, (X'X)^-1 X'y (see the third sketch below).
Change num_rollouts to change the number of training examples the gradient is learned from.
Note that the actual number of runs is number of episodes * num_rollouts.
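
For concreteness, here is a minimal sketch of method 1: Monte Carlo policy gradient (REINFORCE) with an average-reward baseline. The 3-armed bandit and all hyper-parameters are illustrative stand-ins, not taken from this repository.

```python
# Minimal REINFORCE sketch with an average-reward baseline, on a
# hypothetical 3-armed bandit (one softmax preference per arm).
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # illustrative arm rewards

theta = np.zeros(3)   # softmax policy parameters
alpha = 0.1           # learning rate
baseline = 0.0        # running average of rewards (the baseline)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(1, 2001):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    r = true_means[a] + rng.normal(0, 0.1)  # noisy reward

    # Baseline: running average of all rewards seen so far.
    baseline += (r - baseline) / t

    # grad log pi(a) for a softmax policy: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # REINFORCE update; subtracting the baseline reduces variance.
    theta += alpha * (r - baseline) * grad_log_pi

print("final policy:", softmax(theta))  # should favour the best arm
```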
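
Method 2 can be sketched the same way: a tabular softmax actor paired with a Q-learning critic. The 5-state corridor environment, the advantage form of the actor update, and every hyper-parameter below are assumptions for illustration only.

```python
# Minimal actor-critic sketch: tabular softmax actor plus Q-learning
# critic, on a hypothetical 5-state corridor (reach state 4 for reward 1).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                # actions: 0 = left, 1 = right
theta = np.zeros((n_states, n_actions))   # actor preferences
Q = np.zeros((n_states, n_actions))       # critic estimates
alpha_actor, alpha_critic, gamma = 0.05, 0.2, 0.95

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    done = s2 == n_states - 1
    return s2, float(done), done          # reward 1 only at the goal

for episode in range(500):
    s = 0
    while True:
        probs = softmax(theta[s])
        a = rng.choice(n_actions, p=probs)
        s2, r, done = step(s, a)

        # Q-learning critic update (off-policy max over next actions).
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha_critic * (target - Q[s, a])

        # Actor update: grad log pi weighted by the critic's estimate,
        # with the expected value under the policy subtracted as a baseline.
        grad = -probs
        grad[a] += 1.0
        advantage = Q[s, a] - probs @ Q[s]
        theta[s] += alpha_actor * advantage * grad

        if done:
            break
        s = s2

print("P(right) per state:", [softmax(theta[s])[1].round(2) for s in range(n_states)])
```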
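
Method 3, sketched below: perturb the parameters around the current point, record how the return changes, and recover the gradient as the least-squares solution (X'X)^-1 X'y. The quadratic rollout_return stand-in and all constants are hypothetical; in the repository the return would come from actual episode rollouts. Each gradient estimate consumes num_rollouts rollouts, so the total run count is iterations * num_rollouts, matching the note above.

```python
# Minimal numerical-gradient sketch: estimate the gradient of the return
# by regressing return deltas on parameter perturbations.
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(theta):
    # Hypothetical stand-in for running an episode with parameters theta.
    return -np.sum((theta - 1.0) ** 2)

theta = np.zeros(4)
num_rollouts = 20        # training examples per gradient estimate
sigma, alpha = 0.05, 0.1
J = rollout_return(theta)

for it in range(100):
    X = sigma * rng.standard_normal((num_rollouts, theta.size))   # perturbations
    y = np.array([rollout_return(theta + dx) - J for dx in X])    # return deltas
    # Least-squares gradient estimate: g = (X'X)^-1 X'y
    g, *_ = np.linalg.lstsq(X, y, rcond=None)
    theta += alpha * g   # gradient ascent on the return
    J = rollout_return(theta)

print("theta after ascent:", theta.round(3))  # should approach all-ones
```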