Project author: markub3327

Project description:
Deep Deterministic Policy Gradients
Language: Python
Project URL: git://github.com/markub3327/MountainCarContinuous.git
Created: 2020-11-24T10:15:01Z
Project community: https://github.com/markub3327/MountainCarContinuous

License: MIT License

Deep Deterministic Policy Gradient (DDPG)

Theory

The agent uses the DDPG algorithm to predict continuous actions in a continuous state space. It has two networks: an Actor and a Critic.

References:

- https://towardsdatascience.com/reinforcement-learning-w-keras-openai-actor-critic-models-f084612cfd69
- https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404
- https://spinningup.openai.com/en/latest/algorithms/ddpg.html
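
To make the setup concrete, here is a minimal sketch of the trained Actor driving the environment. The tiny network and the Gaussian exploration-noise scale (0.1) are stand-ins for illustration, not the repository's actual model or settings; it assumes the classic Gym step API.

```python
import numpy as np
import gym
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in actor (NOT the repo's trained network): 2 state inputs -> 1 action.
actor = tf.keras.Sequential([
    layers.Input(shape=(2,)),             # position, velocity
    layers.Dense(64, activation="relu"),  # hidden ReLU layer (width assumed)
    layers.Dense(1, activation="tanh"),   # force in (-1.0, 1.0)
])

env = gym.make("MountainCarContinuous-v0")
state = env.reset()
done = False
while not done:
    # Deterministic policy output plus Gaussian noise for exploration
    action = actor(state[None, :].astype(np.float32)).numpy()[0]
    action = np.clip(action + 0.1 * np.random.randn(1), -1.0, 1.0)
    state, reward, done, _ = env.step(action)
env.close()
```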

Actor topology



[Actor network diagram]

Critic topology



[Critic network diagram]

Inputs/Outputs

The Actor network has 2 inputs from the game: position and velocity. The output layer is a fully-connected tanh layer producing the action (force) in the range (-1.0, 1.0). The hidden layers use the ReLU activation function.
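
A minimal Keras sketch of this topology, assuming two hidden layers of 64 units (the repository's actual layer widths may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_actor(state_dim=2, action_dim=1):
    # 2 inputs from the game: position, velocity
    state = layers.Input(shape=(state_dim,))
    x = layers.Dense(64, activation="relu")(state)  # hidden ReLU layer
    x = layers.Dense(64, activation="relu")(x)      # hidden ReLU layer
    # tanh squashes the action (force) into the range (-1.0, 1.0)
    action = layers.Dense(action_dim, activation="tanh")(x)
    return tf.keras.Model(inputs=state, outputs=action)
```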

The Critic network has 2 inputs from the game (the states) and 1 input from the Actor network (the action). The hidden layers use the ReLU activation function. The main function of this network is to estimate the quality of action[t] in state[t].
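
A matching sketch of the Critic, again with assumed hidden widths; the state and action inputs are concatenated before the ReLU hidden layers, and the output is a single linear Q value:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_critic(state_dim=2, action_dim=1):
    # 2 state inputs from the game plus 1 action input from the Actor
    state = layers.Input(shape=(state_dim,))
    action = layers.Input(shape=(action_dim,))
    x = layers.Concatenate()([state, action])
    x = layers.Dense(64, activation="relu")(x)  # hidden ReLU layer
    x = layers.Dense(64, activation="relu")(x)  # hidden ReLU layer
    q_value = layers.Dense(1)(x)                # linear Q(state, action) estimate
    return tf.keras.Model(inputs=[state, action], outputs=q_value)
```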

The Critic network is trained using the Bellman equation (a training-step sketch follows the definitions below):

  Q_target = reward + (1 - done) * gamma * Q_next_state

where:

  - Q_target -> the Q value to be trained,
  - reward -> the reward from the game for the action in the state,
  - gamma -> the discount factor,
  - Q_next_state -> the quality of the action in the next state,
  - done -> 1 if it is a terminal state, 0 if non-terminal.
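
This update can be sketched as a single TensorFlow training step on a replay batch. The target networks (`target_actor`, `target_critic`) and the mean-squared-error loss are standard DDPG ingredients but assumptions here, not necessarily the repository's exact code:

```python
import tensorflow as tf

gamma = 0.99  # discount factor (assumed value)

@tf.function
def critic_train_step(critic, target_actor, target_critic, optimizer,
                      state, action, reward, next_state, done):
    # Q_next_state: quality of the target policy's action in the next state
    next_action = target_actor(next_state)
    q_next = target_critic([next_state, next_action])
    # Bellman target: reward + (1 - done) * gamma * Q_next_state
    q_target = reward + (1.0 - done) * gamma * q_next
    with tf.GradientTape() as tape:
        q = critic([state, action])
        loss = tf.reduce_mean(tf.square(q_target - q))  # MSE to the target
    grads = tape.gradient(loss, critic.trainable_variables)
    optimizer.apply_gradients(zip(grads, critic.trainable_variables))
    return loss
```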

Summary



[Critic model summary]

Framework: TensorFlow 2.0

Language: Python 3

Author: Martin Kubovcik