Project author: JuliaPOMDP

Project description:
Julia implementations of temporal difference Reinforcement Learning algorithms like Q-Learning and SARSA
Language: Julia
Repository: git://github.com/JuliaPOMDP/TabularTDLearning.jl.git
Created: 2017-03-06T07:05:33Z
Project community: https://github.com/JuliaPOMDP/TabularTDLearning.jl

License: Other

TabularTDLearning


This repository provides Julia implementations of the following Temporal-Difference reinforcement learning algorithms:

  • Q-Learning
  • SARSA
  • SARSA lambda
  • Prioritized Sweeping

Note that these solvers are tabular, and will only work with MDPs that have discrete state and action spaces.
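To illustrate what "tabular" means here, the sketch below implements the TD(0) update behind Q-Learning in base Julia, using a plain array as the Q-table. The deterministic 3-state chain (states 1 → 2 → 3, where state 3 is terminal and entering it pays reward 1.0) is a hypothetical toy problem, not part of this package:

```julia
# Minimal tabular Q-Learning sketch on a hypothetical deterministic chain.
# Illustration only; the package's solvers operate on POMDPs.jl MDP models.
function q_learning_chain(; nstates=3, alpha=0.5, gamma=0.9, nepisodes=100)
    Q = zeros(nstates, 1)                  # Q-table: one "move right" action
    for _ in 1:nepisodes
        s = 1
        while s != nstates
            sp = s + 1                     # deterministic transition
            r = sp == nstates ? 1.0 : 0.0  # reward on reaching the goal
            # TD(0) target: r + gamma * max_a' Q(s', a'); terminals bootstrap to 0
            target = r + (sp == nstates ? 0.0 : gamma * maximum(Q[sp, :]))
            Q[s, 1] += alpha * (target - Q[s, 1])
            s = sp
        end
    end
    return Q
end

Q = q_learning_chain()
# Q[2,1] approaches 1.0 and Q[1,1] approaches gamma * 1.0 = 0.9
```

Because every Q-value is stored explicitly in an array indexed by state and action, this approach only scales to problems whose state and action spaces can be enumerated, which is why the solvers require discrete spaces.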

Installation

    using Pkg
    Pkg.add("TabularTDLearning")

Example

    using POMDPs
    using TabularTDLearning
    using POMDPModels
    using POMDPTools

    mdp = SimpleGridWorld()

    # Use Q-Learning
    exppolicy = EpsGreedyPolicy(mdp, 0.01)
    solver = QLearningSolver(exploration_policy=exppolicy, learning_rate=0.1, n_episodes=5000, max_episode_length=50, eval_every=50, n_eval_traj=100)
    policy = solve(solver, mdp)

    # Use SARSA
    solver = SARSASolver(exploration_policy=exppolicy, learning_rate=0.1, n_episodes=5000, max_episode_length=50, eval_every=50, n_eval_traj=100)
    policy = solve(solver, mdp)

    # Use SARSA lambda
    solver = SARSALambdaSolver(exploration_policy=exppolicy, learning_rate=0.1, lambda=0.9, n_episodes=5000, max_episode_length=50, eval_every=50, n_eval_traj=100)
    policy = solve(solver, mdp)

    # Use Prioritized Sweeping
    mdp_ps = SimpleGridWorld(tprob=1.0)
    solver = PrioritizedSweepingSolver(exploration_policy=exppolicy, learning_rate=0.1, n_episodes=5000, max_episode_length=50, eval_every=50, n_eval_traj=100, pq_threshold=0.5)
    policy = solve(solver, mdp_ps)
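A returned policy can then be evaluated by Monte Carlo rollout. This sketch assumes the `RolloutSimulator` from POMDPTools (already imported above); the number of rollouts and step cap are arbitrary choices:

```julia
# Estimate the mean discounted return of the learned policy by rollout.
# Assumes `mdp_ps` and `policy` from the example above are in scope.
rsim = RolloutSimulator(max_steps=50)
returns = [simulate(rsim, mdp_ps, policy) for _ in 1:100]
println("mean discounted return: ", sum(returns) / length(returns))
```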