Project author: MaxenceGiraud

Project description:
On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
Language: Python
Repository: git://github.com/MaxenceGiraud/ucb-nonstationary.git
Created: 2020-12-01T13:15:43Z
Project page: https://github.com/MaxenceGiraud/ucb-nonstationary

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Implementation of the paper by Aurélien Garivier and Eric Moulines, On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems [1]. We also try some variants of the algorithms and compare them with one another.

Our experiments with the different algorithms are collected in the notebook experiements.ipynb.

Installation

To install, simply clone the project:

    git clone https://github.com/MaxenceGiraud/ucb-nonstationary
    cd ucb-nonstationary/

Usage

    import numpy as np
    import nsucb
    from bandit_env import *

    # Arm sequence: returns the list of arms available at time step t
    def arm_f(t):
        arms = [Bernoulli(0.5), Bernoulli(0.1), Bernoulli(0.4)]
        # Arm 1 becomes the best arm for 300 < t < 500
        if 300 < t < 500:
            arms[1] = Bernoulli(0.9)
        return arms

    n = 3     # number of arms
    T = 1000  # horizon; assumed value, the original snippet leaves T undefined
    mab = MAB_NS(n, arm_f)

    # Algorithms
    ucb = nsucb.UCB(n)
    d = nsucb.DiscountedUCB(n)
    sw = nsucb.SlidingUCB(n)

    # Run simulations
    RunExpes([ucb, d, sw], mab, 50, T, non_stationary=True, quantiles=False)
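For intuition about what a discounted UCB policy does on the arm schedule above, here is a minimal standalone sketch that does not use the repository's code. The class DiscountedUCBSketch, the functions arm_means and run, and the defaults gamma=0.99, xi=0.6 and bound=1.0 are all illustrative assumptions, not the API or settings of nsucb.DiscountedUCB. The index is the discounted empirical mean plus a padding term of the form 2B sqrt(xi log(n) / N), where N is the discounted pull count of the arm and n is the sum of those counts over all arms.

```python
import numpy as np


class DiscountedUCBSketch:
    """Illustrative discounted UCB policy (hypothetical, not nsucb.DiscountedUCB)."""

    def __init__(self, n_arms, gamma=0.99, xi=0.6, bound=1.0):
        self.gamma = gamma               # discount factor in (0, 1), assumed default
        self.xi = xi                     # exploration constant, assumed default
        self.bound = bound               # upper bound B on the rewards
        self.counts = np.zeros(n_arms)   # discounted number of pulls per arm
        self.sums = np.zeros(n_arms)     # discounted sum of rewards per arm

    def select(self):
        # Play every arm once before relying on the index.
        unplayed = np.flatnonzero(self.counts == 0)
        if unplayed.size > 0:
            return int(unplayed[0])
        means = self.sums / self.counts
        # The most recent pull has weight 1, so the total is >= 1 and the log is >= 0.
        log_total = max(np.log(self.counts.sum()), 0.0)
        padding = 2 * self.bound * np.sqrt(self.xi * log_total / self.counts)
        return int(np.argmax(means + padding))

    def update(self, arm, reward):
        # Discount all past observations, then add the new one with weight 1.
        self.counts *= self.gamma
        self.sums *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def arm_means(t):
    """Bernoulli means at step t, mirroring the arm schedule of the usage example."""
    means = np.array([0.5, 0.1, 0.4])
    if 300 < t < 500:
        means[1] = 0.9
    return means


def run(policy, horizon, seed=0):
    """Simulate the policy for `horizon` steps and return the arms it chose."""
    rng = np.random.default_rng(seed)
    choices = np.empty(horizon, dtype=int)
    for t in range(horizon):
        arm = policy.select()
        reward = float(rng.random() < arm_means(t)[arm])
        policy.update(arm, reward)
        choices[t] = arm
    return choices


choices = run(DiscountedUCBSketch(3), horizon=1000)
# How often each arm was chosen while arm 1 was temporarily the best one.
print(np.bincount(choices[301:500], minlength=3))
```

A smaller gamma forgets old rewards faster, which helps the policy react when arm 1 changes, at the cost of more exploration while the arms are stable.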

To compile the report, you need a LaTeX distribution that provides pdflatex. Then run:

    cd report/
    pdflatex main.tex

TODO

  • Implement non-stationary bandit
  • Discounted UCB
  • Sliding-Window UCB
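As a rough illustration of the sliding-window item in the list above, the sketch below keeps only the last `window` observations and computes a UCB index from them. It is a hypothetical standalone example: the class name SlidingWindowUCBSketch and the defaults window=100, xi=0.6 and bound=1.0 are assumptions and are not taken from nsucb.SlidingUCB. The padding term has the form B sqrt(xi log(min(t, window)) / N), where N is the number of times the arm appears in the window.

```python
import math
from collections import deque


class SlidingWindowUCBSketch:
    """Illustrative sliding-window UCB policy (hypothetical, not nsucb.SlidingUCB)."""

    def __init__(self, n_arms, window=100, xi=0.6, bound=1.0):
        self.n_arms = n_arms
        self.window = window                  # number of recent steps kept, assumed default
        self.xi = xi                          # exploration constant, assumed default
        self.bound = bound                    # upper bound B on the rewards
        self.history = deque(maxlen=window)   # last `window` (arm, reward) pairs
        self.t = 0                            # total number of steps played

    def select(self):
        # Recompute counts and sums from the window (simple, O(window) per step).
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # An arm with no observation in the window is played immediately.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        log_term = math.log(min(self.t, self.window))

        def index(arm):
            mean = sums[arm] / counts[arm]
            padding = self.bound * math.sqrt(self.xi * log_term / counts[arm])
            return mean + padding

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        # The deque silently drops the oldest pair once it holds `window` items.
        self.history.append((arm, reward))
        self.t += 1


policy = SlidingWindowUCBSketch(n_arms=3, window=50)
for reward in (1.0, 0.0, 1.0):
    arm = policy.select()
    policy.update(arm, reward)
print(policy.select())
```

Recomputing the counts on every call keeps the sketch short. An implementation meant for long horizons would more likely maintain running sums and subtract the observation that leaves the window.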

References

[1] Garivier, Aurélien & Moulines, Eric. (2008). On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems.