On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
An implementation of the paper by Aurélien Garivier and Eric Moulines, On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems [1]. We also try some variants of the algorithms and compare them with one another.
Our experiments with the different algorithms are compiled in the notebook experiements.ipynb.
To install, simply clone the project:
git clone https://github.com/MaxenceGiraud/ucb-nonstationary
cd ucb-nonstationary/
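Example usage:
import numpy as np
import nsucb
from bandit_env import *

# Arms sequence: returns the list of arms available at time t
def arm_f(t):
    arms = [Bernoulli(0.5), Bernoulli(0.1), Bernoulli(0.4)]
    # Arm 1 temporarily becomes the best arm for 300 < t < 500
    if 300 < t < 500:
        arms[1] = Bernoulli(0.9)
    return arms

n = 3  # number of arms
T = 1000  # horizon; assumed value, the original snippet leaves T undefined
mab = MAB_NS(n, arm_f)

# Algorithms
ucb = nsucb.UCB(n)
d = nsucb.DiscountedUCB(n)
sw = nsucb.SlidingUCB(n)

# Run simulations
RunExpes([ucb, d, sw], mab, 50, T, non_stationary=True, quantiles=False)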
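For readers who want to see what a discounted UCB policy computes, here is a minimal standalone sketch that follows the D-UCB index described in [1]: a discounted empirical mean plus a padding term 2B·sqrt(ξ·log n_t / N_t(i)). It is not the code of nsucb.DiscountedUCB; the class name DiscountedUCBSketch, the simulate helper, and the default values of gamma, xi and B are assumptions chosen for illustration only.

```python
import math
import random


class DiscountedUCBSketch:
    """Illustrative D-UCB policy (hypothetical class, not part of nsucb)."""

    def __init__(self, n_arms, gamma=0.99, xi=0.6, B=1.0):
        self.n_arms = n_arms
        self.gamma = gamma  # discount factor in (0, 1); assumed default
        self.xi = xi        # exploration constant; assumed default
        self.B = B          # upper bound on the rewards
        self.counts = [0.0] * n_arms  # discounted pull counts N_t(gamma, i)
        self.sums = [0.0] * n_arms    # discounted reward sums

    def select(self):
        # Play any arm whose discounted count is zero (never pulled yet).
        for i, c in enumerate(self.counts):
            if c == 0.0:
                return i
        total = sum(self.counts)  # n_t(gamma), always >= 1 after a pull

        def index(i):
            mean = self.sums[i] / self.counts[i]
            padding = 2 * self.B * math.sqrt(
                self.xi * math.log(total) / self.counts[i]
            )
            return mean + padding

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        # Discount all past observations, then record the new one.
        for i in range(self.n_arms):
            self.counts[i] *= self.gamma
            self.sums[i] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def simulate(policy, horizon=1000, seed=0):
    """Run the policy on the three-arm Bernoulli example; return arms played."""
    rng = random.Random(seed)
    played = []
    for t in range(horizon):
        means = [0.5, 0.1, 0.4]
        if 300 < t < 500:
            means[1] = 0.9  # same temporary change as in arm_f
        arm = policy.select()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        policy.update(arm, reward)
        played.append(arm)
    return played
```

Calling simulate(DiscountedUCBSketch(3)) returns the sequence of arms played, which can be used to check how quickly the policy shifts to arm 1 during the change period. Because old observations are weighted by powers of gamma, a smaller gamma forgets the past faster at the cost of more exploration.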
To compile the report, you need a LaTeX distribution that provides a compiler such as pdflatex. Then run:
cd report/
pdflatex main.tex
[1] Garivier, Aurélien and Moulines, Eric (2008). On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems.