Reinforcement learning agents in Python (dynamic programming, temporal-difference, deep Q-learning, stochastic/deterministic policy gradients)