Online Regret Bounds for a New Reinforcement Learning Algorithm
We present a new learning algorithm for undiscounted finite-step reinforcement learning with restarts. The algorithm is based on upper confidence bounds and achieves logarithmic online regret with respect to an optimal policy.
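The abstract does not describe the algorithm itself. To illustrate the upper-confidence-bound principle it builds on, here is a minimal sketch of the classic UCB1 rule for a multi-armed bandit. This is a simpler setting than the reinforcement learning problem studied here and is not the authors' algorithm. The function name `ucb1`, the Bernoulli reward model and all parameters are hypothetical choices for illustration only.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Illustrative UCB1 on Bernoulli arms (hypothetical helper, not the paper's method).

    Returns the pull count of each arm and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k      # total reward collected from each arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull every arm once to initialise the estimates
        else:
            # Optimism under uncertainty: empirical mean plus a confidence radius
            # that shrinks as an arm is pulled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Expected loss relative to always playing the best arm.
        regret += best - arm_means[arm]
    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.8], 10000)
print(counts, round(regret, 1))
```

Because suboptimal arms are tried only often enough to shrink their confidence radius below the gap to the best arm, their pull counts, and hence the regret, grow roughly logarithmically with the horizon.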