Online Regret Bounds for a New Reinforcement Learning Algorithm
There is a more recent version of this eprint available.
We present a new learning algorithm for undiscounted finite-step reinforcement learning with restarts. Our algorithm is based on upper confidence bounds and achieves logarithmic online regret with respect to an optimal policy.
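The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general upper-confidence-bound idea, using the textbook UCB1 rule on a multi-armed bandit rather than the reinforcement learning setting of this eprint. The function name `ucb1`, the Bernoulli reward model, and the arm means are assumptions made for illustration. Online regret is measured here as the cumulative gap between the best arm's mean reward and the mean reward of the arm actually played.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Textbook UCB1 on Bernoulli arms (illustrative; not the eprint's algorithm).

    arm_means: hypothetical true success probabilities, unknown to the learner.
    Returns the per-arm pull counts and the cumulative (pseudo-)regret.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm was played
    sums = [0.0] * n_arms    # total reward collected from each arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Play every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Optimism: pick the arm with the largest upper confidence bound,
            # i.e. empirical mean plus an exploration bonus that shrinks
            # as the arm is played more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Regret accumulates only when a suboptimal arm is played.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("cumulative regret:", round(regret, 2))
```

In this sketch the exploration bonus makes rarely tried arms look attractive until enough evidence accumulates, which is what keeps the number of suboptimal plays, and hence the regret, growing only logarithmically in the horizon for UCB1.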