Online Regret Bounds for a New Reinforcement Learning Algorithm

There is a more recent version of this eprint available.

Abstract

We present a new learning algorithm for undiscounted finite-step reinforcement learning with restarts. Our algorithm is based on upper confidence bounds and achieves logarithmic online regret with respect to an optimal policy.