Robot Learning with Regularized Reinforcement Learning
In this work, we present L2-regularized counterparts of two widely used reinforcement-learning/dynamic-programming algorithms: approximate policy iteration and fitted Q-iteration. We show how our regularized algorithms can be implemented efficiently when the value-function approximator belongs either to (i) a space spanned by a finite number of linearly independent basis functions (a parametric approach), or to (ii) a reproducing kernel Hilbert space (a non-parametric approach). We also prove finite-sample performance bounds for our algorithms. In particular, we show that they achieve rates as good as the corresponding regression rates when the value functions belong to a known smoothness class. Finally, we report the results of applying our algorithms to a visual-servoing problem.
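To make the parametric case concrete, here is a minimal sketch of L2-regularized fitted Q-iteration: each iteration fits the linear Q-function to one-step Bellman targets by ridge (L2-penalized) least squares. The five-state chain MDP, the one-hot feature map, and all constants below are hypothetical illustrations chosen for this sketch, not taken from the paper.

```python
import numpy as np

gamma, lam = 0.9, 0.1          # discount factor and L2 penalty (example values)
n_states, n_actions = 5, 2     # hypothetical toy chain MDP

def phi(s, a):
    """One-hot features over (state, action) pairs: a trivially
    linearly independent finite basis."""
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

# Hypothetical batch of transitions (s, a, r, s'): action 1 moves right
# (reward 1 on reaching the right end), action 0 moves left.
rng = np.random.default_rng(0)
data = []
for _ in range(500):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    data.append((s, a, r, s2))

Phi = np.array([phi(s, a) for s, a, _, _ in data])
w = np.zeros(n_states * n_actions)
for _ in range(50):                      # fitted Q-iteration sweeps
    # Regression targets: one-step Bellman backups under the current weights.
    y = np.array([r + gamma * max(phi(s2, b) @ w for b in range(n_actions))
                  for _, _, r, s2 in data])
    # Ridge (L2-regularized) least-squares fit of Q to the targets.
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    w = np.linalg.solve(A, Phi.T @ y)

greedy = [int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))
          for s in range(n_states)]
print(greedy)   # greedy policy w.r.t. the learned Q; should move right
```

The L2 penalty `lam` both shrinks the fitted weights (the mechanism behind the smoothness-dependent rates mentioned above) and keeps the Gram matrix invertible even when some feature directions are unsampled. The RKHS variant replaces this finite basis with a kernel expansion over the data points.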