## Abstract

In this report we investigate several Batch Mode Reinforcement Learning (BRL) algorithms for continuous control problems. Interest in BRL is growing in the research community because of three properties: training data is used more efficiently, learning can take place completely offline from randomly generated episodes, and supervised batch-mode regression algorithms such as regression trees or batch-mode neural network training (which are known to have better convergence properties) can be used. We investigate Experience Replay (one of the first batch-mode algorithms), Monte Carlo Learning, Fitted Q-Iteration, and some modifications of these algorithms. The results are compared across different function approximation schemes: Regression Forests, Model Trees (Forests), Local Regression, Neural Networks, LWPR, and RBF networks. The comparison uses the Point-to-Point Movement task, which captures the basic characteristics of moving the center of mass of a humanoid robot.