PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation
Dotan Di Castro, Dmitry Volkinshtein and Ron Meir
(2009) NIPS, Volume 21, Number 21. MIT Press, United States.


Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological learning through cortical and basal ganglia loops. We derive a temporal-difference-based actor-critic learning algorithm for which convergence can be proved without assuming widely separated time scales for the actor and the critic. The approach is demonstrated by applying it to networks of spiking neurons. The established relation between phasic dopamine and the temporal difference signal lends support to the biological relevance of such algorithms.

EPrint Type: Book
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 5250
Deposited By: Ron Meir
Deposited On: 24 March 2009