Temporal Difference Based Actor Critic Learning -
Convergence and Neural Implementation
Dotan Di Castro, Dmitry Volkinshtein and Ron Meir
Volume 21, Number 21, United States
Actor-critic algorithms for reinforcement learning are achieving renewed popularity
due to their good convergence properties in situations where other approaches
often fail (e.g., when function approximation is involved). Interestingly, there is
growing evidence that actor-critic approaches based on phasic dopamine signals
play a key role in biological learning through cortical and basal ganglia loops.
We derive a temporal difference based actor-critic learning algorithm for which
convergence can be proved without assuming widely separated time scales for the
actor and the critic. The approach is demonstrated by applying it to networks
of spiking neurons. The established relation between phasic dopamine and the
temporal difference signal lends support to the biological relevance of such algorithms.
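
To make the roles of the actor, the critic, and the temporal difference signal concrete, the following is a minimal sketch of a generic tabular one-step actor-critic update. It is not the algorithm derived in this paper, and it makes no claim about the single time scale convergence result. The two-state environment, the discount factor, the step sizes, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action):
    """Hypothetical toy environment: the action picks the next state,
    and choosing action 1 yields reward 1 (otherwise 0)."""
    reward = 1.0 if action == 1 else 0.0
    return action, reward


def train(num_steps=5000, alpha_critic=0.1, alpha_actor=0.05,
          gamma=0.9, seed=0):
    """Generic one-step actor-critic with a tabular critic and a
    softmax actor. All hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    values = [0.0, 0.0]                 # critic: state-value estimates
    prefs = [[0.0, 0.0], [0.0, 0.0]]    # actor: action preferences per state
    state = 0
    for _ in range(num_steps):
        probs = softmax(prefs[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward = step(state, action)

        # Temporal difference signal shared by critic and actor.
        td_error = reward + gamma * values[next_state] - values[state]

        # Critic update: move the value estimate toward the TD target.
        values[state] += alpha_critic * td_error

        # Actor update: policy-gradient step scaled by the TD signal.
        for a in range(2):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            prefs[state][a] += alpha_actor * td_error * grad_log_pi

        state = next_state
    return values, prefs


if __name__ == "__main__":
    values, prefs = train()
    print("state values:", values)
    print("policy in state 1:", softmax(prefs[1]))
```

In this sketch the same scalar `td_error` drives both updates, which mirrors the idea that a single TD-like signal can serve as the teaching signal for actor and critic alike.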