SPSA-Based Actor-Critic Algorithm Using Deterministic Perturbation Sequences
Centre for Discrete and Applicable Mathematics, London, UK.
We develop a simulation-based
actor-critic algorithm for infinite-horizon Markov decision
processes with finite state and action spaces under a
discounted cost criterion. The algorithm performs gradient
search in the space of randomized policies and uses
simultaneous deterministic perturbation stochastic
approximation (SDPSA)-type gradient estimates. It combines
the features of two-timescale actor-critic algorithms with
those of gradient-search-based SDPSA techniques.
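The abstract does not say how the deterministic perturbation sequence is built. The following is a minimal sketch that assumes the Hadamard-matrix construction often used for deterministic-perturbation SPSA. It shows only the SDPSA gradient-estimate step, applied to a toy noisy quadratic. It is not the paper's actor-critic algorithm: there is no critic, no policy parametrization, and a single constant step size is used instead of two timescales. The names `hadamard_perturbations` and `sdpsa_minimize`, the step size, the perturbation size `delta`, and the test function are all hypothetical.

```python
import numpy as np


def hadamard_perturbations(n):
    """Deterministic +/-1 perturbation vectors for an n-dimensional parameter.

    Assumed construction: a Sylvester-type Hadamard matrix of order P, the
    smallest power of 2 with P >= n + 1. The all-ones first column is dropped
    and the next n columns are kept. Row k is the perturbation used at step
    k (mod P).
    """
    h = np.array([[1.0]])
    while h.shape[0] < n + 1:
        h = np.block([[h, h], [h, -h]])
    return h[:, 1:n + 1]


def sdpsa_minimize(loss, theta0, n_iter=2000, step=0.05, delta=0.1):
    """Two-sided SPSA descent driven by cyclic Hadamard perturbations.

    `loss` may be noisy. Only its values are used, never its gradient.
    Hypothetical helper: constant step size, single timescale.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    perturb = hadamard_perturbations(theta.size)
    period = perturb.shape[0]
    for k in range(n_iter):
        d = perturb[k % period]  # deterministic vector, cycled with period P
        diff = loss(theta + delta * d) - loss(theta - delta * d)
        # Entries of d are +/-1, so dividing by d_i equals multiplying by d_i.
        grad_est = diff / (2.0 * delta) * d
        theta -= step * grad_est
    return theta


# Usage on a hypothetical noisy quadratic with minimizer `target`.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])


def noisy_quadratic(x):
    return float(np.sum((x - target) ** 2) + 0.01 * rng.standard_normal())


theta_hat = sdpsa_minimize(noisy_quadratic, np.zeros(3))
print(np.round(theta_hat, 2))
```

In ordinary SPSA, the cross terms Δ_j/Δ_i (for i ≠ j) in the gradient estimate vanish on average because the perturbations are random. Here the columns of the perturbation matrix are mutually orthogonal, so those cross terms sum to exactly zero over each cycle of P steps. This deterministic cancellation is the property that deterministic perturbation sequences exploit.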