SPSA based Actor-Critic Algorithm by Using Deterministic Perturbation Sequences

Abstract: We develop a simulation-based actor-critic algorithm for infinite-horizon Markov decision processes with finite state and action spaces under a discounted cost criterion. The algorithm performs gradient search in the space of randomized policies, using simultaneous deterministic perturbation stochastic approximation (SDPSA) estimates of the gradient. It combines the features of two-timescale actor-critic algorithms with those of the gradient-search-based SDPSA technique.
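To make the SDPSA idea concrete, the following is a minimal sketch of a one-measurement gradient estimate driven by a deterministic perturbation cycle. It is not the actor-critic algorithm itself: the abstract does not say how the perturbation sequences are built, so the Hadamard-matrix construction used here is an assumption, and the names `hadamard`, `perturbation_cycle`, `sdpsa_gradient`, and the toy `cost` function are hypothetical. The estimate is also averaged over a full cycle at a fixed parameter, whereas a two-timescale scheme would presumably update incrementally.

```python
def hadamard(order):
    """Sylvester construction of a +/-1 Hadamard matrix; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def perturbation_cycle(dim):
    """Deterministic +/-1 perturbation vectors (assumed construction).

    Uses the smallest power-of-two order >= dim + 1 and drops the first
    (all-ones) column, so the remaining columns sum to zero and are
    mutually orthogonal over one cycle.
    """
    order = 1 << dim.bit_length()
    return [row[1:dim + 1] for row in hadamard(order)]


def sdpsa_gradient(cost, theta, delta=0.1):
    """Average cost(theta + delta*d) / (delta*d_i) over one full cycle."""
    cycle = perturbation_cycle(len(theta))
    grad = [0.0] * len(theta)
    for d in cycle:
        perturbed = [t + delta * di for t, di in zip(theta, d)]
        value = cost(perturbed)  # one cost measurement per perturbation
        for i, di in enumerate(d):
            grad[i] += value / (delta * di)
    return [g / len(cycle) for g in grad]


# Toy linear cost (hypothetical) whose true gradient is (2, -3, 0.5).
def cost(x):
    return 2.0 * x[0] - 3.0 * x[1] + 0.5 * x[2] + 7.0


estimate = sdpsa_gradient(cost, [1.0, 2.0, 3.0])
```

For a linear cost the bias terms cancel over a full cycle, because each retained column sums to zero and distinct columns are orthogonal, so `estimate` matches the true gradient up to floating-point error. For nonlinear costs the estimate carries an error that depends on `delta`.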