Stability of Learning Dynamics in Two-Agent, Imperfect-Information Games
John M. Butterworth and Jonathan L. Shapiro
In: FOGA 2009, 9-11 January 2009, Orlando, Florida, USA.
One issue in multi-agent co-adaptive learning concerns convergence. When two (or more) agents play a game with different information and different payoffs, the general behaviour tends to be oscillation around a Nash equilibrium. Several algorithms have been proposed to force convergence to mixed-strategy Nash equilibria in
imperfect-information games when the agents are aware of their
opponent's strategy. We consider the effect on one such algorithm,
the lagging anchor algorithm, when each agent must also infer the
gradient information from observations, in the infinitesimal time-step limit. Use of an estimated gradient, either by opponent
modelling or stochastic gradient ascent, destabilises the algorithm
in a region of parameter space. There are two phases of behaviour.
If the rate of estimation is low, the Nash equilibrium becomes unstable in the mean. If the rate is high, the Nash equilibrium is an attractive fixed point in the mean, but the uncertainty acts as narrow-band coloured noise, which causes damped oscillations.
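The qualitative behaviour discussed above can be sketched with a minimal simulation. The following is an illustrative sketch only, not the paper's setup: it runs lagging anchor dynamics with *exact* payoff gradients on matching pennies (whose unique Nash equilibrium is the mixed strategy 1/2, 1/2), using a simple Euler discretisation. The function name, step size `eta`, and anchor rate `nu` are assumptions chosen for illustration.

```python
# Illustrative sketch: lagging anchor dynamics on matching pennies.
# Each player's strategy follows its payoff gradient plus a pull toward
# a slowly trailing "anchor"; the anchor in turn trails the strategy.
# With exact gradients (as here) the anchor term damps the oscillation
# around the Nash equilibrium p = q = 0.5.

def lagging_anchor(steps=20000, eta=1e-3, nu=0.5):
    p, q = 0.9, 0.2      # P(heads) for players 1 and 2
    pa, qa = p, q        # lagging anchors start at the strategies
    for _ in range(steps):
        # Matching pennies: u1 = (2p-1)(2q-1), u2 = -u1, hence:
        gp = 2.0 * (2.0 * q - 1.0)    # du1/dp
        gq = -2.0 * (2.0 * p - 1.0)   # du2/dq
        # gradient step plus attraction toward the anchor
        p_next = min(1.0, max(0.0, p + eta * (gp + nu * (pa - p))))
        q_next = min(1.0, max(0.0, q + eta * (gq + nu * (qa - q))))
        # anchors trail the strategies at rate nu
        pa += eta * nu * (p - pa)
        qa += eta * nu * (q - qa)
        p, q = p_next, q_next
    return p, q

p, q = lagging_anchor()
print(p, q)  # both strategies spiral in toward 0.5
```

Setting `nu = 0` removes the anchor and recovers plain gradient ascent, whose trajectories orbit the equilibrium without converging; the anchor coupling is what turns the orbit into an inward spiral. The destabilisation analysed in the paper arises only once the exact gradients used here are replaced by estimates.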