## Abstract

Humans and animals can learn to make close to optimal decisions, even if there are multiple sources of uncertainty in their environment. The Bayesian framework has proved to be a valuable computational tool for modeling learning and decision-making processes under uncertainty. We present a reward-modulated Hebbian plasticity rule, the Bayesian Hebb rule, which provides a possible mechanism within the Bayesian framework for fast reward-based learning in the brain, using only a simple and experimentally well-supported type of synaptic plasticity. We show analytically that the Bayes-optimal synaptic weights of model neurons are attractors in weight space for expected updates under the Bayesian Hebb rule. Thus, premature convergence to local minima, as in gradient-descent approaches, cannot occur. We also suggest a suitable pre-processing of sensory evidence, based on factor graph representations of statistical dependencies between input signals. If the Bayesian Hebb rule is applied to the proposed pre-processing, convergence to optimal decision-making policies is guaranteed, even for scenarios with complex input and reward distributions. The reward-modulated Bayesian Hebb rule is a biologically inspired rule that provides both an efficient learning mechanism and a link between the theory of Bayesian inference and animal (or human) learning in simple instrumental conditioning tasks. As an example, we provide a model for a recent experiment by Yang and Shadlen, in which rhesus monkeys were trained to learn decision making under uncertainty in a task adapted from the weather prediction task, which is used to study human learning. Information from four different visual stimuli had to be integrated to obtain probabilities that one of two possible actions would lead to reward. It was shown that firing rates of neurons in area LIP of the monkey cortex represent log-likelihood ratios of obtaining rewards with one of the actions.
If networks of model neurons are trained with the reward-modulated Bayesian Hebb rule, the same neural representation of log-likelihood ratios and the same behavioral strategies arise. Hence our simple model may bring us one step closer to understanding the neural implementation of learning and decision making under uncertainty in monkey or human brains.
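
The claim that reward-related log-odds act as attractors for the expected weight update can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions: the abstract does not give the form of the rule, so the update used here (a step of `eta * (1 + exp(-w))` on rewarded trials and `-eta * (1 + exp(w))` on unrewarded trials, for a single always-active input) is an assumed form, chosen because its expected update vanishes exactly at `w = log(p / (1 - p))`. The names `bayesian_hebb_step`, `simulate`, `eta`, and `p_reward` are hypothetical.

```python
import math
import random


def bayesian_hebb_step(w, rewarded, eta=0.01):
    """One reward-modulated update of a single weight (assumed form).

    The input is taken to be active on every trial. The expected update
    p*(1 + exp(-w)) - (1 - p)*(1 + exp(w)) is zero at w = log(p / (1 - p)).
    """
    if rewarded:
        return w + eta * (1.0 + math.exp(-w))
    return w - eta * (1.0 + math.exp(w))


def simulate(p_reward, n_trials=20000, eta=0.01, seed=0):
    """Run the rule on Bernoulli rewards; return the mean weight over the second half."""
    rng = random.Random(seed)
    w = 0.0
    tail = []
    for t in range(n_trials):
        rewarded = rng.random() < p_reward
        w = bayesian_hebb_step(w, rewarded, eta)
        if t >= n_trials // 2:
            tail.append(w)  # average only after the initial transient
    return sum(tail) / len(tail)


p = 0.8                           # hypothetical reward probability
w_avg = simulate(p)               # time-averaged learned weight
target = math.log(p / (1.0 - p))  # log-odds of reward, about 1.386
print(f"learned weight ~ {w_avg:.3f}, log-odds = {target:.3f}")
```

With a constant learning rate the weight fluctuates around the log-odds rather than settling exactly, which is why the sketch averages over the second half of the trials.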