Reward-modulated Hebbian Learning of Decision Making
We introduce a framework for Bayesian decision making in which the learning of optimal decisions is reduced to its simplest and biologically most plausible form: Hebbian learning on a linear neuron. We cast our Bayesian Hebb learning rule as reinforcement learning in which certain decisions are rewarded, and prove that each synaptic weight converges on average exponentially fast to the log-odds of receiving a reward when its pre- and postsynaptic neurons are active. In our simple architecture, a particular action is selected from the set of candidate actions by a winner-take-all operation. The global reward assigned to this action then modulates the update of each synapse. Apart from this global reward signal, our reward-modulated Bayesian Hebb rule is a pure Hebb update that depends only on the co-activation of the pre- and postsynaptic neurons, and not on the weighted sum of all presynaptic inputs to the postsynaptic neuron as in the perceptron learning rule or the Rescorla-Wagner rule. This simple approach to learning Bayes-optimal decisions requires that information about sensory inputs be presented to the Bayesian decision stage in a suitably pre-processed form, resulting from other adaptive processes (acting on a larger time scale) that detect salient dependencies among input features. Hence our proposed framework for fast learning of Bayes-optimal decisions also provides interesting new hypotheses regarding neural codes and computational goals of cortical areas that provide input to the final decision stage.
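The convergence claim above can be illustrated with a minimal simulation. The sketch below assumes the standard form of such a rule: when pre- and postsynaptic neurons are co-active, a rewarded trial increases the weight by an amount proportional to (1 + e^(-w)) and an unrewarded trial decreases it by an amount proportional to (1 + e^(w)); the unique equilibrium of the expected update is then w = log(p/(1-p)), the log-odds of reward. The learning rate, trial counts, and reward probability are illustrative choices, not values from the paper.

```python
import math
import random

def bayesian_hebb_update(w, rewarded, eta=0.01):
    """One reward-modulated Hebbian update for a synapse whose pre- and
    postsynaptic neurons were both active on this trial. The update uses
    only the current weight and the global reward signal, not the weighted
    sum of all presynaptic inputs (unlike perceptron / Rescorla-Wagner)."""
    if rewarded:
        return w + eta * (1.0 + math.exp(-w))  # push w up toward the log-odds
    else:
        return w - eta * (1.0 + math.exp(w))   # push w down toward the log-odds

random.seed(0)
p_reward = 0.75          # assumed P(reward | pre and post both active)
w = 0.0
burn_in, trials = 5000, 20000
samples = []
for t in range(trials):
    w = bayesian_hebb_update(w, random.random() < p_reward)
    if t >= burn_in:
        samples.append(w)

w_avg = sum(samples) / len(samples)
log_odds = math.log(p_reward / (1.0 - p_reward))
print(w_avg, log_odds)   # time-averaged weight hovers near the log-odds
```

At the equilibrium w = log(p/(1-p)), the expected update p(1 + e^(-w)) - (1-p)(1 + e^(w)) is exactly zero, which is why the simulated weight settles near the log-odds of reward.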