PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Information Theory of Decisions and Actions
Naftali Tishby and Daniel Polani
In: Vassilis, Hussain, Taylor (editors), "Perception-reason-action cycle: Models, algorithms and systems" (Springer, 2010).


The perception-action cycle is often defined as “the circular flow of information between an organism and its environment in the course of a sensory guided sequence of actions towards a goal” (Fuster 2001, 2006). The question we address in this paper is in what sense this “flow of information” can be described by Shannon’s measures of information, introduced in his mathematical theory of communication. We provide an affirmative answer to this question using an intriguing analogy between Shannon’s classical model of communication and the perception-action cycle. In particular, decision and action sequences turn out to be directly analogous to codes in communication, and their complexity (the minimal number of binary decisions required for reaching a goal) is directly bounded by information measures, as in communication. This analogy allows us to extend the standard Reinforcement Learning framework, which considers the future expected reward in the course of a behaviour sequence towards a goal (value-to-go). Here, we additionally incorporate a measure of information associated with this sequence: the cumulative information processing cost, or bandwidth, required to specify the future decision and action sequence (information-to-go). Using a graphical model, we derive a recursive Bellman optimality equation for information measures, in analogy to Reinforcement Learning; from this, we obtain new algorithms for calculating the optimal trade-off between the value-to-go and the required information-to-go, unifying the ideas behind the Bellman and the Blahut-Arimoto iterations. This trade-off between value-to-go and information-to-go provides a complete analogy with the compression-distortion trade-off in source coding. The new formulation connects seemingly unrelated optimization problems. The algorithm is demonstrated on grid world examples.
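
The trade-off between value-to-go and information-to-go described in the abstract can be illustrated with a small sketch. The code below is not the paper's algorithm. It is a simplified variant written under stated assumptions: a one-dimensional corridor instead of a grid world, deterministic transitions, a reward of -1 per step, and a fixed uniform action prior, so the Blahut-Arimoto style update of the prior is omitted. The function name `soft_bellman` and the parameter `beta` (the weight on reward relative to information cost) are hypothetical names chosen for illustration. Each backup minimises the expected sum of the decision information, log(policy/prior) in nats, and the negative reward scaled by `beta`.

```python
import numpy as np

def soft_bellman(n_states=5, beta=1.0, n_iter=500, tol=1e-12):
    """Free-energy backup on a 1-D corridor whose last cell is the goal.

    Illustrative assumptions: deterministic moves, reward -1 per step,
    fixed uniform action prior (no Blahut-Arimoto update of the prior).
    """
    goal = n_states - 1
    moves = np.array([-1, +1])              # actions: step left, step right
    prior = np.full(len(moves), 0.5)        # fixed uniform action prior
    # Next state for every (state, action) pair, clipped at the walls.
    nxt = np.clip(np.arange(n_states)[:, None] + moves[None, :], 0, goal)
    reward = -1.0                           # every step costs one unit
    G = np.zeros(n_states)                  # free energy-to-go, in nats

    def log_weights(G):
        # log of prior(a) * exp(beta * reward - G(next state))
        return np.log(prior)[None, :] + beta * reward - G[nxt]

    for _ in range(n_iter):
        G_new = -np.log(np.exp(log_weights(G)).sum(axis=1))
        G_new[goal] = 0.0                   # absorbing goal: nothing left to pay
        converged = np.max(np.abs(G_new - G)) < tol
        G = G_new
        if converged:
            break

    # The minimising policy is the prior reweighted by the exponent above.
    policy = np.exp(log_weights(G))
    policy /= policy.sum(axis=1, keepdims=True)
    return G, policy

if __name__ == "__main__":
    for beta in (0.0, 1.0, 5.0):
        G, policy = soft_bellman(beta=beta)
        print(f"beta={beta}: P(right | state 0) = {policy[0, 1]:.4f}")
```

In this sketch, `beta = 0` leaves the policy equal to the prior, so no decision information is spent, while larger `beta` pushes the policy towards always stepping right at a higher information cost. This mirrors, in a much reduced setting, the compression-distortion style trade-off the abstract refers to.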

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 5915
Deposited By: Naftali Tishby
Deposited On: 08 March 2010