Information Theory of Decisions and Actions ## AbstractThe perception-action cycle is often defined as “the circular flow of information between an organismand its environment in the course of a sensory guided sequence of actions towards a goal” (Fuster 2001, 2006). The question we address in this paper is in what sense this “flow of information” can be described by Shannon’s measures of information introduced in his mathematical theory of communication. We provide an affirmative answer to this question using an intriguing analogy between Shannon’s classical model of communication and the Perception-Action-Cycle. In particular, decision and action sequences turn out to be directly analogous to codes in communication, and their complexity — the minimal number of (binary) decisions required for reaching a goal — directly bounded by information measures, as in communication. This analogy allows us to extend the standard Reinforcement Learning framework. The latter considers the future expected reward in the course of a behaviour sequence towards a goal (value-to-go). Here, we additionally incorporate a measure of information associated with this sequence: the cumulated information processing cost or bandwidth required to specify the future decision and action sequence (information-to-go). Using a graphical model, we derive a recursive Bellman optimality equation for information measures, in analogy to Reinforcement Learning; from this, we obtain new algorithms for calculating the optimal trade-off between the value-to-go and the required information-to-go, unifying the ideas behind the Bellman and the Blahut-Arimoto iterations. This trade-off between value-to-go and information-togo provides a complete analogy with the compression-distortion trade-off in source coding. The present new formulation connects seemingly unrelated optimization problems. The algorithm is demonstrated on grid world examples.
[Edit] |