PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Algorithmic Rate-Distortion Theory
N.K. Vereshchagin and P.M.B. Vitányi
(2005) CWI, Amsterdam.

Abstract

We propose and develop rate-distortion theory in the Kolmogorov complexity setting. This gives the ultimate limits of lossy compression of individual data objects, taking all effective regularities of the data into account.
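
In symbols, the central object can be written as follows (our gloss of the abstract's description, not a formula quoted from the paper; K denotes prefix Kolmogorov complexity and d a computable distortion measure):

    r_x(\delta) = \min \{\, K(y) : d(x, y) \le \delta \,\}

that is, the least number of bits in an effective description of any code word y that approximates the source word x to within distortion \delta.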

EPrint Type: Other
Additional Information: Kolmogorov complexity is the accepted absolute measure of the information content of an individual finite object. It gives the ultimate limit on the number of bits resulting from lossless compression of the object, more precisely, the number of bits from which effective lossless decompression of the object is possible. A similar absolute approach is needed for lossy compression, that is, a rate-distortion theory giving the ultimate effective limits for individual finite data objects. We give natural definitions of the rate-distortion functions of individual data, independent of any random source assumed to produce those data. We analyze the possible shapes of the rate-distortion graphs for all data and all computable distortion measures. The classic Shannon rate-distortion curve corresponds approximately to the individual curves of typical (random) data from the postulated random source, while nonrandom data have completely different curves. Note that one is generally interested in the behavior of lossy compression on complex, structured, nonrandom data such as pictures, movies, and music, whereas typical unstructured random data like noise (represented by the Shannon curve) is discarded: we are not likely to want to store it.

Finally, we formulate a new problem related to the practice of lossy compression. Is it the case that a code word realizing the least distortion of the source word at a given rate also captures the most properties of that source word that can be captured at this rate? This question cannot be well posed in the Shannon setting, which deals with expected rather than individual distortion and has no way to express that a code word captures a certain amount of the properties of the data. We show that in our setting the question is answered in the affirmative for every distortion measure that satisfies a certain parsimony-of-covering property.
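
Because K is uncomputable, the individual rate-distortion function cannot be computed exactly. Purely as an illustration (our own construction, not the paper's method), the Python sketch below substitutes the length of zlib's output for K(y) and uses Hamming distortion over short bit strings; at toy lengths the compressor's header overhead dominates the numbers, so only the shape of the search, minimizing a proxy for K(y) subject to a distortion bound, is meaningful here.

import itertools
import zlib

def hamming(x: str, y: str) -> int:
    """Hamming distortion between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))

def proxy_complexity(y: str) -> int:
    """Compressed length in bits, a crude stand-in for K(y)."""
    return 8 * len(zlib.compress(y.encode()))

def rate_distortion_curve(x: str) -> dict:
    """For each distortion bound d, the cheapest (by proxy) code word within d of x.

    Exhaustive search over all code words y of the same length as x;
    feasible only for toy lengths (2**len(x) candidates).
    """
    curve = {}
    for bits in itertools.product("01", repeat=len(x)):
        y = "".join(bits)
        d = hamming(x, y)
        cost = proxy_complexity(y)
        if d not in curve or cost < curve[d]:
            curve[d] = cost
    # Monotonize: allowing more distortion can never require more rate.
    best = float("inf")
    for d in sorted(curve):
        best = min(best, curve[d])
        curve[d] = best
    return curve

if __name__ == "__main__":
    x = "0110100110"  # a toy 10-bit source word
    for d, rate in sorted(rate_distortion_curve(x).items()):
        print(f"distortion <= {d}: proxy rate {rate} bits")

The search mirrors the definition directly: for each allowed distortion it scans every candidate code word and keeps the one of least proxy complexity, then enforces monotonicity of the resulting curve.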
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 1837
Deposited By: Paul Vitányi
Deposited On: 29 December 2005