PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Context methods and infinite alphabets in information theory
Aurelien Garivier
(2006) PhD thesis, Université Paris Sud Orsay.


This thesis addresses some contemporary aspects of information theory, from source coding to issues of model selection. First, we consider the problem of coding memoryless sources on a countable infinite alphabet. As it is impossible to provide a solution that is both efficient and general, we use two approaches. In the first, we establish conditions under which the entropy rate can be approached, and we consider restricted classes of sources whose tail probabilities are controlled. The second approach sets no condition on the sources but provides a partial solution by coding only part of the information, the pattern, which captures only the repetitions in the message. In order to study more complex processes, we return to the case of finite-memory sources on a finite alphabet, which has given rise to many works and to efficient algorithms such as the Context Tree Weighting (CTW) method. We show that this method is also efficient on a non-parametric class of infinite-memory sources: the renewal processes. We then show that the ideas underlying CTW can lead to a consistent estimator of the memory structure of a process when that structure is finite; in particular, we complete the study of the BIC context tree estimator for Variable Length Markov Chains. In the last part, we show how these ideas can be generalized to more complex sources on an infinite (countable or not) alphabet. We obtain consistent estimators for the order of hidden Markov models with Poisson and Gaussian emissions.
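To make the notion of a pattern concrete, here is a minimal sketch in Python. It assumes the usual definition from the pattern-coding literature, in which each symbol is replaced by the rank of its first appearance in the message, so that only the repetition structure survives. The function name `pattern` is hypothetical and is not taken from the thesis. Because symbols only need to be hashable, the same code works for messages drawn from an arbitrarily large (for example, integer-valued) alphabet.

```python
from typing import Dict, Hashable, Iterable, List


def pattern(message: Iterable[Hashable]) -> List[int]:
    """Return the pattern of a message (hypothetical helper, for illustration).

    Each symbol is replaced by the 1-based rank of its first occurrence,
    so the output records only which positions repeat earlier symbols.
    """
    first_seen: Dict[Hashable, int] = {}
    out: List[int] = []
    for symbol in message:
        if symbol not in first_seen:
            # A new symbol gets the next unused index.
            first_seen[symbol] = len(first_seen) + 1
        out.append(first_seen[symbol])
    return out


print(pattern("abracadabra"))        # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
print(pattern([1000, 7, 1000, 42]))  # [1, 2, 1, 3]
```

Two messages with the same repetition structure, such as `"abracadabra"` and `"xyzxwxvxyzx"`, share the same pattern. The information discarded is the dictionary of symbols in order of first appearance, which is exactly the part that is hard to code when the alphabet is infinite.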

EPrint Type: Thesis (PhD)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 2707
Deposited By: Aurelien Garivier
Deposited On: 22 November 2006