PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Smoothing and Compression with Stochastic $k$-testable Tree Languages
Juan Ramón Rico-Juan, Jorge Calera-Rubio and Rafael C. Carrasco
Pattern Recognition (journal), accepted, to appear, 2003.

Abstract

In this paper, we describe techniques for learning probabilistic $k$-testable tree models, a generalization of the well-known $k$-gram models, which can be used to compress or classify structured data. These models are easy to infer from samples and allow incremental updates. Moreover, as shown here, backing-off schemes can be defined to mitigate data sparseness, a problem that often arises when trees are used to represent the data. These features make the models suitable for compressing structured data files at better rates than string-based methods.
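To illustrate why such models are easy to infer and update incrementally, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: k = 2, trees encoded as hypothetical `(label, children)` tuples, a node's 2-fork read as its label plus the ordered labels of its children, and the probability of a tree taken as the root-label probability times, for every node, the relative frequency of its fork given its label. The names `TwoTestableTreeModel` and `fork_of` are illustrative only. The sketch does not implement the paper's backing-off schemes, so any unseen fork gets probability zero, which is exactly the sparseness problem those schemes address. The value -log2 p(t) is the ideal code length an arithmetic coder driven by the model would approach.

```python
from collections import defaultdict
from math import log2
from typing import DefaultDict, Tuple

# Hypothetical encoding: a tree is (label, children), children a tuple of trees.
Tree = Tuple[str, tuple]
Fork = Tuple[str, Tuple[str, ...]]


def fork_of(node: Tree) -> Fork:
    """2-fork of a node (assumed here): its label plus its children's labels."""
    label, children = node
    return label, tuple(child[0] for child in children)


class TwoTestableTreeModel:
    """Hypothetical sketch of a stochastic k-testable tree model with k = 2."""

    def __init__(self) -> None:
        self.root_counts: DefaultDict[str, int] = defaultdict(int)
        self.num_trees = 0
        self.fork_counts: DefaultDict[Fork, int] = defaultdict(int)
        self.label_counts: DefaultDict[str, int] = defaultdict(int)

    def update(self, tree: Tree) -> None:
        """Add one sample tree. Counts are additive, so training is incremental."""
        self.root_counts[tree[0]] += 1
        self.num_trees += 1
        stack = [tree]
        while stack:
            node = stack.pop()
            self.fork_counts[fork_of(node)] += 1
            self.label_counts[node[0]] += 1
            stack.extend(node[1])

    def probability(self, tree: Tree) -> float:
        """Relative-frequency estimate, no smoothing: an unseen fork gives 0."""
        if self.num_trees == 0:
            return 0.0
        p = self.root_counts.get(tree[0], 0) / self.num_trees
        stack = [tree]
        while stack and p > 0.0:
            node = stack.pop()
            seen = self.label_counts.get(node[0], 0)
            if seen == 0:
                return 0.0
            # Conditional probability of this node's fork given its label.
            p *= self.fork_counts.get(fork_of(node), 0) / seen
            stack.extend(node[1])
        return p

    def code_length_bits(self, tree: Tree) -> float:
        """Ideal code length -log2 p(t) that an arithmetic coder could approach."""
        p = self.probability(tree)
        return float("inf") if p == 0.0 else -log2(p)


# Usage: train on two small trees, then score a seen and an unseen tree.
leaf_a, leaf_b = ("a", ()), ("b", ())
model = TwoTestableTreeModel()
model.update(("f", (leaf_a, leaf_b)))
model.update(("f", (leaf_a, leaf_a)))
print(model.probability(("f", (leaf_a, leaf_b))))       # 0.5
print(model.code_length_bits(("f", (leaf_a, leaf_b))))  # 1.0
print(model.probability(("f", (leaf_b, leaf_a))))       # 0.0, unseen 2-fork
```

The last line shows the data-sparseness problem directly: the tree `f(b, a)` is plausible but was never observed, so a pure counting model assigns it zero probability and an infinite code length. A backing-off scheme would instead redistribute some probability mass to such unseen forks.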

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 761
Deposited By: Jorge Calera-Rubio
Deposited On: 30 December 2004