PAC-learnability of Probabilistic Deterministic Finite State Automata in terms of Variation Distance
Nick Palmer and Paul Goldberg
In: 16th Algorithmic Learning Theory Conference, 8-11 Oct 2005, Singapore.
We consider the problem of PAC-learning distributions over strings,
represented by probabilistic deterministic finite automata (PDFAs).
PDFAs are a probabilistic model for generating strings of symbols
and have been used in speech and handwriting recognition and in
bioinformatics. Recent work on learning PDFAs from
random examples has used the KL-divergence as the error measure; here
we use the variation distance. We build on recent work by Clark and
Thollard and show that using the variation distance both simplifies
the algorithms and strengthens the results: in particular, we obtain
polynomial sample-size bounds that are independent of the expected
length of strings.
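
To make the two error measures concrete, the following is a minimal sketch and is not taken from the paper. It assumes a toy dictionary encoding of a PDFA (each state maps a symbol to a probability and a next state, with a hypothetical end-of-string marker `STOP`), takes the variation distance to be the L1 distance between the two string distributions (some authors include a factor of 1/2), and approximates both sums by enumerating strings up to a chosen `max_len`. All function and variable names are illustrative.

```python
from itertools import product
from math import log

STOP = "$"  # hypothetical end-of-string marker, not part of the alphabet

def string_prob(pdfa, s, start=0):
    """Probability that the PDFA emits exactly the string s and then stops."""
    state, p = start, 1.0
    for ch in s:
        if ch not in pdfa[state]:
            return 0.0          # no transition on this symbol
        prob, state = pdfa[state][ch]
        p *= prob
    stop = pdfa[state].get(STOP)
    return p * stop[0] if stop else 0.0

def all_strings(alphabet, max_len):
    """Enumerate every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

def l1_distance(p, q, alphabet, max_len):
    """Truncated L1 (variation) distance between two PDFA distributions."""
    return sum(abs(string_prob(p, s) - string_prob(q, s))
               for s in all_strings(alphabet, max_len))

def kl_divergence(p, q, alphabet, max_len):
    """Truncated KL(p || q); infinite if p gives mass where q gives none."""
    total = 0.0
    for s in all_strings(alphabet, max_len):
        ps, qs = string_prob(p, s), string_prob(q, s)
        if ps > 0:
            if qs == 0:
                return float("inf")
            total += ps * log(ps / qs)
    return total

# P emits a^n with probability 0.5^(n+1); Q emits only "" or "a".
P = {0: {"a": (0.5, 0), STOP: (0.5, None)}}
Q = {0: {"a": (0.5, 1), STOP: (0.5, None)},
     1: {STOP: (1.0, None)}}

d_l1 = l1_distance(P, Q, "a", 20)
d_kl = kl_divergence(P, Q, "a", 20)
```

In this toy pair the L1 distance stays finite (it can never exceed 2), whereas the KL-divergence is infinite because `P` assigns positive probability to "aa" and `Q` assigns it none. This is only meant to illustrate how the two measures can differ, not how the paper's learning algorithm works.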
|EPrint Type:||Conference or Workshop Item (Paper)|
|Subjects:||Theory & Algorithms|
|Deposited By:||Paul Goldberg|
|Deposited On:||28 November 2005|