Blocked inference in Bayesian tree substitution grammars
Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context-free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.