Piecewise Training for Structured Prediction
Charles Sutton and Andrew McCallum
A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training (PW) that divides the factors into tractable subgraphs, which we call pieces, that are trained independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension to piecewise training, called piecewise pseudolikelihood (PWPL), designed for when variables have large cardinality. On several real-world natural language processing tasks, piecewise training performs superior to Besag’s pseudolikelihood and sometimes comparably to exact maximum likelihood. In addition, PWPL performs similarly to PW and superior to standard pseudolikelihood, but is five to ten times more computationally efficient than batch maximum likelihood training.