Discriminative Machine Learning with Structure
PhD thesis, University of California, Berkeley.
Some of the best-performing classifiers in modern machine learning have been designed using discriminative learning, as exemplified by Support Vector Machines. The ability of discriminative learning to use flexible features via the kernel trick has enlarged the range of applications for machine learning. With this expanded range of applications, however, it has become apparent that real-world data exhibit more structure than classical methods assume. In this thesis, we show how to extend the discriminative learning framework to exploit different types of structure: on the one hand, structure on the outputs, such as the combinatorial structure in word alignment; on the other hand, latent variable structure on the inputs, such as in text document classification.
In the context of structured output classification, we present a scalable algorithm for maximum-margin estimation of structured output models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem that allows us to use simple projection methods based on the dual extragradient algorithm of Nesterov. We analyze the convergence of the method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment. We then show how one can obtain state-of-the-art results for the word alignment task by formulating it as a quadratic assignment problem within our discriminative learning framework.
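The saddle-point formulation can be illustrated on a toy problem. The thesis uses Nesterov's dual extragradient algorithm. The sketch below swaps in the simpler classical (Korpelevich) extragradient method, a close relative, to show the same basic structure: a projected extrapolation step followed by a projected correction step that reuses gradients from the extrapolated point, with the averaged iterates as output. As an assumption for illustration only, the problem is a bilinear matrix game min over x, max over y of x^T A y with both players restricted to probability simplices. In the structured setting, the projections would instead be onto the sets of model parameters and (relaxed) structured outputs. The game matrix, step size, and all function names here are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def project_simplex(v: Vector) -> Vector:
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:  # holds for a prefix of k; keep the last such threshold
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def matvec(A: Matrix, y: Vector) -> Vector:
    """A @ y."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def rmatvec(A: Matrix, x: Vector) -> Vector:
    """A.T @ x."""
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(A[0]))]


def step(z: Vector, g: Vector, eta: float) -> Vector:
    """Projected step: project z + eta * g back onto the simplex."""
    return project_simplex([zi + eta * gi for zi, gi in zip(z, g)])


def extragradient(A: Matrix, x: Vector, y: Vector,
                  eta: float, iters: int) -> Tuple[Vector, Vector]:
    """Classical extragradient for min_x max_y x^T A y over two simplices.

    Returns the average of the extrapolated points, the iterate covered by
    the standard O(1/iters) duality-gap bound when eta <= 1 / ||A||_2.
    """
    x_sum, y_sum = [0.0] * len(x), [0.0] * len(y)
    for _ in range(iters):
        # Extrapolation: projected gradient step from the current point.
        xh = step(x, [-g for g in matvec(A, y)], eta)  # x descends along A y
        yh = step(y, rmatvec(A, x), eta)               # y ascends along A^T x
        # Correction: step from the current point with gradients at (xh, yh).
        x = step(x, [-g for g in matvec(A, yh)], eta)
        y = step(y, rmatvec(A, xh), eta)
        x_sum = [s + v for s, v in zip(x_sum, xh)]
        y_sum = [s + v for s, v in zip(y_sum, yh)]
    return [s / iters for s in x_sum], [s / iters for s in y_sum]


def duality_gap(A: Matrix, x: Vector, y: Vector) -> float:
    """max_{y'} x^T A y' - min_{x'} x'^T A y; zero exactly at a saddle point."""
    return max(rmatvec(A, x)) - min(matvec(A, y))


# Rock-paper-scissors loss matrix (illustrative); its unique saddle point is uniform play.
A = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
x_bar, y_bar = extragradient(A, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], eta=0.3, iters=2000)
print(duality_gap(A, x_bar, y_bar))
```

Here ||A||_2 = √3, so eta = 0.3 satisfies the step-size condition. The duality gap of the averaged pair is the natural convergence measure for this kind of saddle-point method, and it is what the final line prints.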
In the context of latent variable models, we present DiscLDA, a discriminative variant of the Latent Dirichlet Allocation (LDA) model, which has been popular for modeling collections of text documents or images. In DiscLDA, we introduce a class-dependent linear transformation on the topic mixture proportions of LDA and estimate it discriminatively by maximizing the conditional likelihood of the class labels given the documents. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. Our experiments on the 20 Newsgroups document classification task show how our model can identify topics shared across classes as well as discriminative class-dependent topics.
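To make the class-dependent transformation concrete, the sketch below applies a matrix T^y to a document's topic proportions θ and forms the resulting word distribution. It assumes, as an illustration rather than the thesis's exact parameterization, that each T^y is column-stochastic, so that T^y θ remains a probability vector over topics. It also uses a hypothetical block layout that routes some latent dimensions to topics owned by class y and the rest to topics shared by all classes, echoing the shared and class-dependent topics described above. The names `block_transform` and `word_distribution`, along with all numbers, are hypothetical. The discriminative estimation of T^y is not shown.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def block_transform(y: int, n_classes: int, k_class: int, k_shared: int) -> Matrix:
    """Hypothetical class-dependent matrix T^y with shape L x K.

    K = k_class + k_shared latent dimensions; L = n_classes * k_class + k_shared
    topics. Latent dims 0..k_class-1 are routed to the topic block owned by
    class y; the remaining k_shared dims go to topics shared by all classes.
    Every column holds a single 1, so T^y is column-stochastic.
    """
    K = k_class + k_shared
    L = n_classes * k_class + k_shared
    T = [[0.0] * K for _ in range(L)]
    for j in range(k_class):
        T[y * k_class + j][j] = 1.0
    for j in range(k_shared):
        T[n_classes * k_class + j][k_class + j] = 1.0
    return T


def matvec(A: Matrix, v: Vector) -> Vector:
    """A @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def word_distribution(theta: Vector, T: Matrix, phi: Matrix) -> Vector:
    """p(w | theta, y) = sum_l (T^y theta)_l * phi[l][w], one entry per word w."""
    mix = matvec(T, theta)  # transformed topic proportions, still on the simplex
    vocab = len(phi[0])
    return [sum(mix[l] * phi[l][w] for l in range(len(mix))) for w in range(vocab)]


# Toy setting (illustrative): 2 classes, 1 class-specific + 1 shared latent dim.
theta = [0.7, 0.3]
phi = [[0.8, 0.1, 0.1],   # topic owned by class 0
       [0.1, 0.8, 0.1],   # topic owned by class 1
       [0.1, 0.1, 0.8]]   # topic shared by both classes
for y in (0, 1):
    T = block_transform(y, n_classes=2, k_class=1, k_shared=1)
    print(y, matvec(T, theta), word_distribution(theta, T, phi))
```

With θ = (0.7, 0.3), the same proportions place their class-specific mass on topic 0 under class 0 and on topic 1 under class 1, while the shared topic keeps weight 0.3 in both cases. This is the sense in which the transformed proportions can separate shared from class-dependent structure.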