Hybrid model of Conditional Random Field and Support Vector Machine
Qinfeng Shi, Mark Reid and Tiberio Caetano
In: Workshop at the 23rd Annual Conference on Neural Information Processing Systems, 11-12 Dec 2009, Whistler, Canada.

## Abstract

Conditional Random Fields (CRFs) are semi-generative (despite often being classified as discriminative models) in the sense that they estimate the conditional probability $D(y|x)$ of any label $y$ given any observation $x$; the label is **generated** from $D(y|x)$. Estimating $D(y|x)$ is usually more efficient than estimating $D(x|y)$ when there are insufficient observations $x$ per class or there are too many labels (e.g. there are exponentially many labels $y$ for a chain-like $x$). Unlike CRFs, the Support Vector Machine (SVM) seeks a predicting function without modeling the underlying distribution. It is Fisher inconsistent in the multiclass and structured-label cases; however, it does admit a PAC bound on the true error. In particular, its PAC-Bayes margin bound is rather tight: given training sample size $m$, hypothesis space $\mathcal{H}$ and margin threshold $\gamma$, with probability at least $1-\delta$, the true error is upper bounded by the empirical error $+\; O\!\left(\sqrt{\frac{\gamma^{-2}\log|\mathcal{H}|\log m+\log \delta^{-1}}{m}}\right).$ Is there a model that is Fisher consistent for classification and also has a generalization bound? We use a naive combination of the two models, simply taking a weighted sum of their losses. This yields a surprising theoretical result: the hybrid loss can be Fisher consistent under some circumstances, and it admits a PAC-Bayes bound on its true error.
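The weighted-sum construction of the hybrid loss can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses the unstructured multiclass case, with the CRF loss taken as the negative conditional log-likelihood under a softmax and the SVM loss as the Crammer-Singer multiclass hinge; the mixing weight `alpha` is a hypothetical parameter.

```python
import math

def crf_loss(scores, y):
    # Negative conditional log-likelihood -log p(y|x), where
    # p(y|x) is the softmax over the per-label scores.
    log_z = math.log(sum(math.exp(s) for s in scores))
    return log_z - scores[y]

def svm_loss(scores, y):
    # Crammer-Singer multiclass hinge: largest margin violation
    # by any incorrect label, clipped at zero.
    return max(0.0, max(scores[j] - scores[y] + 1.0
                        for j in range(len(scores)) if j != y))

def hybrid_loss(scores, y, alpha=0.5):
    # Naive hybrid: weighted sum of the two losses.
    # alpha is a hypothetical mixing weight in [0, 1];
    # alpha=1 recovers the CRF loss, alpha=0 the SVM loss.
    return alpha * crf_loss(scores, y) + (1.0 - alpha) * svm_loss(scores, y)
```

For example, `hybrid_loss([2.0, 0.0], 0)` is small because label 0 already wins by a comfortable margin, while `hybrid_loss([0.0, 2.0], 0)` is large because the true label is outscored.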