Learning and Inference in Phrase Recognition: A Filtering-Ranking Architecture Using Perceptron
PhD thesis, Universitat Politècnica de Catalunya.
This thesis takes a machine learning approach to the general problem of recognizing phrases in a sentence. This general problem is instantiated by many disambiguation tasks of Natural Language Processing, such as Shallow Syntactic Parsing, Clause Identification, Named Entity Extraction or Semantic Role Labeling. In all of them, a sentence has to be segmented into labeled phrases that form a sequence or a hierarchy.
We study such problems under a unifying framework for recognizing a structure of phrases in a sentence. The methodology combines learning and inference techniques: the problem of recognizing a complex structure is decomposed into many intermediate steps or local decisions, each recognizing a simple piece of the structure. These decisions are solved with supervised learning, by training functions from data that predict the outcomes of the decisions. Inference then combines the outcomes of the learned functions, applied to different parts of a given sentence, to build a phrase structure for it.
In a phrase recognition architecture, two issues are of special interest: efficiency and learnability. By decomposing the general problem into lower-level problems, both properties can be achieved. On the one hand, the local decisions we deal with are simple enough to be learned with reasonable accuracy. On the other hand, the representation of a decomposed structure allows efficient inference algorithms that build a structure by combining many different pieces. Within this framework, we discuss a modeling choice related to the granularity at which the problem is decomposed: word level or phrase level. Word-level decompositions, commonly used in shallow parsing tasks, reduce the phrase recognition problem to a sequential tagging problem, for which many techniques exist.
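To illustrate the word-level reduction, the following sketch encodes labeled phrase spans as one B-/I-/O tag per word, the standard encoding behind sequential tagging approaches to shallow parsing. The function name, span convention, and the example sentence are illustrative choices, not taken from the thesis.

```python
# Hypothetical sketch: reducing phrase recognition to word-level
# sequential tagging via a BIO encoding (B- begins a phrase, I- is
# inside one, O is outside any phrase).

def phrases_to_bio(n_words, phrases):
    """Convert labeled spans (start, end, label), end inclusive and
    assumed non-overlapping, into one BIO tag per word."""
    tags = ["O"] * n_words
    for start, end, label in phrases:
        tags[start] = "B-" + label
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + label
    return tags

# "He | reckons | the current account deficit | will narrow"
tags = phrases_to_bio(8, [(0, 0, "NP"), (1, 1, "VP"),
                          (2, 5, "NP"), (6, 7, "VP")])
# → ['B-NP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP']
```

Once phrases are encoded this way, any per-word classifier or sequence model can be trained on the tags, and the predicted tag sequence decoded back into phrases.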
In this thesis, we concentrate on phrase-based models, which put learning in a context more expressive than that of word-based models, at the cost of increasing the complexity of the learning and inference processes. We describe incremental inference strategies for both types of models that range from greedy to robust with respect to their ability to trade off local predictions to form a coherent phrase structure. Finally, we describe discriminative learning strategies for training the components of a phrase recognition architecture. We focus on large-margin learning algorithms, and discuss the difference between training each predictor locally and independently, and training all predictors globally and jointly.
As a main contribution, we propose a phrase recognition architecture that we name Filtering-Ranking. Here, a filtering component first substantially reduces the space of possible solutions by applying learning at the word level. On top of it, a ranking component applies learning at the phrase level to discriminate the best structure among those that pass the filter. We also present a global learning algorithm based on the Perceptron, which we name FR-Perceptron. The algorithm trains the filters and rankers of the architecture simultaneously, and benefits from the interactions that these predictors exhibit within the architecture.
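The filter-then-rank pipeline with a joint perceptron-style update can be sketched as follows. This is a deliberately minimal toy, not the thesis implementation: candidates are represented abstractly as (filter features, ranking features) pairs, the filter threshold and update rule are simplifying assumptions, and real inference would build structures incrementally rather than enumerate them.

```python
# Hypothetical sketch of Filtering-Ranking with a global perceptron-style
# update: the filter prunes candidates, the ranker picks the best survivor,
# and a global mistake updates both sets of weights at once.

def score(w, feats):
    """Linear score of a feature list under a sparse weight vector."""
    return sum(w.get(f, 0.0) for f in feats)

def fr_predict(wf, wr, candidates):
    """Filter, then rank. Each candidate is (filter_feats, rank_feats)."""
    survivors = [i for i, (ff, _) in enumerate(candidates)
                 if score(wf, ff) >= 0.0]
    if not survivors:               # never leave the ranker an empty pool
        survivors = list(range(len(candidates)))
    return max(survivors, key=lambda i: score(wr, candidates[i][1]))

def fr_perceptron(training, epochs=5):
    """training: list of (candidates, gold_index) pairs. On each global
    mistake, promote gold features and demote predicted features in both
    the filter weights wf and the ranker weights wr."""
    wf, wr = {}, {}
    for _ in range(epochs):
        for candidates, gold in training:
            pred = fr_predict(wf, wr, candidates)
            if pred != gold:
                for f in candidates[gold][0]:
                    wf[f] = wf.get(f, 0.0) + 1.0
                for f in candidates[pred][0]:
                    wf[f] = wf.get(f, 0.0) - 1.0
                for f in candidates[gold][1]:
                    wr[f] = wr.get(f, 0.0) + 1.0
                for f in candidates[pred][1]:
                    wr[f] = wr.get(f, 0.0) - 1.0
    return wf, wr
```

The point of the joint update is that the filter is trained on exactly the mistakes that matter to the final ranked output, rather than on an independent word-level objective.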
We present extensive experimentation with FR-Perceptron in the context of several partial parsing problems proposed in the CoNLL Shared Tasks. We provide empirical evidence that our global learning algorithm is advantageous over a local learning strategy. Furthermore, the results we obtain are among the best published on these tasks, and in some cases they improve on the state of the art.