Multi-Word Expression Identification Using Sentence Surface Features
Ram Boukobza and Ari Rappoport
In: EMNLP 2009.
Much NLP research on Multi-Word Expressions (MWEs) focuses on the discovery of new expressions, as opposed to the identification of known expressions in text. However, MWE identification is not trivial because many expressions allow variation in form and differ in the range of variations they allow. We show that simple rule-based baselines do not perform identification satisfactorily, and present a supervised learning method for identification that uses sentence surface features based on expressions’ canonical form. To evaluate the method, we annotated 3350 sentences from the British National Corpus containing potential uses of 24 verbal MWEs. The method achieves an F-score of 94.86%, compared with 80.70% for the leading rule-based baseline. Our method is easily applicable to any expression type. Experiments in previous research have been limited to the compositional/non-compositional distinction, whereas we also test on sentences in which the words comprising the MWE appear but not as an expression.
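The abstract does not list the surface features, the rule-based baselines, or the learner. The Python sketch below is purely illustrative. It shows what features computed relative to a canonical form might look like, together with two simple rule-based baselines and an F-score computation. It assumes sentences are already tokenized and lemmatized and that the score is the usual balanced F1. Every name in it is hypothetical and is not the authors' feature set or code, including `extract_features`, `exact_match_baseline`, `all_words_baseline` and the feature fields.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SurfaceFeatures:
    """Hypothetical surface features of a lemmatized sentence w.r.t. a canonical form."""
    all_present: bool     # every canonical lemma occurs somewhere in the sentence
    in_order: bool        # the lemmas occur in canonical order, possibly with gaps
    contiguous: bool      # the lemmas occur adjacently in canonical order
    inserted_tokens: int  # tokens inserted inside the tightest in-order window (-1 if none)
    max_gap: int          # largest gap between consecutive matched lemmas (-1 if none)


def tightest_in_order_window(canon: List[str], sent: List[str]) -> Optional[List[int]]:
    """Positions of the shortest in-order match of `canon` (non-empty) in `sent`, or None."""
    best: Optional[List[int]] = None
    for start, tok in enumerate(sent):
        if tok != canon[0]:
            continue
        positions, pos = [start], start
        for word in canon[1:]:
            # Greedily take the earliest later occurrence: this yields the
            # earliest possible end for a match that begins at `start`.
            try:
                pos = sent.index(word, pos + 1)
            except ValueError:
                break
            positions.append(pos)
        if len(positions) == len(canon):
            if best is None or positions[-1] - positions[0] < best[-1] - best[0]:
                best = positions
    return best


def extract_features(canon: List[str], sent: List[str]) -> SurfaceFeatures:
    window = tightest_in_order_window(canon, sent)
    if window is None:
        inserted, max_gap = -1, -1
    else:
        gaps = [b - a - 1 for a, b in zip(window, window[1:])]
        inserted, max_gap = sum(gaps), max(gaps, default=0)
    return SurfaceFeatures(
        all_present=all(w in sent for w in canon),
        in_order=window is not None,
        contiguous=window is not None and inserted == 0,
        inserted_tokens=inserted,
        max_gap=max_gap,
    )


# Two hypothetical rule-based baselines built from the same features.
def exact_match_baseline(canon: List[str], sent: List[str]) -> bool:
    return extract_features(canon, sent).contiguous


def all_words_baseline(canon: List[str], sent: List[str]) -> bool:
    return extract_features(canon, sent).all_present


def precision_recall_f1(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float]:
    """Balanced F1 over binary 'this sentence uses the MWE' labels."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Consider `extract_features(["spill", "the", "bean"], "he spill the whole bean".split())`. It gives `in_order=True`, `contiguous=False` and `inserted_tokens=1`, which is a form variant that the exact-match baseline misses. In a supervised setup, feature vectors like these would be passed to a classifier trained on annotated sentences. The abstract does not say which classifier the authors use.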