Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser
Roi Reichart and Ari Rappoport
In: CoNLL 2009 (2009).
The average results obtained by unsupervised
statistical parsers have greatly improved in the
last few years, but on many specific sentences
they are of rather low quality. The output of
such parsers is becoming valuable for various
applications, and it is radically less expensive
to create than manually annotated training
data. Hence, automatic selection of high quality
parses created by unsupervised parsers is
an important problem.
In this paper we present PUPA, a POS-based
Unsupervised Parse Assessment algorithm.
The algorithm assesses the quality of a parse
tree using POS sequence statistics collected
from a batch of parsed sentences. We evaluate
the algorithm by using an unsupervised
POS tagger and an unsupervised parser, selecting
high quality parsed sentences from English
(WSJ) and German (NEGRA) corpora.
We show that PUPA outperforms the leading
previous parse assessment algorithm for supervised
parsers, as well as a strong unsupervised
baseline. Consequently, PUPA allows
obtaining high quality parses without any human