Thai Paragraph Shortening Based on Binary Classification Model
Kitsuchart Pasupa and Ponrudee Netisopakul
In: Joint International Symposium on Natural Language Processing and Agricultural Ontology Service (SNLP-AOS'2011), 9-10 Feb 2012, Bangkok, Thailand.
Thai sentences can be simplified or shortened by cutting out some words without changing their meaning. In this paper, linear and non-linear (kernel) Fisher discriminant analysis are applied to shorten Thai paragraphs in a corpus. The features are the unique word ID and part of speech of the target word and of its three preceding and three following words, together with the target word's role as a content or function word. Two scenarios are investigated: a global model and a document-specific model. The results show that both Fisher discriminant analysis and kernel Fisher discriminant analysis significantly improve classification accuracy over the baseline in both scenarios. The part of speech of the target word is the most relevant feature, followed by the parts of speech of the adjacent words. Moreover, the document-specific model achieves higher accuracy than the global model, which suggests that an author's writing style plays an important role in the paragraph shortening task.
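The feature set described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the padding token, the placeholder tokens, and the small set of function-word tags are all assumptions made for illustration. The sketch only shows how a word ID and a part-of-speech tag might be collected for a target word and its three neighbours on each side, plus a content/function flag, before a binary classifier such as Fisher discriminant analysis is trained on them.

```python
from typing import Dict, List, Tuple, Union

Token = Tuple[str, str]  # (word, part-of-speech tag)

PAD_WORD = "<PAD>"  # assumed padding token for positions outside the paragraph
PAD_TAG = "PAD"     # assumed padding tag
FUNCTION_TAGS = {"PREP", "CONJ", "PART"}  # hypothetical function-word tags


def build_vocab(tokens: List[Token]) -> Dict[str, int]:
    """Assign each distinct word a unique integer ID; 0 is reserved for padding."""
    vocab = {PAD_WORD: 0}
    for word, _ in tokens:
        vocab.setdefault(word, len(vocab))
    return vocab


def window_features(
    tokens: List[Token], index: int, vocab: Dict[str, int], width: int = 3
) -> Dict[str, Union[int, str, bool]]:
    """Collect word ID and POS tag at offsets -width..+width around tokens[index],
    plus a flag marking the target word as a function word."""
    feats: Dict[str, Union[int, str, bool]] = {}
    for offset in range(-width, width + 1):
        pos = index + offset
        if 0 <= pos < len(tokens):
            word, tag = tokens[pos]
        else:
            word, tag = PAD_WORD, PAD_TAG  # position falls outside the paragraph
        feats[f"word_id[{offset:+d}]"] = vocab[word]
        feats[f"pos[{offset:+d}]"] = tag
    feats["is_function"] = tokens[index][1] in FUNCTION_TAGS
    return feats


# Placeholder tokens standing in for a segmented, POS-tagged Thai paragraph.
paragraph = [("w1", "NOUN"), ("w2", "VERB"), ("w3", "PREP"), ("w4", "NOUN")]
vocab = build_vocab(paragraph)
rows = [window_features(paragraph, i, vocab) for i in range(len(paragraph))]
```

Each row would then be paired with a binary label (assumed here to mean keep or cut the target word) and encoded numerically, for example by one-hot encoding the categorical fields, before being passed to the classifier.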