PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Determining an Author's Native Language by Mining a Text for Errors
Moshe Koppel, Jonathan Schler and Kfir Zigdon
Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005), 2005.


In this paper, we show that stylistic text features can be exploited to determine an anonymous author's native language with high accuracy. Specifically, we first use automatic tools to ascertain frequencies of various stylistic idiosyncrasies in a text. These frequencies then serve as features for support vector machines that learn to classify texts according to author native language.
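The feature-extraction step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the short function-word list, the two crude regular-expression "idiosyncrasy" patterns, and the name `extract_features` are all hypothetical stand-ins for the automatic tools the authors mention. The sketch only turns a text into a vector of relative frequencies; such vectors could then be passed to any support vector machine implementation for training.

```python
import re
from collections import Counter

# Hypothetical feature set (not from the paper): a few function words
# plus two crude surface patterns standing in for stylistic idiosyncrasies.
FUNCTION_WORDS = ["the", "a", "of", "in", "to"]
PATTERNS = {
    # The same word written twice in a row, e.g. "to to".
    "doubled_word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    # A comma immediately followed by a non-space character, e.g. "shop,and".
    "no_space_after_comma": re.compile(r",(?=\S)"),
}


def extract_features(text):
    """Return relative frequencies (count / number of word tokens)."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    counts = Counter(tokens)
    features = [counts[w] / n for w in FUNCTION_WORDS]
    features += [len(p.findall(text)) / n for p in PATTERNS.values()]
    return features


if __name__ == "__main__":
    sample = "I went to to the shop,and bought a book."
    print(extract_features(sample))
```

Each position in the returned list corresponds to one feature, in the order of `FUNCTION_WORDS` followed by `PATTERNS`. Normalizing by token count keeps texts of different lengths comparable.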

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 1433
Deposited By: Jonathan Schler
Deposited On: 30 October 2005