PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets
Teemu Roos and Tuomas Heikkilä
Literary and Linguistic Computing 2009. ISSN 0268-1145


Given a collection of imperfect copies of a textual document, the aim of stemmatology is to reconstruct the history of the text, indicating for each variant the source text from which it was copied. We describe an experiment involving three artificial benchmark data sets to which a number of computer-assisted stemmatology methods were applied. In contrast to earlier similar experiments, we propose and use a numerical criterion to evaluate all the solutions. Moreover, our primary data set is significantly larger than those used before. The results suggest the superiority of two computer-assisted methods amongst those tested: the maximum parsimony method implemented in the PAUP* software package and a related compression-based method we have proposed in earlier work.

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics; Natural Language Processing
ID Code: 5164
Deposited By: Teemu Roos
Deposited On: 24 March 2009