Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets
Teemu Roos and Tuomas Heikkilä
Literary and Linguistic Computing
Given a collection of imperfect copies of a textual document, the aim of stemmatology is to reconstruct the history of the text, indicating for each variant the source text from which it was copied. We describe an experiment involving three artificial benchmark data sets to which a number of computer-assisted stemmatology methods were applied. In contrast to earlier similar experiments, we propose and use a numerical criterion to evaluate all the solutions. Moreover, our primary data set is significantly larger than those used before. The results suggest the superiority of two computer-assisted methods amongst those tested: the maximum parsimony method implemented in the PAUP* software package and a related compression-based method we have proposed in earlier work.