PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Compression-based stemmatology: A study of the legend of St. Henry of Finland
Teemu Roos, Tuomas Heikkilä, Rudi Cilibrasi and Petri Myllymäki
(2005) Technical Report. Helsinki Institute for Information Technology, Helsinki, Finland.


Stemmatology studies the relations among variants of a text that has been gradually altered through repeated, imperfect copying. We propose a new computer-assisted method for stemmatic analysis based on compression of the variants. The method is related to phylogenetic reconstruction criteria such as maximum parsimony and maximum likelihood. We apply the method to the tradition of the legend of St. Henry of Finland and report encouraging preliminary results. The resulting family tree of the variants, the stemma, agrees to a large extent with results obtained by more traditional methods. Some of the manuscript groups it identifies were previously unrecognized. Moreover, because it is impossible to explore manually all plausible alternatives among the vast number of possible trees, this work is the first attempt at a complete stemma for the legend of St. Henry. The methods used are being released as open-source software.
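
The abstract does not spell out how compression is used to compare variants, so the following is only a rough illustration of the general idea and not the report's actual criterion. It computes a normalized compression distance between pairs of texts with Python's standard zlib module. The function names and the three toy strings are assumptions made up for this sketch; the strings are not quotations from any manuscript.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Smaller values mean the compressor finds more shared material
    when the two strings are compressed together.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy "variants": b differs from a by a few copying slips,
# c is an unrelated text of similar length.
variant_a = (b"In the days of King Eric, the holy bishop Henry came from "
             b"England to the northern lands to preach the faith among the people.")
variant_b = (b"In the days of King Erik, the holy bishop Henry came from "
             b"England to the northern land to preach the faith among the peoples.")
unrelated_c = (b"Compression programs replace repeated substrings with short "
               b"back-references, which makes similar files cheap to encode jointly.")

near = ncd(variant_a, variant_b)
far = ncd(variant_a, unrelated_c)
print(f"near-copy distance: {near:.3f}")
print(f"unrelated distance: {far:.3f}")
```

In a sketch like this, closely related copies should score a lower distance than unrelated texts. Building a stemma would additionally require a search over candidate trees, which this example does not attempt.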

EPrint Type: Monograph (Technical Report)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Natural Language Processing
ID Code: 1818
Deposited By: Teemu Roos
Deposited On: 28 November 2005