|
A compression-based method for stemmatic analysis

Abstract

Stemmatology studies relations among different variants of a text that has been gradually altered through repeated, imperfect copying. Applications are mainly in the humanities, especially textual criticism, but the methods can be used to study the evolution of any symbolic objects, including chain letters and computer viruses.

We propose an algorithm for stemmatic analysis based on a minimum-information criterion and stochastic tree optimization. Our approach is related to phylogenetic reconstruction criteria such as maximum parsimony and maximum likelihood, and builds upon algorithmic techniques developed for bioinformatics. Unlike many earlier methods, the proposed method does not require significant preprocessing of the data but instead operates directly on aligned text files. We demonstrate our method on a real-world experiment involving all 52 known variants of the legend of St. Henry of Finland, and provide the first computer-generated family tree of the legend. The obtained tree of the variants is supported to a large extent by results obtained with more traditional methods, and identifies a number of previously unrecognized relations.
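To make the idea of a minimum-information criterion concrete, the sketch below scores a candidate family tree by how many bytes a general-purpose compressor needs to describe each variant given its parent. It is only an illustration under stated assumptions, not the cost function or search procedure of the paper: the abstract does not specify either. The function names (`compressed_size`, `conditional_cost`, `tree_cost`), the use of `zlib`, and the sample sentences are all hypothetical choices made for this example, and the sketch assumes every node of the tree is an observed variant.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate the information content of `data` by its zlib-compressed length."""
    return len(zlib.compress(data, 9))


def conditional_cost(child: str, parent: str) -> int:
    """Approximate the cost of describing `child` when `parent` is already known.

    Assumption for this sketch: C(child | parent) ~ C(parent + child) - C(parent).
    """
    p = parent.encode("utf-8")
    c = child.encode("utf-8")
    return compressed_size(p + c) - compressed_size(p)


def tree_cost(texts: dict, edges: list, root: str) -> int:
    """Total cost of a tree: the root on its own, plus each child given its parent.

    `edges` is a list of (parent_name, child_name) pairs; a lower total means
    the tree explains the variants with less information.
    """
    total = compressed_size(texts[root].encode("utf-8"))
    for parent, child in edges:
        total += conditional_cost(texts[child], texts[parent])
    return total


# Hypothetical toy variants, invented for illustration only.
texts = {
    "A": "In the beginning the bishop Henry came to Finland with the king of Sweden and preached there.",
    "B": "In the beginning the bishop Henry came to Finland with the king of Norway and preached there.",
    "C": "Quick zebras vex jumpy dwarfs while foxglove blooms quietly under azure night skies beyond.",
}

near = conditional_cost(texts["B"], texts["A"])  # B is a near copy of A
far = conditional_cost(texts["C"], texts["A"])   # C shares almost nothing with A
total = tree_cost(texts, [("A", "B"), ("A", "C")], root="A")
```

A near copy costs far fewer bytes to describe given its source than an unrelated text does, which is what lets a total such as `tree_cost` rank candidate trees. A search procedure, such as the stochastic tree optimization mentioned in the abstract, would then look for the tree with the lowest total; that step is not shown here.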
|