PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Stochastic Approach to Median String Computation
Jose Oncina and Cristian Olivares
Lecture Notes in Computer Science Volume 5342, pp. 431-440, 1008.


Abstract. Due to its robustness to outliers, many Pattern Recognition algorithms use the median as a representative of a set of points. A special case arises in Syntactical Pattern Recognition when the points (prototypes) are represented by strings. However, when the edit distance is used, finding the median becomes a NP-Hard problem. Then, either the search is restricted to strings in the data (set-median) or some heuristic approach is applied. In this work we use the (conditional) stochastic edit distance instead of the plain edit distance. It is not yet known if in this case the problem is also NP-Hard so an approximation algorithm is described. The algorithm is based on the extension of the string structure to multistrings (strings of stochastic vectors where each element represents the probability of each symbol) to allow the use of the Expectation Maximization technique. We carry out some experiments over a chromosomes corpus to check the efficiency of the algorithm.

PDF - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type:Article
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Theory & Algorithms
ID Code:4573
Deposited By:Jose Oncina
Deposited On:13 March 2009