PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Binary Similarity Measures and their Applications in Machine Learning
Ben Veal
(2011) PhD thesis, London School of Economics.

Abstract

Measures which quantify the similarity between two vectors have been of interest in the Machine Learning community, where they are used in both supervised and unsupervised learning algorithms, and much attention has been paid to their general theory. In this thesis we consider, by contrast, measuring the similarity between a binary vector and a set of binary vectors. We explore a number of different such measures and investigate how these may be used in binary classification tasks. We investigate mathematical properties of particular binary similarity measures, and the relationships among them. The measures studied build on a particular similarity measure initially investigated by Anthony and Hammer. In their paper they give characterisations of the similarity measure in terms of logical formulae in disjunctive normal form (DNF). We examine this relationship further in the context of binary classification tasks. We show that with some assumptions on the parameters of a DNF representing the true classifications, high similarity of an example to the training set can ensure correct classification. Work by Subasi et al. and Morrow has investigated the use of binary similarity measures for classification confidence. We use a different methodology for classifying and obtaining classification confidences using similarity measures, and report on experiments performed using these methods. We find that some of the similarity measures perform relatively well compared with standard classification algorithms, and others not so well. We show how the parameters of a particular binary similarity measure can be optimised to improve its performance. We also introduce a new DNF learning algorithm to try to improve on the well-known ID3 algorithm, but find we cannot improve on it.

EPrint Type: Thesis (PhD)
Additional Information: Version uploaded is slightly revised from PhD thesis
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 8587
Deposited By: Martin Anthony
Deposited On: 12 February 2012