Domain Adaptation with Good Edit Similarities: A Sparse Way to Deal with Scaling and Rotation Problems in Image Classification
In many real-life applications, the available source training data are either too scarce or not representative enough of the underlying target test problem. In the past few years, a new line of machine learning research called Domain Adaptation (DA) has been developed to overcome such situations, giving rise to many adaptation algorithms and theoretical results in the form of generalization bounds. In this paper, we propose a novel DA algorithm for string-structured data, inspired by the DA support vector machine (SVM) technique introduced in [Bruzzone et al., PAMI 2010]. To ensure the convergence of SVM-based learning, the similarity functions involved must be valid kernels, i.e. symmetric and positive semi-definite (PSD). However, in the string-based context considered here, this condition is often not satisfied: it has been proven that most string similarity functions based on the edit distance are not PSD. To overcome this drawback, we rely on the recent theory of learning with good similarity functions introduced by Balcan et al., which (i) does not require a valid kernel to learn well and (ii) allows us to induce sparser models. We take advantage of this theoretical framework to propose a new DA algorithm using good edit similarity functions. Using a suitable string representation of handwritten digits, we show that our new algorithm deals very efficiently with the scaling and rotation problems usually encountered in image classification.
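As a rough illustration of the idea behind learning with an edit similarity rather than a kernel, the sketch below maps each string to its vector of similarities to a set of landmark strings; a linear classifier (typically L1-regularized, which is where sparsity would come from) could then be trained on these vectors without the similarity having to be PSD. This is only a minimal sketch under stated assumptions: the function names, the normalization of the Levenshtein distance into [-1, 1], and the example strings are all hypothetical and are not taken from the paper.

```python
def levenshtein(a, b):
    """Standard edit distance with unit costs (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed similarity in [-1, 1]; it is not guaranteed to be PSD."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - 2.0 * levenshtein(a, b) / longest


def similarity_features(x, landmarks):
    """Represent x by its similarities to each landmark string."""
    return [edit_similarity(x, landmark) for landmark in landmarks]


# Hypothetical contour-like strings standing in for digit encodings.
landmarks = ["0011223", "2233445", "0000111"]
features = similarity_features("0011233", landmarks)
```

Each coordinate of `features` corresponds to one landmark, so a sparse weight vector over these coordinates amounts to keeping only a few landmark strings in the final model.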