PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Learning shared and separate features from two related data sets using GPLVMs
Gayle Leen and Colin Fyfe
In: NIPS 2008 workshop: "Learning from Multiple Sources", 13 Dec 2008, Whistler, Canada.

Abstract

Dual source learning problems can be formulated as learning a joint representation of the data sources, where the shared information is represented in terms of a shared underlying process. However, there may be situations in which the shared information is not the only useful information, and interesting aspects of the data are not common to both data sets. Some useful features within one data set may not be present in the other, and vice versa; this complementary property motivates the use of multiple data sources over a single data source, which captures only one type of useful information. For instance, having two eyes (and two streams of visual data) allows us to gain a 3-D impression of the world. Stereo vision combines both shared features and features private to each data stream to form a coherent representation of the world; common shifted features can be used in disparity estimation to infer the depths of objects, while features visible in one view but occluded in the other can provide additional information about the scene. In this work, we present a probabilistic generative framework for analysing two sets of data, where the structure of each data set is represented in terms of a shared and a private latent space. Explicitly modelling a private component for each data set avoids an oversimplified representation of the within-set variation, so that the between-set variation can be modelled more accurately, and also gives insight into potentially interesting features particular to a data set. Since two data sets may have a complex (possibly nonlinear) relationship, we use nonparametric Bayesian techniques: we define Gaussian process priors over the functions from latent to data spaces, such that each data set is modelled as a Gaussian Process Latent Variable Model (GPLVM) whose dependency structure is captured in terms of shared and private kernels.
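The kernel decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes squared-exponential (RBF) kernels and fixed latent coordinates, whereas the actual GPLVM would optimise the latent positions and kernel hyperparameters. Each data set's covariance is the sum of a kernel on the shared latent coordinates and a kernel on its own private latent coordinates.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel on latent coordinates X (N x Q).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_log_marginal(Y, K, noise=1e-2):
    # Log marginal likelihood of data Y (N x D) under a zero-mean GP
    # with covariance K and i.i.d. Gaussian observation noise.
    N, D = Y.shape
    L = np.linalg.cholesky(K + noise * np.eye(N))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (D * N * np.log(2 * np.pi) + D * logdet + np.sum(Y * alpha))

rng = np.random.default_rng(0)
N = 20
Z_shared = rng.normal(size=(N, 1))                          # latent coords shared by both sets
Z1, Z2 = rng.normal(size=(N, 1)), rng.normal(size=(N, 1))   # private latent coords

# Dependency structure: each data set's kernel = shared kernel + private kernel.
K1 = rbf_kernel(Z_shared) + rbf_kernel(Z1)
K2 = rbf_kernel(Z_shared) + rbf_kernel(Z2)

# Sample toy observations from each generative model (3 output dimensions each).
Y1 = rng.multivariate_normal(np.zeros(N), K1 + 1e-2 * np.eye(N), size=3).T
Y2 = rng.multivariate_normal(np.zeros(N), K2 + 1e-2 * np.eye(N), size=3).T

# Joint objective: data sets are conditionally independent given the latents,
# so the log likelihoods simply add.
ll = gp_log_marginal(Y1, K1) + gp_log_marginal(Y2, K2)
```

In a full model, `Z_shared`, `Z1`, and `Z2` would be learned by maximising `ll` (or a Bayesian variant), so that shared structure is absorbed by the shared kernel and set-specific structure by the private kernels.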

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 4777
Deposited By: Gayle Leen
Deposited On: 24 March 2009