Learning to Integrate Data from Different Sources and Tasks
PhD thesis, University College London.
Supervised learning aims to develop models with good generalization properties from empirical input/output data. Methods based on linear functions, and especially kernel methods such as ridge regression, support vector machines and logistic regression, have been applied extensively for this purpose. The first question we study concerns selecting kernels appropriate for a specific supervised task. To this end, we formulate a methodology for learning combinations of prescribed basic kernels, which can be applied to a variety of kernel methods. Unlike previous approaches, it can address cases in which the set of basic kernels is infinite and even uncountable, such as the set of all Gaussian kernels. We also propose an algorithm which is conceptually simple and builds on existing kernel methods.

Secondly, we address the problem of learning common feature representations across multiple tasks. It has been shown, both empirically and theoretically, that when different tasks are related it is possible to exploit this relatedness to improve prediction on each task, as opposed to treating the tasks in isolation. We propose a framework based on learning a common set of features jointly for a number of tasks. This framework favors sparse solutions, in the sense that only a small number of features are involved. We show that the problem can be reformulated as a convex one and solved with a conceptually simple alternating algorithm, which is guaranteed to converge to an optimal solution. Moreover, the formulation and the algorithm can be phrased in terms of kernels and hence can incorporate nonlinear feature maps.

Finally, we connect the two main questions explored in this thesis by demonstrating the analogy between learning combinations of kernels and learning common feature representations across multiple tasks.
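To make the kernel-selection question concrete, the following is a minimal sketch of learning a convex combination of basic kernels, restricted to a *finite* grid of Gaussian widths (the thesis addresses the infinite, continuously parameterized case). It minimizes the reduced kernel ridge regression objective over the mixing weights by gradient steps followed by a crude clip-and-renormalize projection onto the simplex. All function names, the width grid, and the projection shortcut are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gram matrix of a Gaussian kernel with width sigma."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def learn_kernel_combination(X, y, sigmas, mu=0.1, n_iter=200, lr=0.01):
    """Sketch: learn simplex weights over a finite grid of Gaussian kernels.

    For fixed weights lam, kernel ridge regression with the combined Gram
    matrix K = sum_j lam_j K_j has the reduced objective
    mu * y^T (K + mu I)^{-1} y, which is convex in lam; we take gradient
    steps on lam and renormalize back onto the probability simplex
    (an approximation to an exact simplex projection).
    """
    Ks = [gaussian_kernel(X, s) for s in sigmas]
    n = X.shape[0]
    lam = np.full(len(Ks), 1.0 / len(Ks))   # start from the uniform mix
    for _ in range(n_iter):
        K = sum(l * Kj for l, Kj in zip(lam, Ks))
        c = np.linalg.solve(K + mu * np.eye(n), y)
        # gradient of the reduced objective w.r.t. lam_j (up to a mu factor)
        grad = np.array([-c @ Kj @ c for Kj in Ks])
        lam = np.maximum(lam - lr * grad, 0.0)
        lam /= lam.sum()
    return lam
```

The sketch conveys the key point of the formulation: the search over kernels is itself an optimization over a convex set of Gram matrices, solvable alongside the underlying kernel method.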
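The alternating scheme for joint sparse feature learning can likewise be sketched. One common instantiation (shown here as an assumption, not necessarily the thesis's exact formulation) alternates between per-task regularized least-squares solves that share a feature covariance matrix D, and a closed-form update of D from the stacked weight matrix W; mass concentrating on few diagonal entries of D corresponds to the sparse common features. Names and defaults below are illustrative.

```python
import numpy as np

def multitask_feature_learning(Xs, ys, gamma=1.0, n_iter=50, eps=1e-6):
    """Alternating-minimization sketch for learning shared sparse features.

    Xs, ys: per-task design matrices (n_t x d) and targets (n_t,).
    Alternates (a) ridge-style solves w_t = (X_t'X_t + gamma D^{-1})^{-1} X_t'y_t
    given D, and (b) the closed-form update D = (W W')^{1/2} / tr((W W')^{1/2}).
    """
    d, T = Xs[0].shape[1], len(Xs)
    D = np.eye(d) / d                     # uniform initial feature covariance
    W = np.zeros((d, T))
    for _ in range(n_iter):
        Dinv = np.linalg.inv(D + eps * np.eye(d))   # eps: numerical safeguard
        for t in range(T):
            X, y = Xs[t], ys[t]
            W[:, t] = np.linalg.solve(X.T @ X + gamma * Dinv, X.T @ y)
        # (W W')^{1/2} via the SVD of W, then trace-normalize
        U, s, _ = np.linalg.svd(W, full_matrices=False)
        S = U @ np.diag(s) @ U.T
        D = S / (np.trace(S) + eps)
    return W, D
```

Each alternating step solves a convex subproblem in closed form, which is what makes the overall procedure conceptually simple; replacing the Gram matrices X_t'X_t with kernel Gram matrices gives the kernelized variant mentioned above.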