PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Lp-Norm Multiple Kernel Learning
Marius Kloft
(2011) PhD thesis, Technische Universität Berlin.


The goal of machine learning is to learn unknown concepts from data. In real-world applications such as bioinformatics and computer vision, data frequently arises from multiple heterogeneous sources or is represented by various complementary views, the right choice (or even combination) of which is unknown. To this end, the multiple kernel learning (MKL) framework provides a mathematically sound solution. Previous approaches to learning with multiple kernels promote sparse kernel combinations to support interpretability and scalability. Unfortunately, classical approaches to learning with multiple kernels are rarely observed to outperform trivial baselines in practical applications. In this thesis, I approach learning with multiple kernels from a unifying view which shows previous works to be only particular instances of a much more general family of multi-kernel methods. To allow for more effective kernel mixtures, I have developed the ℓp-norm multiple kernel learning methodology, which is both more efficient and more accurate than previous approaches to multiple kernel learning, as demonstrated on several data sets. In particular, I derive optimization algorithms that are much faster than the commonly used ones, making it possible to handle up to tens of thousands of data points and thousands of kernels at the same time. Empirical applications of ℓp-norm MKL to diverse, challenging problems from the domains of bioinformatics and computer vision show that ℓp-norm MKL achieves accuracies that surpass the state of the art. The proposed techniques are underpinned by deep foundations in the theory of learning: I prove tight lower and upper bounds on the local and global Rademacher complexities of the hypothesis class associated with ℓp-norm MKL, which yield excess risk bounds with fast convergence rates and are thus tighter than existing bounds for MKL, which achieve only slow convergence rates.
I also connect the minimal values of the bounds with the soft sparsity of the underlying Bayes hypothesis, proving that for a large range of learning scenarios ℓp-norm MKL attains substantially stronger generalization guarantees than classical approaches to learning with multiple kernels. Using a methodology based on the theoretical bounds, and exemplified by means of a controlled toy experiment, I investigate why MKL is effective in real applications. Data sets, source code and implementations of the algorithms, additional scripts for model selection, and further information are freely available online.

EPrint Type: Thesis (PhD)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 9404
Deposited By: Marius Kloft
Deposited On: 16 March 2012