Kernels: Regularization and optimization
Cheng Soon Ong
PhD thesis, Australian National University.
This thesis extends the paradigm of machine learning with kernels. This
paradigm is based on the idea of generalizing an inner product between vectors
to a similarity measure between objects. The kernel implicitly defines a
feature mapping between the space of objects and the space of functions,
called the reproducing kernel Hilbert space. There have been many successful
applications of positive semidefinite kernels in diverse fields.
Among the reasons for its success are a theoretically motivated regularization
method and efficient algorithms for optimizing the resulting problems.
Since the kernel has to effectively capture the domain knowledge in an
application, we study the problem of learning the kernel itself from training
data. The proposed solution is a kernel on the space of kernels itself, which
we called a hyperkernel. This provides a method for regularization via the
norm of the kernel. We show that for several machine learning tasks, such
as binary classification, regression and novelty detection, the resulting
optimization problem is a semidefinite program. We solve the corresponding
optimization problems using the same parameter settings across all problems,
and demonstrate that we have further automated machine learning methods.
We observe that the restriction for kernels to be positive semidefinite can be
removed. The non-positive kernels, called indefinite kernels, have
corresponding functional theory, and define reproducing kernel Krein spaces.
We derive machine learning problems with indefinite kernels and prove the
representer theorem as well as generalization error bounds.
We provide theoretical
and experimental evidence to support the idea of regularization by early
stopping of conjugate gradient type algorithms.
Conjugate gradient type algorithms are iterative methods that
generate solutions in Krylov
subspaces, and exhibit semi-convergence.
We analyse the sequence of Krylov subspaces that determine the associated
filter function on the spectrum of the inverse problem, and quantitatively
These algorithms are then used for machine learning with indefinite kernels.