Primal-Dual Kernel Machines
PhD thesis, K.U.Leuven, Belgium.
This text presents a structured overview of recent advances in research on machine learning and kernel machines. The general objective is to formulate and study a broad methodology that assists the user in making decisions and predictions from collections of observations in a number of complex tasks. The research issues are motivated by questions of direct concern to the user. The proposed approaches are mainly studied in the context of convex optimization. The two main messages of the dissertation can be summarized as follows. First, the structure of the text reflects the observation that the problem of designing a good learning machine is intertwined with the questions of regularization and kernel design. These three issues cannot be considered independently, and their relation can be studied consistently using tools of optimization theory. Furthermore, the problem of automatic model selection fused with model training is approached from an optimization point of view. It is argued that the joint problem can be written as a hierarchical programming problem, in contrast with other approaches based on multiobjective programming. This viewpoint results in a number of formulations in which model training and model selection are performed at the same time by solving a (convex) programming problem. We refer to such formulations as fusion of training and model selection. Its relation to the use of appropriate regularization schemes is
Secondly, the thesis argues that the primal-dual argument, which originates from the theory of convex optimization, constitutes a powerful building block for designing appropriate kernel machines. This statement is largely motivated by the elaboration of new learning machines that incorporate prior knowledge about the problem under study. Structure such as additive models, semi-parametric models, model symmetries and noise coloring schemes turns out to be closely related to the design of the kernel. Prior knowledge in the form of pointwise inequalities, the occurrence of known censoring mechanisms and a known noise level can easily be incorporated into an appropriate learning machine using the primal-dual argument. This approach is related to and contrasted with other commonly encountered techniques such as smoothing splines, Gaussian processes, wavelet methods and others. A related important step is the definition of the measure of maximal variation and the study of its relevance; this measure can be used to obtain an efficient way of detecting structure in the data and handling missing values.
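As an illustration of the primal-dual argument, the following minimal sketch assumes a least squares kernel regression of the LS-SVM type, which is not named in this abstract and is chosen here only as a familiar instance. The primal problem minimizes (1/2) w'w + (gamma/2) sum_i e_i^2 subject to y_i = w'phi(x_i) + b + e_i. Eliminating w and e through the Lagrangian leaves a linear system in the dual variables alpha and the bias b, in which the feature map phi appears only through the kernel K(x_i, x_j). The function names, the RBF kernel, and the values of gamma and sigma are hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_dual(X, y, gamma=10.0, sigma=1.0):
    # Dual system obtained from the optimality conditions of the Lagrangian:
    #   [ 0   1^T           ] [ b     ]   [ 0 ]
    #   [ 1   K + I / gamma ] [ alpha ] = [ y ]
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual variables alpha

def predict(X_train, b, alpha, X_new, sigma=1.0):
    # The model is evaluated in the dual: f(x) = sum_i alpha_i K(x, x_i) + b.
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy data (an assumption for illustration only): a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

gamma = 10.0
b, alpha = fit_dual(X, y, gamma=gamma)
y_hat = predict(X, b, alpha, X)
```

In this sketch the training residuals equal alpha / gamma and the dual variables sum to zero, which are the optimality conditions of the assumed primal problem. Additional prior knowledge would enter as extra terms or constraints in the primal and would change the resulting dual system accordingly.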
The text is tied together into a consistent story by the addition of new results, including the formulation of new learning machines (e.g. the Support Vector Tube), the study of new advanced regularization schemes (e.g. alternative least squares), and the investigation of the relation of kernel design to model formulations and results in signal processing and system identification (e.g. the relation of kernels to Fourier and wavelet decompositions). This results in a data-driven way to design an appropriate kernel for the learning machine based on the correlation measured in the data.