Regularized output kernel methods for protein network inference
The prediction of physical interactions between two proteins has been addressed in the contexts of supervised, unsupervised and, more recently, semi-supervised learning, using various sources of information (genomic, phylogenetic, protein localization and function). The problem can be cast as a kernel matrix completion task, if one defines a kernel that encodes the similarity between proteins as nodes in a graph, or alternatively as a binary supervised classification task where the inputs are pairs of proteins. In this talk, we first review existing work (matrix completion, SVMs for pairs, metric learning, training set expansion), identifying the relevant features of each approach. We then define the framework of output kernel regression (OKR), which applies the kernel trick in the output feature space. After recalling the results obtained so far with tree-based output kernel regression methods, we develop a new family of methods based on Kernel Ridge Regression that benefits from kernels in both the input and the output feature spaces. The main interest of such methods is that imposing various regularization constraints still leads to closed-form solutions. In particular, we show how this approach handles unlabeled data in a transductive setting of the network inference problem, and multiple networks in a multi-task-like inference problem. New results on simulated data and yeast data illustrate the talk.
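To make the OKR idea concrete, the following is a minimal sketch (not the authors' implementation) of Kernel Ridge Regression with an output kernel: an RBF kernel on protein feature vectors plays the role of the input kernel, a diffusion kernel on a known training subnetwork plays the role of the output kernel, and the closed-form solution yields a predicted output Gram matrix whose entries score candidate edges. All function names, the synthetic data, and the thresholding step are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian (RBF) input kernel between row vectors of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def okr_train(K_x_train, lam=0.1):
    """Closed-form KRR coefficient matrix A = (K_x + lam*I)^{-1}."""
    n = K_x_train.shape[0]
    return np.linalg.solve(K_x_train + lam * np.eye(n), np.eye(n))

def okr_predict_gram(A, K_y_train, K_x_test_train):
    """Predicted output Gram matrix between test proteins:
    K_hat = K_xt @ A @ K_y @ A @ K_xt.T; entries score candidate edges."""
    M = A @ K_y_train @ A
    return K_x_test_train @ M @ K_x_test_train.T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))  # feature vectors of training proteins
X_test = rng.normal(size=(4, 5))    # proteins whose interactions we infer

# Output kernel from a known training subnetwork (random adjacency here),
# via the diffusion kernel exp(-beta * L) on the graph Laplacian L.
Adj = np.triu((rng.random((20, 20)) < 0.2).astype(float), 1)
Adj = Adj + Adj.T
L = np.diag(Adj.sum(axis=1)) - Adj
w, V = np.linalg.eigh(L)
K_y = V @ np.diag(np.exp(-1.0 * w)) @ V.T

A = okr_train(rbf_kernel(X_train, X_train), lam=0.1)
K_hat = okr_predict_gram(A, K_y, rbf_kernel(X_test, X_train))
edges = K_hat > 0.5  # threshold predicted output similarities to infer edges
```

The closed form is what makes the regularized variants attractive: adding constraints (transductive smoothness over unlabeled proteins, or coupling across multiple networks) changes the matrix being inverted but keeps the solution in this same linear-algebra shape.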