Efficient peptide-MHC-I binding prediction for alleles with few known binders
Motivation: In silico methods for the prediction of antigenic peptides binding to MHC class I molecules play an increasingly important role in the identification of T-cell epitopes. Statistical and machine learning methods in particular are widely used to score candidate binders based on their similarity with known binders and non-binders. The genes coding for the MHC molecules, however, are highly polymorphic, and statistical methods have difficulties building models for alleles with few known binders. In this context, recent work has demonstrated the utility of leveraging information across alleles to improve the performance of the prediction. Results: We design a support vector machine algorithmthat is able to learn peptide-MHC-I binding models for many alleles simultaneously, by sharing binding information across alleles. The sharing of information is controlled by a user-defined measure of similarity between alleles. We show that this similarity can be defined in terms of supertypes, or more directly by comparing key residues known to play a role in the peptide-MHC binding. We illustrate the potential of this approach on various benchmark experiments where it outperforms other state-of-the-art methods.