Glycan classification with tree kernels
Motivation: Glycans are covalent assemblies of sugars that play crucial roles in many cellular processes. Comprehensive data on glycan structure and function have recently been accumulated, so the need for methods and algorithms to analyze these data is growing quickly.
Results: This article presents novel methods for classifying glycans and detecting discriminative glycan motifs with support vector machines (SVMs). We propose a new class of tree kernels to measure the similarity between glycans. These kernels compare tree substructures and take into account several glycan features, such as the sugar type, the sugar bond type and the layer depth. The proposed methods are tested on their ability to classify human glycans into four blood components: leukemia cells, erythrocytes, plasma and serum. They outperform a previously published method. We also apply a feature selection approach to extract glycan motifs that are characteristic of each blood component. Some of the leukemia-specific glycan motifs detected by our method agree with several results reported in the literature.
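To make the idea of comparing tree substructures concrete, the following is a minimal Python sketch of a substructure-counting kernel on labeled trees. It is not the kernel proposed in this article: it simply counts short downward paths of (bond, sugar) pairs, tagged with the depth at which they start, and takes the inner product of the resulting count vectors. The names `node`, `features`, `tree_kernel` and `max_len`, the nested-tuple representation, and the two toy glycans are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical representation: a node is (sugar, bond_to_parent, children).
def node(sugar, bond=None, children=()):
    return (sugar, bond, tuple(children))

def features(tree, max_len=2):
    """Count downward paths of up to max_len nodes, keyed by start depth."""
    counts = Counter()

    def walk(n, depth, trail):
        sugar, bond, children = n
        trail = trail + ((bond, sugar),)
        # Record every path of 1..max_len nodes that ends at this node.
        for k in range(1, min(max_len, len(trail)) + 1):
            start_depth = depth - k + 1
            counts[(start_depth, trail[-k:])] += 1
        for child in children:
            walk(child, depth + 1, trail)

    walk(tree, 0, ())
    return counts

def tree_kernel(a, b, max_len=2):
    """Inner product of the two path-count vectors (an explicit feature map)."""
    fa, fb = features(a, max_len), features(b, max_len)
    return sum(count * fb[key] for key, count in fa.items())

# Two toy glycans (illustrative only, not taken from the article's data).
t1 = node("GlcNAc", children=[node("Man", "b1-4"), node("Fuc", "a1-6")])
t2 = node("GlcNAc", children=[node("Man", "b1-4")])

print(tree_kernel(t1, t1), tree_kernel(t1, t2), tree_kernel(t2, t2))
```

Because the kernel is an inner product of explicit count vectors, the Gram matrix it produces is positive semidefinite and could be passed to an SVM implementation that accepts precomputed kernels. In practice the values would usually be normalized so that large glycans do not dominate.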