PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Kernels for gene regulatory regions
Jean-Philippe Vert, Robert Thurman and William S. Noble
In: NIPS 2005, 5-8 Dec 2005, Vancouver, Canada.


We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suited to processing regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used with a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions but also, as a side effect, identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes.
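
To make the de novo mode concrete, the following is a minimal sketch of a k-mer count kernel over alignments. It is a deliberately simplified illustration, not the kernels of the paper: it ignores the phylogenetic tree and the conserved-block prior, and simply pools k-mer counts over all species in an alignment. The function names `kmer_counts` and `kmer_kernel`, the use of `-` as the gap character, and the default `k=3` are assumptions made for this example.

```python
from collections import Counter


def kmer_counts(alignment, k):
    """Pool k-mer counts over all sequences of an alignment.

    `alignment` is a list of equal-length strings, one per species;
    gap characters ('-') are removed before counting (an assumption
    of this sketch, not a detail taken from the paper).
    """
    counts = Counter()
    for seq in alignment:
        ungapped = seq.replace("-", "").upper()
        for i in range(len(ungapped) - k + 1):
            counts[ungapped[i:i + k]] += 1
    return counts


def kmer_kernel(alignment_a, alignment_b, k=3):
    """Inner product of the pooled k-mer count vectors of two alignments."""
    ca = kmer_counts(alignment_a, k)
    cb = kmer_counts(alignment_b, k)
    # Only k-mers present in both alignments contribute to the sum.
    return sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())


# Hypothetical toy alignments, for illustration only.
promoter_a = ["ACGT-A", "ACGTTA"]
promoter_b = ["ACGT"]
similarity = kmer_kernel(promoter_a, promoter_b, k=3)
```

Because the kernel is an explicit inner product of count vectors, a linear classifier trained with it assigns one weight per k-mer, which is one way to see how discriminative motifs can fall out as a side effect of training.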

Subjects: Computational, Information-Theoretic Learning with Statistics
Theory & Algorithms
Deposited On: 28 November 2005