PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

An accurate and interpretable model for siRNA efficacy prediction
Jean-Philippe Vert, Nicolas Foveau, Christian Lajaunie and Yves Vandenbrouck
BMC Bioinformatics Volume 7, Number 520, 2006.


Background The use of exogenous small interfering RNAs (siRNAs) for gene silencing has quickly become a widespread molecular tool providing a powerful means for gene functional study and new drug target identification. Although considerable progress has been made recently in understanding how the RNAi pathway mediates gene silencing, the design of potent siRNAs remains challenging. Results We propose a simple linear model combining basic features of siRNA sequences for siRNA efficacy prediction. Trained and tested on a large dataset of siRNA sequences made recently available, it performs as well as more complex state-of-the-art models in terms of potency prediction accuracy, with the advantage of being directly interpretable. The analysis of this linear model allows us to detect and quantify the effect of nucleotide preferences at particular positions, including previously known and new observations. We also detect and quantify a strong propensity of potent siRNAs to contain short asymmetric motifs in their sequence, and show that, surprisingly, these motifs alone contain at least as much relevant information for potency prediction as the nucleotide preferences for particular positions. Conclusion The model proposed for prediction of siRNA potency is as accurate as a state-of-the-art nonlinear model and is easily interpretable in terms of biological features. It is freely available on the web at

EPrint Type:Article
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
ID Code:3237
Deposited By:Jean-Philippe Vert
Deposited On:29 January 2008