PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Predicting and understanding the stability of G-quadruplexes
Oliver Stegle, Linda Payet, Jean-Louis Mergny, David J.C. MacKay and Julian L. Huppert
Bioinformatics Volume 25, Number 12, i374-i1382, 2009.

Abstract

Motivation: G-quadruplexes are stable four-stranded guanine-rich structures that can form in DNA and RNA. They are an important component of human telomeres and play a role in the regulation of transcription and translation. The biological significance of a G-quadruplex is crucially linked with its thermodynamic stability. Hence the prediction of G-quadruplex stability is of vital interest. Results: In this article, we present a novel Bayesian prediction framework based on Gaussian process regression to determine the thermodynamic stability of previously unmeasured G-quadruplexes from the sequence information alone. We benchmark our approach on a large G-quadruplex dataset and compare our method to alternative approaches. Furthermore, we propose an active learning procedure which can be used to iteratively acquire data in an optimal fashion. Lastly, we demonstrate the usefulness of our procedure on a genome-wide study of quadruplexes in the human genome. Availability: A data table with the training sequences is available as supplementary material. Source code is available online at http://www.inference.phy.cam.ac.uk/os252/projects/quadruplexes

EPrint Type:Article
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
ID Code:6463
Deposited By:Oliver Stegle
Deposited On:08 March 2010