PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Statistical Challenges of High-dimensional Data
Iain M. Johnstone and Mike Titterington
Philosophical Transactions of the Royal Society A, Volume 367, Number 1906, pp. 4237-4253, 2009.


Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this theme issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed, along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data, and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue.

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 5761
Deposited By: Mike Titterington
Deposited On: 08 March 2010