PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Causation and Prediction Challenge: Challenges in Machine Learning, volume 2
Isabelle Guyon, Constantin Aliferis, Gregory Cooper, André Elisseeff, Jean-Philippe Pellet, Peter Spirtes and Alexander Statnikov, eds. (2011). Challenges in Machine Learning, Volume 2. Microtome Publishing, Brookline, MA. ISBN 978-0-9719777-2-3.


The Causality Workbench Team was founded in January 2007 with the objective of evaluating methods for solving causal problems. The problem of attributing causes to effects is pervasive in science, medicine, economics and almost every aspect of everyday life involving human reasoning and decision making. Advancing the methodology for reliably determining causal relationships would therefore have an immediate and important impact, both economic and fundamental. The goal of determining causal relationships is to predict the consequences of given actions or manipulations: for instance, the effect of taking a drug on health status, or the effect of reducing taxes on the economy. This is fundamentally different from making predictions from observations. Observations imply no experimentation and no interventions on the system under study, whereas actions introduce a disruption in its natural functioning. The canonical way of determining whether events are causally related is to conduct controlled experiments in which the system of interest is “manipulated” to verify hypothetical causal relationships. However, experimentation is often costly, infeasible or unethical. This has prompted much recent research on learning causal relationships from available observational data. These methods can unravel causal relationships to a certain extent, but must generally be complemented by experimentation. The need to assist policy making and the availability of massive amounts of “observational” data have triggered a proliferation of proposed causal discovery techniques. Each scientific discipline has its favorite approach (e.g., Bayesian networks in biology and structural equation modeling in the social sciences), not necessarily reflecting a better match of techniques to domains, but rather historical tradition.
Standard benchmarks are needed to foster scientific progress, but the design of a good causal discovery benchmark platform, one that is not biased in favor of a particular model or approach, is not trivial. To stimulate research in causal discovery, the Causality Workbench Team created a platform in the form of a web service, which allows researchers to share problems and test methods. This volume gathers the material of the first causality challenge organized by the Causality Workbench Team for the World Congress on Computational Intelligence (WCCI), June 3, 2008 in Hong Kong. Most feature selection algorithms emanating from machine learning do not seek to model mechanisms: they do not attempt to uncover cause-effect relationships between feature and target. This is justified because uncovering mechanisms is unnecessary for making good predictions in a purely observational setting, in which the samples in both the training and test sets are assumed to have been obtained by sampling identically and independently from the same “natural” distribution. In contrast, in this challenge we investigate a setting in which the training and test data are not necessarily identically distributed. For each task (e.g. REGED, SIDO), we have a single training set but several test sets (labeled with the dataset name, e.g. REGED0, REGED1, and REGED2). The training data come from a so-called “natural distribution”, and the test data in version zero of the task (e.g. REGED0) are drawn from the same distribution; we call this the “unmanipulated test set”. The test data of the two other versions of the task (REGED1 and REGED2) are “manipulated test sets” resulting from interventions of an external agent, which has “manipulated” some or all of the variables in a certain way. The effect of such manipulations is to disconnect the manipulated variables from their natural causes.
This may affect the predictive power of a number of variables in the system, including the manipulated variables. Hence, to obtain optimal predictions of the target variable, feature selection strategies should take such manipulations into account. The book contains a collection of papers first published in JMLR W&CP, including a paper summarizing the results of the challenge and contributions of the top-ranking entrants. We added in appendix fact sheets describing the methods used by participants and a technical report with details on the datasets. The book is complemented by a web site from which the datasets can be downloaded and post-challenge submissions can be made to benchmark new algorithms.
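The effect of a manipulation on predictive power can be illustrated with a toy linear chain (a minimal sketch under assumed coefficients; the graph C → T → E is hypothetical and is not the actual REGED or SIDO generative model). Observationally, both the cause C and the effect E correlate with the target T; once an external agent sets E directly, E is disconnected from T and loses its predictive power, while C retains it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# "Natural" distribution for a hypothetical chain C -> T -> E
C = rng.normal(size=n)                       # a cause of the target
T = 0.8 * C + 0.6 * rng.normal(size=n)       # the target variable
E = 0.8 * T + 0.6 * rng.normal(size=n)       # an effect of the target

# Manipulated distribution: an external agent sets E directly,
# disconnecting it from its natural cause T
E_manip = rng.normal(size=n)

def corr(a, b):
    """Absolute Pearson correlation between two samples."""
    return abs(np.corrcoef(a, b)[0, 1])

print(corr(C, T))        # cause: predictive in both settings (~0.8)
print(corr(E, T))        # effect: predictive observationally (~0.8)
print(corr(E_manip, T))  # manipulated effect: uninformative (~0.0)
```

Under this sketch, a feature selector that keeps only causes of the target would sacrifice some accuracy on the unmanipulated test set (E is a strong predictor there) but would be robust on the manipulated ones.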

EPrint Type: Book
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 9181
Deposited By: Isabelle Guyon
Deposited On: 21 February 2012