Hands-on Pattern Recognition: Challenges in Machine Learning, Volume 1
Isabelle Guyon, Gavin Cawley, Gideon Dror and Amir Saffari, eds.
Brookline, MA
Recently organized competitions have been instrumental in pushing the state of the art in
machine learning, establishing benchmarks for the fair evaluation of methods, and identifying
techniques that really work.
This book harvests three years of effort by hundreds of researchers who participated in
three competitions we organized around five datasets from various application domains. Three
aspects were explored:
• Data representation.
• Model selection.
• Performance prediction.
With the proper data representation, learning becomes almost trivial. For the defenders of fully
automated data processing, the search for better data representations is just part of learning.
At the other end of the spectrum, domain specialists engineer data representations that are
tailored to particular applications. The results of the “Agnostic Learning vs. Prior Knowledge”
challenge are discussed in the book, including longer versions of the best papers from the IJCNN
2007 workshop on “Data Representation Discovery”, where the best competitors presented their
findings.
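As a toy illustration of how the right representation can make learning almost trivial (this example is illustrative and not taken from the book), consider XOR: it is not linearly separable in its raw inputs, yet one engineered feature reduces it to a one-dimensional threshold problem.

```python
# Illustrative sketch (not from the book): XOR is not linearly
# separable in the raw inputs (x1, x2), but the single engineered
# feature (x1 - x2)^2 reproduces the XOR label exactly on binary
# inputs, so any threshold rule on phi solves the problem.

def phi(x1, x2):
    # engineered representation: one feature instead of two raw inputs
    return (x1 - x2) ** 2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert phi(x1, x2) == (x1 ^ x2)  # phi equals the XOR label
print("XOR becomes a one-feature threshold problem")
```

With this representation the "learning" left to do is trivial, which is exactly the trade-off between engineered and automatically discovered representations that the challenge explored.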
Given a family of models with adjustable parameters, machine learning provides us with
means of “learning from examples” and obtaining a good predictive model. The problem becomes
more arduous when the family of models possesses so-called hyper-parameters or when
it consists of heterogeneous entities (e.g. linear models, neural networks, classification and
regression trees, kernel methods, etc.). Both practical and theoretical considerations may lead
one to split the problem into multiple levels of inference. Typically, at the lower level, the
parameters of individual models are optimized, and at the upper level the best model is selected,
e.g. via cross-validation. This problem is often referred to as model selection. The results of the
“Model Selection Game” are included in this book, as well as the best papers of the NIPS 2006
“Multi-level Inference” workshop.
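The two levels of inference can be sketched in a few lines. The toy data, the k-NN model, and the hyper-parameter grid below are all illustrative assumptions of this sketch, not material from the challenge: the lower level "fits" each candidate model, and the upper level selects among them by cross-validation.

```python
import random

rng = random.Random(0)
# Toy 1-D data: label is 1 when x > 5, with 10% label noise
data = [(x, int(x > 5) ^ (rng.random() < 0.1))
        for x in [rng.uniform(0, 10) for _ in range(60)]]

def knn_predict(train, x, k):
    """Lower level of inference: for k-NN the 'fitted' model is just
    the training set; prediction is a majority vote of k neighbors."""
    neigh = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return int(sum(y for _, y in neigh) * 2 > k)

def cv_error(data, k_hyper, folds=5):
    """Upper level of inference: estimate the generalization error of
    one hyper-parameter setting by 5-fold cross-validation."""
    errors = 0
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        errors += sum(knn_predict(train, x, k_hyper) != y
                      for x, y in test)
    return errors / len(data)

# Model selection: pick the hyper-parameter with the lowest CV error
best_k = min([1, 3, 5, 7], key=lambda k: cv_error(data, k))
print("selected k =", best_k, "CV error =", cv_error(data, best_k))
```

Here the lower level is trivial because k-NN has no trainable parameters; with, say, a neural network, the inner loop would itself be an optimization, which is what makes multi-level inference both practically and theoretically delicate.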
In most real-world situations, it is not sufficient to provide a good predictor; it is also important
to assess accurately how well this predictor will perform on new, unseen data. Before deploying a
model in the field, one must know whether it will meet the specifications or whether one should
invest more time and resources to collect additional data and/or develop more sophisticated
models. The performance prediction challenge asked participants to provide prediction results
on new unseen test data AND to predict how good these predictions were going to be on a test set
for which they did not know the labels ahead of time. Therefore, participants had to design both
a good predictive model and a good performance estimator. The results of the “Performance
Prediction Challenge” and the best papers of the WCCI 2006 workshop on model selection
are included in the book.
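A minimal sketch of this double task, under illustrative assumptions (the toy data, the decision-stump model, and the cross-validation estimator below are all stand-ins, not the methods used by participants): fit a predictor on training data, predict its future accuracy using the training set alone, then compare with the accuracy actually achieved on a held-out set that plays the role of the unlabeled test data.

```python
import random

rng = random.Random(1)
def make_data(n):
    # Toy 1-D data: label is 1 when x > 5, with 15% label noise
    return [(x, int(x > 5) ^ (rng.random() < 0.15))
            for x in [rng.uniform(0, 10) for _ in range(n)]]

train, test = make_data(200), make_data(200)

def fit_threshold(data):
    """Fit a decision stump: choose the cut minimizing training error."""
    cuts = sorted(x for x, _ in data)
    return min(cuts, key=lambda t: sum(int(x > t) != y for x, y in data))

def accuracy(t, data):
    return sum(int(x > t) == y for x, y in data) / len(data)

def predicted_accuracy(data, folds=5):
    """Performance prediction: estimate future accuracy from the
    training set alone (test labels are assumed unknown), here via
    5-fold cross-validation."""
    accs = []
    for f in range(folds):
        held = data[f::folds]
        rest = [p for i, p in enumerate(data) if i % folds != f]
        accs.append(accuracy(fit_threshold(rest), held))
    return sum(accs) / folds

model = fit_threshold(train)
print("predicted accuracy:", predicted_accuracy(train))
print("actual accuracy   :", accuracy(model, test))
```

In the challenge, participants were scored both on the quality of their predictions and on how close their self-reported performance estimate came to the truth, so a well-calibrated estimator like the one sketched above mattered as much as the predictor itself.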
A selection of papers from the JMLR special topic on model selection, including longer
contributions from the best challenge participants, is also reprinted in the book.