Analysis of the KDD Cup 2009: Fast Scoring on a Large Orange Customer Database
Isabelle Guyon, Vincent Lemaire, Marc Boullé, Gideon Dror and David Vogel
We organized the KDD Cup 2009 around a marketing problem, with the goal of identifying data mining techniques capable of rapidly building predictive models and scoring new entries on a large database. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offered the opportunity to work on large marketing databases from the French telecommunications company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). The challenge started on March 10, 2009 and ended on May 11, 2009. It attracted over 450 participants from 46 countries. We attribute the popularity of the challenge to several factors: (1) a generic problem relevant to industry (a classification problem), but one presenting a number of scientific and technical challenges of practical interest, including a large number of training examples (50,000) with a large proportion of missing values (about 60%), a large number of features (15,000), unbalanced class proportions (fewer than 10% of the examples belong to the positive class), noisy data, and categorical variables with many different values; (2) prizes (Orange offered 10,000 Euros in prizes); (3) a well-designed protocol and website (we benefited from past experience); and (4) an effective advertising campaign using mailings and a teleconference to answer potential participants' questions. The results of the challenge were discussed at the KDD conference (June 28, 2009). The principal conclusions are that ensemble methods are very effective and that ensembles of decision trees offer off-the-shelf solutions to problems with large numbers of samples and attributes, mixed types of variables, and many missing values. The data and the platform of the challenge remain available for research and educational purposes at http://www.kddcup-orange.com/.
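The main conclusion, that ensembles of decision trees cope with mixed variable types and many missing values with little preprocessing, can be illustrated with a small sketch. It is not the method of any participant: it is a hypothetical, standard-library-only bagged ensemble of depth-1 trees (stumps), assuming missing values are encoded as None. Each stump is fitted on a bootstrap sample using one randomly chosen feature; numeric features are split at the median, categorical features get one leaf per value, and missing values are routed to their own leaf. Because the positive class is rare, the sketch evaluates rankings with the area under the ROC curve (AUC) rather than accuracy. All function names and the choice of AUC are assumptions made for illustration.

```python
import random


def _rate(labels, prior):
    # Positive rate of a leaf, shrunk toward the prior; an empty leaf returns the prior.
    return (sum(labels) + prior) / (len(labels) + 1.0)


def fit_stump(rows, y, j):
    """Hypothetical depth-1 tree on feature j.

    Numeric values are split at the median, categorical values get one leaf
    per observed value, and missing values (assumed to be None) get their own leaf.
    """
    prior = sum(y) / len(y)
    missing = [t for r, t in zip(rows, y) if r[j] is None]
    seen = [(r[j], t) for r, t in zip(rows, y) if r[j] is not None]
    stump = {"j": j, "prior": prior, "missing": _rate(missing, prior)}
    if seen and all(isinstance(v, (int, float)) for v, _ in seen):
        values = sorted(v for v, _ in seen)
        thr = values[len(values) // 2]  # median split point
        stump["thr"] = thr
        stump["lo"] = _rate([t for v, t in seen if v < thr], prior)
        stump["hi"] = _rate([t for v, t in seen if v >= thr], prior)
    else:
        groups = {}
        for v, t in seen:
            groups.setdefault(v, []).append(t)
        stump["cats"] = {v: _rate(ts, prior) for v, ts in groups.items()}
    return stump


def predict_stump(stump, row):
    v = row[stump["j"]]
    if v is None:
        return stump["missing"]
    if "thr" in stump:
        return stump["lo"] if v < stump["thr"] else stump["hi"]
    # Categories never seen during fitting fall back to the prior.
    return stump["cats"].get(v, stump["prior"])


def fit_bagged_stumps(rows, y, n_models=50, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        j = rng.randrange(d)
        models.append(fit_stump([rows[i] for i in idx], [y[i] for i in idx], j))
    return models


def predict_proba(models, row):
    # Ensemble score: plain average of the stumps' leaf rates.
    return sum(predict_stump(m, row) for m in models) / len(models)


def auc(scores, y):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In use, `fit_bagged_stumps(rows, y)` takes rows that mix floats, strings, and None, and `predict_proba(models, row)` returns an averaged score in [0, 1] that can be passed to `auc`. A practical solution would grow deeper trees and guard against overfitting on categorical variables with many values, which the one-leaf-per-value split here does not do.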