Datasets of the Unsupervised and Transfer Learning Challenge
Classification problems are found in many application domains, including pattern recognition (classification of images or videos, speech recognition), medical diagnosis, marketing (customer categorization), and text categorization (spam filtering). The category identifiers are referred to as "labels". Predictive models capable of classifying new instances (correctly predicting the labels) usually require "training" (parameter adjustment) using large amounts of labeled training data (pairs of example instances and associated labels). Unfortunately, little labeled training data may be available because of the cost or burden of manually annotating data. Recent research has focused on exploiting the vast amounts of unlabeled data available at low cost, using techniques that include space transformations, dimensionality reduction, hierarchical feature representations ("deep learning"), and kernel learning. However, these advances tend to be ignored by practitioners, who continue to use a handful of popular algorithms such as PCA, ICA, k-means, and hierarchical clustering. The goal of this challenge was to evaluate unsupervised and transfer learning algorithms free of inventor bias, in order to help identify and popularize algorithms that have advanced the state of the art.
Five datasets from various domains are made available. The participants had to submit online, in a prescribed format, transformed data representations (or similarity/kernel matrices) of a validation set and a final evaluation set. The organizers evaluated these representations (or similarity/kernel matrices) on supervised learning tasks unknown to the participants. The results on the validation set were displayed on the leaderboard to provide immediate feedback. The results on the final evaluation set were revealed only at the end of the challenge. To emphasize the capability of the learning systems to develop useful abstractions, the supervised learning tasks used to evaluate them use very few labeled training examples, and the classifier is a simple linear discriminant. The platform remains open for post-challenge submissions: http://clopinet.com/ul.
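The evaluation protocol described above can be illustrated with a minimal sketch. It is not the organizers' code: the toy data, the standardization step standing in for a participant's unsupervised transformation, the class-centroid form of the linear discriminant, the accuracy metric, and all function names are assumptions made for illustration only.

```python
import random
import statistics


def fit_standardizer(unlabeled):
    """Learn per-feature mean and standard deviation from unlabeled rows."""
    columns = list(zip(*unlabeled))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply the learned transformation (stand-in for a submitted representation)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fit_centroid_discriminant(rows, labels):
    """Linear discriminant whose weight vector is the difference of class centroids."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    mu_pos = [statistics.fmean(c) for c in zip(*pos)]
    mu_neg = [statistics.fmean(c) for c in zip(*neg)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    # The bias places the decision boundary midway between the two centroids.
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, mu_pos, mu_neg))
    return w, b


def predict(rows, w, b):
    return [1 if sum(wi * x for wi, x in zip(w, row)) + b > 0 else 0 for row in rows]


def evaluate_representation(rows, labels, n_per_class, rng):
    """Train on a few labeled examples per class, then score all remaining rows."""
    idx_pos = [i for i, y in enumerate(labels) if y == 1]
    idx_neg = [i for i, y in enumerate(labels) if y == 0]
    train = sorted(rng.sample(idx_pos, n_per_class) + rng.sample(idx_neg, n_per_class))
    train_set = set(train)
    test = [i for i in range(len(rows)) if i not in train_set]
    w, b = fit_centroid_discriminant([rows[i] for i in train], [labels[i] for i in train])
    preds = predict([rows[i] for i in test], w, b)
    return sum(p == labels[i] for p, i in zip(preds, test)) / len(test)


def make_toy_data(n, dim, rng):
    """Two Gaussian classes centered at +1 and -1 in every dimension (hypothetical data)."""
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        center = 1.0 if y == 1 else -1.0
        rows.append([rng.gauss(center, 1.0) for _ in range(dim)])
        labels.append(y)
    return rows, labels


rng = random.Random(0)
rows, labels = make_toy_data(200, 5, rng)

# Unsupervised step: the labels are not used to learn the transformation.
means, stds = fit_standardizer(rows)
representation = transform(rows, means, stds)

# Supervised step: only 4 labeled examples per class are used for training.
accuracy = evaluate_representation(representation, labels, n_per_class=4, rng=rng)
print(f"accuracy with 8 labeled examples: {accuracy:.3f}")
```

The point of the sketch is the split of responsibilities: the representation is learned without labels, and its quality is then judged by how well a deliberately simple linear classifier performs when given only a handful of labeled examples.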