MACHINE LEARNING ON SETS OF DOCUMENTS CONNECTED IN GRAPHS

Abstract

This paper addresses machine learning on sets of documents connected into graphs. Our strategy is to represent each document by a diverse set of heterogeneous attributes: traditional binary and categorical attributes, textual attributes, and attributes derived from the graphs. We present experiments on two datasets that show the usefulness of graph-based attributes and the importance of weighting the different attributes suitably before learning. On the download estimation task of the KDD Cup 2003 challenge, the approach presented here achieved the best results.
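The attribute-combination strategy summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the document fields (`has_flag`, `category`, `text`, `id`), the use of citation in-degree as the graph-derived attribute, the L2-normalised bag-of-words text representation, and the per-group weights are all hypothetical choices made here for illustration. The sketch only builds one weighted feature vector per document; a learner of your choice would consume these vectors afterwards.

```python
import math
from collections import Counter


def in_degree(doc_id, edges):
    """Graph-derived attribute: number of edges (src, dst) pointing at doc_id,
    e.g. citations received. Using in-degree is an assumption for illustration."""
    return sum(1 for _, dst in edges if dst == doc_id)


def text_features(text, vocab):
    """Textual attributes: bag-of-words counts over a fixed vocabulary,
    L2-normalised so long documents do not dominate."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def one_hot(value, categories):
    """Categorical attribute encoded as a one-hot vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def document_vector(doc, edges, vocab, categories, weights):
    """Concatenate heterogeneous attribute groups, scaling each group by its
    weight. Field names and weights are hypothetical."""
    groups = {
        "binary": [1.0 if doc["has_flag"] else 0.0],
        "categorical": one_hot(doc["category"], categories),
        "text": text_features(doc["text"], vocab),
        # log1p dampens the heavy tail typical of degree counts
        "graph": [math.log1p(in_degree(doc["id"], edges))],
    }
    vector = []
    for name in ("binary", "categorical", "text", "graph"):
        w = weights[name]
        vector.extend(w * x for x in groups[name])
    return vector


if __name__ == "__main__":
    edges = [("a", "b"), ("c", "b"), ("b", "a")]  # toy citation graph
    doc = {"id": "b", "has_flag": True, "category": "cs",
           "text": "graph learning graph"}
    weights = {"binary": 1.0, "categorical": 0.5, "text": 2.0, "graph": 1.0}
    print(document_vector(doc, edges, ["graph", "learning", "text"],
                          ["cs", "math"], weights))
```

Scaling a group by its weight changes how much that group contributes to distances and dot products, so for learners that are sensitive to feature scale the weighting step can matter as much as the choice of attributes. How the weights should actually be set is left open in this sketch.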