## Abstract

State-of-the-art graph kernels do not scale to large graphs with hundreds of nodes and thousands of edges. In this article we attempt to rectify this situation. We compare graphs by counting common *graphlets*, i.e., subgraphs with $k$ nodes, where $k \in \{3, 4, 5\}$. Exhaustive enumeration of all graphlets is prohibitively expensive, scaling as $O(n^k)$, where $n$ is the number of nodes. We propose two theoretically grounded speedups. First, by bounding the deviation of the empirical estimates of the distribution of these graphlets from their true distribution, we show that it suffices to sample a fixed number of graphlets, independent of the size of the input graphs. Second, we show that for graphs of bounded degree, as is often the case with large sparse real-world graphs, exhaustively enumerating graphlets takes only $O(nd^{k-1})$ time, where $d \ll n$ is the maximum degree. In our experimental evaluation, our novel kernels allow us to efficiently compare large graphs that cannot be tackled by existing graph kernels.
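The sampling idea in the abstract can be illustrated with a minimal sketch for $k = 3$, assuming an adjacency-set representation of an undirected graph; the function name and parameters below are illustrative, not the authors' implementation. For $k = 3$ the isomorphism type of an induced subgraph is fully determined by its edge count (0, 1, 2, or 3 edges), so no general isomorphism test is needed:

```python
import random
from itertools import combinations

def sample_graphlet_distribution(adj, num_samples=1000, seed=0):
    """Estimate the distribution over the four 3-node graphlet types
    by sampling random node triples (illustrative sketch, not the
    paper's implementation).

    adj: dict mapping node -> set of neighbours (undirected graph).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    # 3-node graphlets, indexed by edge count:
    # 0 = empty, 1 = single edge, 2 = path, 3 = triangle
    counts = [0, 0, 0, 0]
    for _ in range(num_samples):
        trio = rng.sample(nodes, 3)
        edges = sum(1 for u, v in combinations(trio, 2) if v in adj[u])
        counts[edges] += 1
    return [c / num_samples for c in counts]

# Toy example: a triangle plus one isolated vertex.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
print(sample_graphlet_distribution(g, num_samples=500))
```

The key point of the first speedup is that `num_samples` is chosen from a concentration bound on the deviation of this empirical distribution from the true one, so it stays fixed regardless of the size of the input graph.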