Convergence of the graph Laplacian: application to dimensionality estimation and image segmentation
Given a sample from a probability measure supported on a submanifold of Euclidean space, one can construct a neighborhood graph that can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods, such as semi-supervised learning, dimensionality reduction, and clustering. We present the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold, all three graph Laplacians have the same limit up to constants. However, in the case of a nonuniform measure on the submanifold, only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator. We give two applications of these theoretical results. First, we provide a method to estimate the intrinsic dimensionality of a submanifold M in R^d from random samples. The procedure is based on the convergence rates of a certain U-statistic on the manifold. We address, at least partially, the question of how to choose the scale of the data, and we quantify the influence of the extrinsic curvature of the manifold. Moreover, the proposed method is easy to implement, can handle large data sets, and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets. Second, our theoretical results can be used in transductive learning. This allows us to address the problem of segmenting an image into regions consistent with user-supplied seeds (e.g., a sparse set of broad brush strokes). Indeed, this task can be viewed as a statistical transductive inference problem, in which some pixels are already associated with given zones and the remaining ones need to be classified. Segmentation is modeled as the task of finding matting coefficients for unclassified pixels given known matting coefficients for seed pixels.
The proposed algorithm is simple and accurate, as demonstrated by qualitative results on natural images and a quantitative comparison with state-of-the-art methods on the Microsoft GrabCut segmentation database.
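For illustration only, the following minimal sketch builds three commonly used graph Laplacians from a sample. It assumes that the three variants referred to above are the unnormalized, random walk, and symmetric normalized Laplacians, and that the neighborhood graph uses Gaussian weights with bandwidth h; the function name `graph_laplacians`, the weight function, the sign convention, and the omission of any 1/h^2 rescaling are all assumptions of this sketch rather than details taken from the work itself.

```python
import numpy as np


def graph_laplacians(X, h):
    """Return (unnormalized, random walk, symmetric normalized) graph Laplacians.

    X : (n, d) array of sample points in Euclidean space.
    h : bandwidth of the Gaussian weights (assumed form of the neighborhood size).
    """
    # Pairwise squared Euclidean distances between all sample points.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Gaussian edge weights; the diagonal is zeroed so there are no self-loops.
    W = np.exp(-sq_dist / h**2)
    np.fill_diagonal(W, 0.0)

    d = W.sum(axis=1)  # vertex degrees
    n = X.shape[0]

    L_un = np.diag(d) - W                       # unnormalized: D - W
    L_rw = np.eye(n) - W / d[:, None]           # random walk: I - D^{-1} W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # I - D^{-1/2} W D^{-1/2}
    return L_un, L_rw, L_sym


# Hypothetical example: points sampled uniformly from the unit circle in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
L_un, L_rw, L_sym = graph_laplacians(X, h=0.3)
```

The three matrices differ only in how the degrees enter the normalization, which is the aspect that matters when the sampling measure is nonuniform. Studying the limit would additionally require rescaling by a suitable power of h, which this sketch leaves out.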