Frequent Subgraph Retrieval in Geometric Graph Databases
Sebastian Nowozin and Koji Tsuda
Max Planck Institute for Biological Cybernetics, Tuebingen, Germany.
Knowledge discovery from geometric graph databases is particularly
important in chemistry and biology, because chemical compounds and proteins
can be represented as graphs with 3D geometric coordinates.
In such applications, scientists are not interested in statistics over
the whole database. Instead, they need information about the novel drug
candidate or protein at hand, represented as a query graph. We propose a
polynomial-delay algorithm for geometric frequent subgraph retrieval.
It enumerates all subgraphs of a single given query graph that are
frequent geometric epsilon-subgraphs in a database, under the entire class
of rigid geometric transformations. By using geometric epsilon-subgraphs,
we achieve tolerance to variations in geometry.
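The abstract does not state the formal definition of a geometric epsilon-subgraph. A common reading, assumed here, is that a query graph is a geometric epsilon-subgraph of a database graph if there exist a subgraph isomorphism phi and a rigid transformation (rotation R, translation t) such that every transformed query vertex lies within distance epsilon of its image under phi. The minimal sketch below only checks that condition for a *given* phi and (R, t); finding them is the hard part the algorithm addresses. All names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]
Mat = Sequence[Sequence[float]]  # 3x3 rotation matrix, assumed proper (det = +1)


def apply_rigid(R: Mat, t: Vec, x: Vec) -> Vec:
    """Return R @ x + t for a 3D point x."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) + t[i] for i in range(3))


def within_epsilon(query: Dict[int, Vec], target: Dict[int, Vec],
                   phi: Dict[int, int], R: Mat, t: Vec, eps: float) -> bool:
    """Hypothetical check: does every query vertex, moved by (R, t),
    land within eps of the database vertex phi maps it to?"""
    for v, x in query.items():
        if math.dist(apply_rigid(R, t, x), target[phi[v]]) > eps:
            return False
    return True


# Usage: a 90-degree rotation about the z-axis plus a translation,
# matched against slightly perturbed database coordinates.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 2.0, 3.0)
query = {0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0), 2: (1.5, 1.0, 0.0)}
target = {10: (1.02, 2.0, 3.0), 11: (1.0, 3.49, 3.0), 12: (0.0, 3.5, 3.03)}
phi = {0: 10, 1: 11, 2: 12}
print(within_epsilon(query, target, phi, R, t, eps=0.05))  # True
print(within_epsilon(query, target, phi, R, t, eps=0.01))  # False
```

The largest deviation in this toy example is about 0.03, so the match is accepted at epsilon = 0.05 and rejected at epsilon = 0.01; this per-vertex tolerance is what makes the matching robust to small geometric variations.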
We compare the proposed algorithm to gSpan on chemical compound data and
show that, for a given minimum support, requiring geometric matching
substantially reduces the total number of frequent patterns. Although the
computation time per pattern is larger than for non-geometric graph mining,
the total time remains reasonable even for small minimum support values.
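The retrieval setting (enumerate subgraphs of one query graph and keep those whose support in the database reaches a minimum) can be illustrated with a generic pattern-growth sketch. This is not the authors' polynomial-delay algorithm. It assumes only that support is anti-monotone (a pattern cannot be more frequent than any of its subgraphs), which lets infrequent patterns be pruned without extending them. The `occurs_in` predicate, which in the paper's setting would be a geometric epsilon-subgraph test, is replaced by a hypothetical toy stand-in in which each database graph is simply the set of query edges it matches.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]
Pattern = FrozenSet[Edge]


def frequent_subgraphs(query_edges: List[Edge], database: List[object],
                       occurs_in: Callable[[Pattern, object], bool],
                       min_support: int) -> Set[Pattern]:
    """Return connected edge subsets of the query graph with support >= min_support."""
    def support(p: Pattern) -> int:
        return sum(1 for g in database if occurs_in(p, g))

    def vertices(p: Pattern) -> Set[int]:
        return {v for e in p for v in e}

    frequent: Set[Pattern] = set()
    seen: Set[Pattern] = set()
    frontier: List[Pattern] = [frozenset([e]) for e in query_edges]
    while frontier:
        p = frontier.pop()
        if p in seen:  # each edge subset is evaluated only once
            continue
        seen.add(p)
        if support(p) < min_support:
            continue  # anti-monotone: no superset of p can be frequent
        frequent.add(p)
        vs = vertices(p)
        for e in query_edges:
            # grow by one query edge touching the pattern, keeping it connected
            if e not in p and (e[0] in vs or e[1] in vs):
                frontier.append(p | {e})
    return frequent


# Usage with a toy matcher: a pattern "occurs" in a database graph
# if all of its edges are among that graph's matched query edges.
query_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
database = [{(0, 1), (1, 2), (0, 2)}, {(0, 1), (1, 2), (2, 3)}, {(0, 1), (1, 2)}]
result = frequent_subgraphs(query_edges, database, lambda p, g: p <= g, min_support=2)
print(sorted(sorted(p) for p in result))  # [[(0, 1)], [(0, 1), (1, 2)], [(1, 2)]]
```

Because every subgraph of a frequent connected pattern is itself frequent, each frequent pattern is reachable by one-edge extensions from frequent patterns, so the pruning loses nothing. A stricter matcher (such as geometric matching) can only lower support, which is consistent with the reduction in pattern count reported above.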