What is the dimension of your binary data?
Nikolaj Tatti, Taneli Mielikäinen, Aristides Gionis and Heikki Mannila
In: ICDM 2006, 18-22 Dec 2006, Hong Kong, China.
Many 0/1 datasets have a very large number of variables;
however, they are sparse and the dependency structure
of the variables is simpler than the number of variables
would suggest. Defining the effective dimensionality
of such a dataset is a nontrivial problem. We consider the
problem of defining a robust measure of dimension for 0/1
datasets, and show that the basic idea of fractal dimension
can be adapted for binary data. On its own, however, the
fractal dimension is difficult to interpret, so we introduce
the concept of normalized fractal dimension. For a dataset
D, its normalized fractal dimension counts the number of
independent columns needed to achieve the unnormalized
fractal dimension of D. The normalized fractal dimension
measures the degree of dependency structure of the data. We
study the properties of the normalized fractal dimension and
discuss its computation. We give empirical results on the
normalized fractal dimension, comparing it against PCA.
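
As a rough illustration of the fractal-dimension idea for 0/1 data, the sketch below estimates a correlation-dimension-style quantity: the slope of log P(distance <= r) against log r, where the distance is the Hamming distance between pairs of rows. The choice of correlation dimension, the Hamming distance, the radii, and the names hamming, correlation_dimension and random_rows are all assumptions made for illustration and are not taken from the abstract. The normalization step is not shown.

```python
import math
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions where two 0/1 rows differ."""
    return sum(x != y for x, y in zip(a, b))


def correlation_dimension(rows, radii):
    """Least-squares slope of log P(dist <= r) versus log r (illustrative)."""
    dists = [hamming(a, b) for a, b in combinations(rows, 2)]
    points = []
    for r in radii:
        frac = sum(d <= r for d in dists) / len(dists)
        if frac > 0:  # log is undefined when no pair lies within r
            points.append((math.log(r), math.log(frac)))
    if len(points) < 2:
        raise ValueError("need at least two radii with a nonzero pair fraction")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def random_rows(n_rows, n_free, copies, rng):
    """Hypothetical toy data: n_free fair bits per row, each repeated `copies` times."""
    rows = []
    for _ in range(n_rows):
        bits = [int(rng.random() < 0.5) for _ in range(n_free)]
        rows.append([b for b in bits for _ in range(copies)])
    return rows


rng = random.Random(0)
radii = [6, 8, 10, 12]  # assumed radii, chosen by hand for 20 columns
independent = random_rows(200, 20, 1, rng)  # 20 independent columns
dependent = random_rows(200, 2, 10, rng)    # 20 columns, only 2 free bits
print(correlation_dimension(independent, radii))
print(correlation_dimension(dependent, radii))
```

On toy data of this kind one would expect the independent columns to give the larger slope, in line with the abstract's point that dependency among variables lowers the effective dimension even when the number of columns is the same.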