PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Semi-Supervised Learning of Hierarchical Latent Trait Models for Data Visualisation
Ian Nabney, Y Sun, P Tino and A Kaban
IEEE Transactions on Knowledge and Data Engineering, 2004.

Abstract

Recently, we developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualising large, high-dimensional, real-valued data sets. In this paper, we propose a more general visualisation system by extending HGTM in three ways, which allow the user to visualise a wider range of data sets and better support the model development process. (i) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualise, in a hierarchical manner, data of an inherently discrete nature, e.g. collections of documents. (ii) We give the user a choice of initialising the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user selects "regions of interest", whereas in the automatic mode an unsupervised, minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, which makes the interactive mode difficult to use. Such a situation often arises when visualising large data sets. (iii) We derive general formulas for magnification factors in latent trait models. Magnification factors help in interpreting the visualisation plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.
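As a rough illustration of point (iii), the sketch below computes a magnification factor for a small latent trait model with a Bernoulli noise model. It is not the paper's derivation: it assumes a Gaussian RBF basis, a logistic link, and the GTM-style definition of the magnification factor as sqrt(det(J^T J)), where J is the Jacobian of the map from latent space to the Bernoulli mean. The names `rbf_basis`, `bernoulli_mean`, `mean_jacobian` and `magnification_factor`, and all parameter values, are hypothetical.

```python
import numpy as np


def rbf_basis(x, centres, sigma):
    """Gaussian RBF activations phi(x), shape (M,), for a latent point x of shape (L,)."""
    d2 = np.sum((x - centres) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def bernoulli_mean(x, W, centres, sigma):
    """Mean of the Bernoulli noise model: logistic link applied to eta = W phi(x)."""
    eta = W @ rbf_basis(x, centres, sigma)
    return 1.0 / (1.0 + np.exp(-eta))


def mean_jacobian(x, W, centres, sigma):
    """Jacobian d mu / d x, shape (D, L), obtained by the chain rule."""
    phi = rbf_basis(x, centres, sigma)
    dphi = -(phi[:, None] * (x - centres)) / sigma ** 2  # d phi / d x, shape (M, L)
    mu = bernoulli_mean(x, W, centres, sigma)
    # The derivative of the logistic function is mu * (1 - mu), applied per output.
    return (mu * (1.0 - mu))[:, None] * (W @ dphi)


def magnification_factor(x, W, centres, sigma):
    """Assumed GTM-style local area magnification sqrt(det(J^T J)) at latent point x."""
    J = mean_jacobian(x, W, centres, sigma)
    # Clamp at zero to guard against tiny negative determinants from round-off.
    return float(np.sqrt(max(np.linalg.det(J.T @ J), 0.0)))


# Illustrative setup: 2-D latent space, 4 basis functions, 6 binary features.
rng = np.random.default_rng(0)
centres = rng.normal(size=(4, 2))
W = rng.normal(size=(6, 4))
mf = magnification_factor(np.array([0.1, -0.2]), W, centres, sigma=1.0)
```

Evaluating `magnification_factor` over a regular grid of latent points and plotting the result as a background image would give the kind of plot in which stretched regions between clusters stand out.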

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Information Retrieval & Textual Information Access
ID Code: 907
Deposited By: Dharmesh Maniyar
Deposited On: 06 January 2005