PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Modelling the jobs of a Grid System
Xiangliang Zhang, Michele Sebag and Cecile Germain
In: RFIA 2008, January 22-25, 2008, Amiens, France.

Abstract

The rise of grid systems, made of a large number of heterogeneous resources, has motivated the highly challenging field of Autonomic Computing, which aims at the self-management of such complex systems. As a preliminary step, this paper focuses on modeling the jobs submitted to the grid and discovering meaningful job categories, beyond the coarse distinction between "successfully executed" and "failed" jobs. The difficulty lies in the huge size of the available observations (the logs of the grid) and their heterogeneity, which severely hinder state-of-the-art Machine Learning algorithms. This difficulty is addressed through an original 3-step process: i) the data are first sliced into (more) homogeneous subsets, where a data slice involves jobs submitted by a single user, or during a single period of time; ii) supervised ML algorithms are used to construct discriminant hypotheses on each data slice; iii) these hypotheses are used to map the dataset onto a metric space, and thus enable clustering. The approach is validated by the stability of the clusters w.r.t. the supervised learning step in ii), and by the "natural" interpretation of the clusters according to the expert.
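
The 3-step process described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy job log and its two features are invented, a nearest-centroid linear classifier stands in for whichever supervised learner the paper actually uses, and a small k-means stands in for its clustering step. With so little data, the clusters it finds only recover the success/failure split; it shows the shape of the pipeline, not the paper's results.

```python
from collections import defaultdict

# Hypothetical toy log: (user, queue_time, run_time, succeeded).
JOBS = [
    ("alice", 1.0, 10.0, True), ("alice", 2.0, 12.0, True),
    ("alice", 9.0, 1.0, False), ("alice", 8.0, 2.0, False),
    ("bob", 1.0, 1.0, False), ("bob", 2.0, 2.0, False),
    ("bob", 3.0, 11.0, True), ("bob", 2.0, 13.0, True),
]

def slice_by_user(jobs):
    """Step i: split the log into one (more homogeneous) slice per user."""
    slices = defaultdict(list)
    for user, *features, ok in jobs:
        slices[user].append((features, ok))
    return slices

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_hypothesis(examples):
    """Step ii (stand-in learner): nearest-centroid linear discriminant.

    Returns h(x) = w.x + b, positive when x is closer to the centroid of
    the successful jobs of this slice than to that of the failed ones.
    Assumes the slice contains at least one job of each class.
    """
    pos = mean([x for x, ok in examples if ok])
    neg = mean([x for x, ok in examples if not ok])
    w = [p - n for p, n in zip(pos, neg)]
    mid = [(p + n) / 2 for p, n in zip(pos, neg)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

def embed(jobs, hypotheses):
    """Step iii: describe every job by the outputs of all slice hypotheses."""
    return [[h(features) for h in hypotheses] for _, *features, _ in jobs]

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations, initialised on the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean(members)
    return labels

slices = slice_by_user(JOBS)
hypotheses = [train_hypothesis(examples) for examples in slices.values()]
points = embed(JOBS, hypotheses)
labels = kmeans(points, k=2)
print(labels)
```

Slicing by time period instead of by user would only change the key used in slice_by_user; the rest of the pipeline is unaffected, since each job is always represented by the vector of hypothesis outputs.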

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 3686
Deposited By: Xiangliang Zhang
Deposited On: 14 February 2008