Soft Failure Detection using Factorial Hidden Markov Models
Guillaume Bouchard and Jean-Marc Andreoli
In: ICMLA 2007, 13-15 Dec 2007, Cincinnati, US.
In modern business, educational, and other settings, it is common to provide a digital network that interconnects hardware devices for shared access by users (e.g., an office where printers are available to all office workers). In such a context, so-called "soft" failures, where a device silently starts working in a degraded mode, may easily go unnoticed for a long time, resulting in potential productivity loss. It is therefore advantageous to enable system administrators to identify soft failures at an early stage. We propose a probabilistic method that uses variational inference on a factorial hidden Markov model to discover soft failures automatically, based on the analysis of simple usage information that is normally logged by the network infrastructure. We mine these logs to discover statistically significant deviations in the usage behavior of the overall infrastructure, and we identify such deviations with soft failures or, in any case, with situations of interest to an administrator.