PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Cost-sensitive parsimonious linear regression
Robby Goetschalckx, Scott Sanner and Kurt Driessens
In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM-08), 15-19 Dec 2008, Pisa, Italy.


We examine linear regression problems where the features may only be observable at some cost (e.g., in medical or financial domains where features may correspond to diagnostic tests or information-gathering that takes time and costs money). To do this, we define a parsimonious linear regression objective criterion that jointly minimizes prediction error and feature cost, assuming they can be expressed in commensurable units. Formally, this objective results in an unconstrained non-convex optimization problem that can be recast as a mixed 0-1 integer quadratic program (MIQP). While this MIQP can be solved using off-the-shelf software, such approaches typically cannot scale to large numbers of features. Noting that a linear regression model in this setting will induce a feature cost for all features having non-zero weights, we are able to modify least angle regression algorithms commonly used for sparse linear regression (with non-costly features) to produce the ParLiR algorithm. ParLiR not only provides an efficient and parsimonious solution to linear regression with costly features as we demonstrate empirically, but it also provides formal guarantees on parsimony that we prove theoretically.
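To make the objective concrete, here is a minimal sketch (not the paper's ParLiR algorithm) of the kind of cost-sensitive criterion described above: squared prediction error plus the summed cost of every feature given a non-zero weight, J(w) = ||y - Xw||^2 + sum_j c_j * 1[w_j != 0]. For a fixed support set the optimal weights are just ordinary least squares on those columns, so for a toy number of features we can solve the combinatorial problem by brute-force enumeration of subsets; all variable names and the example data below are illustrative assumptions.

```python
# Sketch of a cost-sensitive parsimonious regression objective:
#   J(w) = ||y - X w||^2 + sum_j c_j * 1[w_j != 0]
# For a fixed support S, the optimal weights are OLS on the columns in S,
# so with few features we can enumerate all 2^d subsets exactly.
# (Illustrative only -- this is NOT the paper's ParLiR algorithm, which
# scales by adapting least angle regression.)
import itertools
import numpy as np

def parsimonious_lr(X, y, costs):
    """Brute-force minimizer of squared error + feature-cost objective."""
    n, d = X.shape
    best_obj = float(y @ y)        # empty model: predict zero, pay no cost
    best_w = np.zeros(d)
    for k in range(1, d + 1):
        for S in itertools.combinations(range(d), k):
            S = list(S)
            w_S, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
            resid = y - X[:, S] @ w_S
            obj = float(resid @ resid) + float(sum(costs[j] for j in S))
            if obj < best_obj:
                best_obj = obj
                best_w = np.zeros(d)
                best_w[S] = w_S
    return best_obj, best_w

# Toy data: feature 0 carries strong signal, feature 3 carries signal so
# weak that its cost outweighs the error reduction it buys.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = 2.0 * X[:, 0] + 0.01 * X[:, 3] + 0.1 * rng.standard_normal(50)
costs = np.array([0.5, 0.5, 0.5, 0.5])
obj, w = parsimonious_lr(X, y, costs)
```

Note the trade-off this exposes: a feature with a genuinely non-zero but tiny coefficient is dropped when its observation cost exceeds the squared error it removes, which is exactly why the objective is non-convex and why plain sparse regression (which penalizes weight magnitude rather than a fixed per-feature cost) does not solve it directly.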

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 5277
Deposited By: Scott Sanner
Deposited On: 24 March 2009