The p-norm generalization of the LMS algorithm for adaptive filtering
Jyrki Kivinen, Manfred Warmuth and Babak Hassibi
IEEE Transactions on Signal Processing
Recently, much work has been done analyzing online machine learning
algorithms in a worst-case setting,
where no probabilistic assumptions are made about the data.
This is analogous to the H-infinity setting used in
adaptive linear filtering.
Bregman divergences have become a standard tool
for analyzing online machine learning algorithms.
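For reference, the Bregman divergence generated by a strictly convex
function F is d_F(u, w) = F(u) - F(w) - (u - w) · ∇F(w);
the choice F(w) = (1/2)||w||_2^2 recovers the squared Euclidean
distance that underlies the standard LMS analysis.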
Using these divergences,
we motivate a generalization of
the Least Mean Squares (LMS) algorithm.
The loss bounds for these so-called p-norm
algorithms involve norms other than the standard 2-norm.
The bounds can be significantly better
if a large proportion of the
input variables are irrelevant,
i.e., if the weight vector we are trying
to learn is sparse.
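As an illustration, here is a minimal sketch of the kind of update a
p-norm algorithm performs (the exact form used in the paper may differ;
the function and parameter names below are illustrative):

    import numpy as np

    def link(theta, p):
        # Gradient of (1/2)*||theta||_p^2; maps dual parameters to weights.
        # link(., p) and link(., q) are mutually inverse when 1/p + 1/q = 1.
        norm = np.linalg.norm(theta, ord=p)
        if norm == 0.0:
            return np.zeros_like(theta)
        return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)

    def p_norm_lms(stream, dim, p=2.0, eta=0.01):
        # One pass over (x, y) pairs; p = 2 reduces to standard LMS.
        theta = np.zeros(dim)                        # dual-space parameters
        for x, y in stream:
            w = link(theta, p)                       # current primal weights
            y_hat = float(w @ x)                     # linear prediction
            theta = theta - eta * (y_hat - y) * x    # gradient step in dual space
        return link(theta, p)

For p = 2 the link map is the identity, so the loop reduces to textbook
LMS; in the p-norm literature, taking p on the order of 2 ln n for n
input variables is the usual way to obtain the sparsity-friendly bounds
mentioned above.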
We also prove results for nonstationary targets.
We only know how to apply kernel methods
to the standard LMS algorithm (i.e., p=2).
However, even in the general p-norm case we can handle generalized
linear models, where the output of the system is a linear function
composed with a nonlinear transfer function
(e.g., the logistic sigmoid).
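A hedged sketch of that generalized linear variant, reusing link() from
the sketch above (again with illustrative names; under the matching loss
for the transfer function, the dual-space step keeps the same form):

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def glm_p_norm_step(theta, x, y, p, eta):
        # Prediction is the linear output passed through the transfer
        # function; (y_hat - y) * x is the matching-loss gradient.
        w = link(theta, p)
        y_hat = sigmoid(float(w @ x))
        return theta - eta * (y_hat - y) * x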