PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Relaxed regret minimization under average cost constraints
Andrey Bernstein, Shie Mannor and Nahum Shimkin
In: The 23rd Annual Conference on Learning Theory (COLT 2010), 27 - 29 Jun 2010, Haifa, Israel.

Abstract

We consider the online decision problem in which an agent interacts with an unpredictable and possibly adversarial environment. The goal of the agent is to maximize its long-term average reward subject to long-term average cost constraints. As is well known, in the absence of constraints there exist a number of online algorithms that have the no-regret property, in the sense that they guarantee a long-term average reward at least as high as could be achieved by any fixed action of the agent, given the observed sequence of the environment's actions. We refer to this benchmark as the best-response envelope. In the constrained setting, we propose a relaxed form of the best-response envelope as the reference level for a no-regret algorithm. This relaxed best-response envelope incorporates a vector of relaxation parameters; we characterize the minimal value of this parameter vector which ensures that the relaxed best-response envelope is attainable while satisfying the long-term cost constraints. A computationally feasible algorithm, Constrained Regret Matching (CRM), is proposed and analyzed. In addition, an adaptive variant of the CRM algorithm is introduced, which tunes the relaxation parameters according to the observed actions of the environment.
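As a rough illustration of the unconstrained notions mentioned in the abstract, the following sketch computes the best-response envelope and the resulting average regret for a finite reward matrix. It is not the CRM algorithm and does not model cost constraints or relaxation parameters; the function names, the matrix layout (rows are agent actions, columns are environment actions), and the example data are assumptions made here for illustration only.

```python
import numpy as np

def best_response_envelope(reward, env_actions):
    """Best average reward any single fixed agent action would have earned
    against the observed sequence of environment actions (hypothetical helper)."""
    reward = np.asarray(reward, dtype=float)
    # Column-select the observed environment actions, then average per agent action.
    per_action_avg = reward[:, env_actions].mean(axis=1)
    return per_action_avg.max()

def average_regret(reward, agent_actions, env_actions):
    """Gap between the best-response envelope and the agent's achieved average reward."""
    reward = np.asarray(reward, dtype=float)
    # Reward actually collected at each step, averaged over the horizon.
    achieved = reward[agent_actions, env_actions].mean()
    return best_response_envelope(reward, env_actions) - achieved

# Assumed toy data: 2 agent actions, 2 environment actions, 4 rounds.
reward = [[1.0, 0.0],
          [0.0, 1.0]]
env_actions = [0, 0, 1, 0]
agent_actions = [1, 0, 1, 1]

envelope = best_response_envelope(reward, env_actions)
regret = average_regret(reward, agent_actions, env_actions)
```

In this informal reading, a no-regret algorithm is one whose average regret, computed as above, tends to zero (or below) as the number of rounds grows, whatever sequence the environment plays.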

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 5857
Deposited By: Andrey Bernstein
Deposited On: 08 March 2010