Algorithms for Infinitely Many-Armed Bandits
Yizao Wang, Rémi Munos and Jean-Yves Audibert
In: Neural Information Processing Systems, Vancouver (2008).
We consider multi-armed bandit problems where the number of arms is larger
than the possible number of experiments. We make a stochastic assumption on
the mean reward of a newly selected arm which characterizes its probability of
being a near-optimal arm. Our assumption is weaker than those made in previous
works. We describe algorithms based on upper confidence bounds applied to a
restricted set of randomly selected arms and provide upper bounds on the
resulting expected regret. We also derive a lower bound which matches
(up to a logarithmic factor) the upper bound in some cases.
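The core idea of the abstract, running an upper-confidence-bound policy on a restricted set of arms sampled from an infinite reservoir, can be illustrated with a minimal sketch. This is a generic UCB1-on-a-random-subset illustration, not the paper's exact algorithm (which tunes the subset size to the assumed near-optimality exponent); all function and parameter names here are illustrative assumptions.

```python
import math
import random

def ucb_on_random_subset(draw_arm_mean, n_rounds, k_arms, seed=0):
    """Sketch: sample k_arms arms from an (infinite) reservoir of arms,
    then run UCB1 on that restricted set only.

    draw_arm_mean: callable(rng) returning the hidden mean of a fresh arm;
    rewards are simulated as Bernoulli(mean). Illustrative only -- the
    paper's algorithms choose the number of sampled arms as a function of
    the horizon and the near-optimality assumption.
    """
    rng = random.Random(seed)
    # Restricted set: k_arms arms drawn at random from the reservoir.
    means = [draw_arm_mean(rng) for _ in range(k_arms)]
    counts = [0] * k_arms
    sums = [0.0] * k_arms
    total_reward = 0.0
    for t in range(1, n_rounds + 1):
        if t <= k_arms:
            i = t - 1  # play each sampled arm once to initialize
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            i = max(range(k_arms),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[i] else 0.0
        counts[i] += 1
        sums[i] += reward
        total_reward += reward
    return total_reward

# Example: arm means drawn uniformly on [0, 1].
reward = ucb_on_random_subset(lambda rng: rng.random(),
                              n_rounds=5000, k_arms=20)
```

Since only finitely many arms are ever played, the regret decomposes into the cost of the best sampled arm being sub-optimal plus the usual finite-armed UCB regret on the sampled set, which is the trade-off the paper's bounds quantify.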