In the on-line learning model, the learner makes predictions sequentially and, after each prediction, receives a reward or incurs a loss. Typically, the learner receives some input before making each prediction. The goal of the learner is to maximize the accumulated reward or to minimize the accumulated loss, respectively.
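
The protocol described above can be sketched as a short loop. This is only an illustrative sketch under stated assumptions, not a method taken from the eprint: the names `run_online`, `RunningMeanLearner`, and `squared_loss` are hypothetical, the squared loss is an arbitrary choice, and the sketch assumes that the true outcome is revealed to the learner together with the loss.

```python
from typing import Any, Callable, Iterable, Tuple


def squared_loss(prediction: float, outcome: float) -> float:
    """Loss incurred for one prediction (assumed loss function)."""
    return (prediction - outcome) ** 2


class RunningMeanLearner:
    """Hypothetical learner: predicts the mean of the outcomes seen so far."""

    def __init__(self) -> None:
        self.mean = 0.0   # prediction used before any outcome is seen
        self.count = 0

    def predict(self, x: Any) -> float:
        # This simple learner ignores the input x.
        return self.mean

    def update(self, x: Any, outcome: float, loss: float) -> None:
        # Assumption: the outcome is revealed along with the loss.
        self.count += 1
        self.mean += (outcome - self.mean) / self.count


def run_online(
    learner: RunningMeanLearner,
    rounds: Iterable[Tuple[Any, float]],
    loss_fn: Callable[[float, float], float],
) -> float:
    """Run the sequential protocol and return the accumulated loss."""
    total_loss = 0.0
    for x, outcome in rounds:
        prediction = learner.predict(x)       # 1. input arrives, learner predicts
        loss = loss_fn(prediction, outcome)   # 2. loss is incurred
        total_loss += loss                    # 3. loss accumulates over rounds
        learner.update(x, outcome, loss)      # 4. learner adapts before the next round
    return total_loss
```

Each prediction is made before the corresponding outcome is seen, so the accumulated loss measures how well the learner performs while it is still learning. A reward formulation would use the same loop with the sum of rewards being maximized instead.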