Learning exercise policies for American options
Options are important instruments in modern finance. In this paper, we investigate reinforcement learning (RL) methods, in particular least-squares policy iteration (LSPI), for the problem of learning exercise policies for American options. We develop finite-time bounds on the performance of the policy obtained with LSPI and compare LSPI and the fitted Q-iteration algorithm (FQI) with the Longstaff-Schwartz method (LSM), the standard least-squares Monte Carlo algorithm from the finance community. Our empirical results show that the exercise policies discovered by LSPI and FQI gain larger payoffs than those discovered by LSM, on both real and synthetic data. Furthermore, we find that for all methods the policies learned from real data generally gain similar payoffs to the policies learned from simulated data. Our work shows that solution methods developed in machine learning can advance the state of the art in an important and challenging application area, while demonstrating that computational finance remains a promising area for future applications of machine learning methods.
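For readers unfamiliar with the baseline, the following is a minimal sketch of the Longstaff-Schwartz least-squares Monte Carlo method for an American put under geometric Brownian motion. The parameter values, the quadratic polynomial basis, and the function name are illustrative assumptions for this sketch, not choices taken from the paper.

```python
import numpy as np

def lsm_american_put(S0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                     n_steps=50, n_paths=10_000, seed=0):
    """Price an American put with Longstaff-Schwartz least-squares Monte Carlo.

    All parameter defaults are illustrative, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    disc = np.exp(-r * dt)  # one-step discount factor

    # Simulate geometric Brownian motion paths of the underlying asset.
    z = rng.standard_normal((n_paths, n_steps))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    S = S0 * np.exp(np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_increments, axis=1)], axis=1))

    # Cash flows are initialised to the payoff at maturity.
    cashflow = np.maximum(K - S[:, -1], 0.0)

    # Backward induction: at each step, regress the discounted future cash
    # flow on basis functions of the current price over in-the-money paths,
    # and exercise wherever the immediate payoff exceeds the fitted
    # continuation value.
    for t in range(n_steps - 1, 0, -1):
        cashflow *= disc
        itm = (K - S[:, t]) > 0.0
        if not itm.any():
            continue
        x = S[itm, t]
        basis = np.column_stack([np.ones_like(x), x, x**2])  # quadratic basis
        coef, *_ = np.linalg.lstsq(basis, cashflow[itm], rcond=None)
        continuation = basis @ coef
        exercise_value = K - x
        exercise_now = exercise_value > continuation
        idx = np.where(itm)[0][exercise_now]
        cashflow[idx] = exercise_value[exercise_now]

    # Discount the time-1 cash flows back to time 0 and average over paths.
    return disc * cashflow.mean()

if __name__ == "__main__":
    print(f"LSM price estimate: {lsm_american_put():.4f}")
```

The regression step is what the RL methods studied here replace: LSPI and FQI instead fit an approximate action-value function across all sampled states, rather than a per-timestep continuation-value regression restricted to in-the-money paths.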