Automatic Exploration of Datacenter Performance Regimes
Peter Bodik, Rean Griffith, Charles Sutton, Armando Fox, Michael I. Jordan and David A. Patterson
In: Workshop on Automated Control for Datacenters and Clouds (2009).
Horizontally scalable Internet services present an opportunity to use automatic resource allocation strategies for system management in the datacenter. In most of the previous work, a controller employs a performance model of the system to make decisions about the optimal allocation of resources. However, these models are usually trained offline or on a small-scale deployment and will not accurately capture the performance of the controlled application. To achieve accurate control of the web application, the models need to be trained directly on the production system and adapted to changes in workload and performance of the application. In this paper we propose to train the performance model using an exploration policy that quickly collects data from different performance regimes of the application. The goal of our approach for managing the exploration process is to strike a balance between not violating the performance SLAs and the need to collect sufficient data to train an accurate performance model, which requires pushing the system close to its capacity. We show that by using our exploration policy, we can train a performance model of a Web 2.0 application in less than an hour and then immediately use the model in a resource allocation controller.
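The trade-off described in the abstract, pushing the system toward capacity to gather data while staying within the SLA, can be illustrated with a small sketch. This is not the paper's exploration policy, which the abstract does not specify. It is a toy loop under stated assumptions: the latency function, the SLA threshold, the safety fraction, and all names below are hypothetical stand-ins for measurements taken on a production system.

```python
SLA_LATENCY_MS = 100.0   # hypothetical SLA threshold
SAFETY_FRACTION = 0.8    # hypothetical: back off once latency reaches 80% of the SLA


def simulated_latency_ms(requests_per_sec: float, servers: int) -> float:
    """Toy stand-in for a production measurement; latency rises sharply near capacity."""
    capacity_per_server = 100.0  # hypothetical requests/sec one server can absorb
    utilization = min(requests_per_sec / (servers * capacity_per_server), 0.99)
    return 10.0 / (1.0 - utilization)


def explore(requests_per_sec: float, max_servers: int, steps: int):
    """Collect (per-server load, latency) samples across performance regimes."""
    servers = max_servers
    samples = []
    for _ in range(steps):
        latency = simulated_latency_ms(requests_per_sec, servers)
        samples.append((requests_per_sec / servers, latency))
        if latency >= SAFETY_FRACTION * SLA_LATENCY_MS:
            # Too close to the SLA: restore capacity.
            servers = min(servers + 1, max_servers)
        elif servers > 1:
            # Still safe: remove a server to probe a higher-load regime.
            servers -= 1
    return samples


if __name__ == "__main__":
    for load, latency in explore(requests_per_sec=4000.0, max_servers=60, steps=30):
        print(f"{load:7.2f} req/s per server -> {latency:6.2f} ms")
```

In this toy setting the loop steps down from 60 servers, records samples at progressively higher per-server load, and then hovers near the safety margin without crossing the SLA. The collected samples are the kind of data a performance model could be fitted to; a real policy would also have to cope with measurement noise and changing workload, which this sketch ignores.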