Capturing Data Uncertainty in High-Volume Stream Processing
Yanlei Diao, Buduo Li, Anna Liu, Liping Peng, Charles Sutton, Thanh Tran and Michael Zink
In: Conference on Innovative Data Systems Research (2009).
We present the design and development of a data stream system that captures data uncertainty from data collection to query processing to final result generation. Our system focuses on data that is naturally modeled as continuous random variables, such as many types of sensor data. To provide an end-to-end solution, our system employs probabilistic modeling and inference to generate uncertainty descriptions for raw data, and then applies a suite of statistical techniques to capture changes in uncertainty as data propagates through query operators. To cope with high-volume streams, we explore advanced approximation techniques for both space and time efficiency. We are currently working with a group of scientists to evaluate our system using traces collected from real-world applications for hazardous weather monitoring and for object tracking and monitoring.
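As a rough illustration of what it means for uncertainty to propagate through a query operator, the sketch below pushes Gaussian-distributed readings through SUM and AVG aggregates. It is not the system described in the abstract: the names GaussianTuple, sum_operator and avg_operator are hypothetical, and the sketch assumes that every tuple is an independent Gaussian random variable, which is the simplest case where the result distribution has a closed form (means add, variances add).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianTuple:
    """Hypothetical stream tuple whose value is a Gaussian random variable."""
    mean: float
    variance: float


def sum_operator(window):
    """SUM over independent Gaussian tuples: means add and variances add."""
    total_mean = sum(t.mean for t in window)
    total_variance = sum(t.variance for t in window)
    return GaussianTuple(total_mean, total_variance)


def avg_operator(window):
    """AVG is SUM scaled by 1/n, so the variance scales by 1/n**2."""
    n = len(window)
    if n == 0:
        raise ValueError("cannot average an empty window")
    total = sum_operator(window)
    return GaussianTuple(total.mean / n, total.variance / (n * n))


# Three illustrative sensor readings (made-up values), each with its own uncertainty.
window = [
    GaussianTuple(20.0, 4.0),
    GaussianTuple(22.0, 1.0),
    GaussianTuple(21.0, 4.0),
]
result = avg_operator(window)
print(result)
```

The output tuple carries a distribution rather than a single number, so a downstream operator can continue the propagation. Correlated inputs or non-Gaussian distributions would not admit this closed form and would call for approximation.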