In this chapter, we introduce online aggregation, a promising new technique for query processing. Online aggregation rests on the assumption that some applications do not always require precise results; an approximate result can provide a good enough estimate. Computing an approximate result is more cost-effective than computing the precise one, especially for large-scale datasets. To generate the approximate result, online aggregation continuously retrieves samples from the database and streams them to the query engine, which processes the query over them. The accuracy of the approximate result is described by a statistical model. Normally, the result is refined as more samples are obtained, and the user can terminate processing at any time once satisfied with the quality of the result.
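To make the processing loop concrete, the following minimal sketch estimates an AVG aggregate over an in-memory list that stands in for a database table. It is an illustration under stated assumptions, not the estimator of any particular system: the function name `online_avg`, the batch size, and the normal-approximation confidence interval (z = 1.96 for roughly 95% confidence) are all chosen here for the example, and the interval ignores the finite-population correction.

```python
import math
import random


def online_avg(values, batch_size=1000, z=1.96, seed=0):
    """Yield (samples_seen, running_mean, ci_half_width) after each batch.

    Hypothetical helper for illustration: `values` stands in for a table
    column, and a shuffled scan simulates sampling without replacement.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # random scan order, so every prefix is a random sample

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for start in range(0, len(order), batch_size):
        for i in order[start:start + batch_size]:
            x = values[i]
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        # Standard error of the mean from the sample variance (assumes n > 1
        # for a finite value; no finite-population correction is applied).
        std_err = math.sqrt(m2 / (n - 1) / n) if n > 1 else float("inf")
        yield n, mean, z * std_err


if __name__ == "__main__":
    data_rng = random.Random(42)
    table = [data_rng.gauss(50.0, 10.0) for _ in range(100_000)]
    # The "user" stops as soon as the interval is narrow enough.
    for n, estimate, half_width in online_avg(table):
        if half_width < 0.2:
            break
    print(f"after {n} samples: AVG = {estimate:.2f} +/- {half_width:.2f}")
```

Each yielded triple corresponds to one refinement of the result: the estimate is updated as samples arrive, the half-width plays the role of the accuracy reported by the statistical model, and breaking out of the loop corresponds to the user terminating the query early.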
The performance of online aggregation depends on the sampling approach and the estimation model, and our discussion in this chapter focuses on these two components. Besides introducing the basic principles of online aggregation, we also review some new applications built on top of it. We conclude the chapter by discussing the challenges of online aggregation and some future directions.