Abstract
Newly-released web applications often succumb to a "Success Disaster," where overloaded database machines and resulting high response times destroy a previously good user experience. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under simple declarative expressions. As a result, developers of these applications are increasingly abandoning relational databases in favor of imperative code written against distributed key/value stores, losing the many benefits of data independence in the process. Instead, we propose PIQL, a declarative language that also provides scale independence by calculating an upper bound on the number of key/value store operations that will be performed for any query. Coupled with a service level objective (SLO) compliance prediction model and PIQL's scalable database architecture, these bounds make it easy for developers to write success-tolerant applications that support an arbitrarily large number of users while still providing acceptable performance. In this paper, we present the PIQL query processing system and evaluate its scale independence on hundreds of machines using two benchmarks, TPC-W and SCADr.
- Apache HBase, 2011. http://hadoop.apache.org/hbase/.Google Scholar
- Google AppEngine: The Google Query Language, 2011. http://code.google.com/appengine/docs/python/datastore/.Google Scholar
- M. Armbrust et al. SCADS: Scale-Independent Storage for Social Computing Applications. In Proc. of CIDR, 2009.Google Scholar
- M. Armbrust et al. The Case for PIQL: a Performance Insightful Query Language. In Proc. of SoCC, pages 131--136, 2010. Google Scholar
- C. Arthur. Average twitter user has 126 followers... Guardian Technology Blog, June 2009.Google Scholar
- J. Baker et al. Megastore: Providing Scalable, Highly Available Storage for Interactive Services. In Proc. of CIDR, 2011.Google Scholar
- A. L. Ball. Are 5001 friends on Facebook one too many? New York Times, May 2010.Google Scholar
- M. Brantner, D. Florescu, D. A. Graf, D. Kossmann, and T. Kraska. Building a Database on S3. In Proc. of SIGMOD, 2008. Google Scholar
- M. J. Carey and D. Kossmann. On saying "Enough already!" in SQL. SIGMOD Rec., 26(2):219--230, 1997. Google Scholar
- F. Chang et al. Bigtable: A Distributed Storage System for Structured Data. In Proc. of OSDI, pages 205--218, 2006. Google Scholar
- R. Clifford. Distributed suffix trees. Journal of Discrete Algorithms, 2005. Combinatorial Pattern Matching (CPM) Special Issue.Google Scholar
- B. F. Cooper et al. PNUTS: Yahoo!'s Hosted Data Serving Platform. Proc. VLDB Endow., 1(2):1277--1288, 2008. Google Scholar
- G. DeCandia et al. Dynamo: Amazon's Highly Available Key-Value Store. SIGOPS, 41(6):205--220, 2007. Google Scholar
- R. Fagin, A. Lotem, and M. Naor. Optimal Aggregation Algorithms for Middleware. In Proc. of PODS, pages 102--113. ACM, 2001. Google Scholar
- A. Ganapathi et al. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In Proc. of ICDE, 2009. Google Scholar
- R. Goldman and J. Widom. WSQ/DSQ: A Practical Approach for Combined Querying of Databases and the Web. In Proc. of. SIGMOD, pages 285--296, 2000. Google Scholar
- G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surv., 25(2):73--169, 1993. Google Scholar
- A. Hernando et al. Unravelling the size distribution of social groups with information theory in complex networks. European Physical Journal B, 2010.Google Scholar
- I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. Supporting Top-K Join Queries in Relational Databases. VLDB Journal, 13(3), 2004. Google Scholar
- I. F. Ilyas, G. Beskales, and M. A. Soliman. A Survey of Top-K Query Processing Techniques in Relational Database Systems. ACM Comput. Surv., 40(4):1--58, 2008. Google Scholar
- D. Kossmann, T. Kraska, and S. Loesing. An Evaluation of Alternative Architectures for Transaction Processing in the Cloud. In Proc. of SIGMOD, pages 579--590, 2010. Google Scholar
- T. Kraska. Building Database Applications in the Cloud. PhD thesis, ETH Zurich, 2010.Google Scholar
- R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of Nonrecursive Queries. In Proc. of VLDB, pages 128--137, 1986. Google Scholar
- A. Lakshman and P. Malik. Cassandra: Structured Storage System on a P2P Network. In Proc. of PODC, page 5, 2009. Google Scholar
- G. Rivlin. Wallflower at the Web Party. New York Times, October 15 2006.Google Scholar
- E. Schurman and J. Brutlag. Performance Related Changes and their User Impact. Presented at Velocity Web Performance and Operations Conference, June 2009.Google Scholar
- R. Scoble. The you-don't-need-more-friends lobby. http://scobleizer.com/2007/10/14/the-you-dont-need-more-friends-lobby/.Google Scholar
- B. Trushkowsky et al. The SCADS Director: Scaling a Distributed Storage System Under Stringent Performance Requirements. In Proc. of FAST, 2011. Google Scholar
- B. Urgaonkar et al. An Analytical Model for Multi-tier Internet Services and Its Applications. June 2005.Google Scholar
- B. J. Watson et al. Probabilistic Performance Modeling of Virtualized Resource Allocation. In Proc. of ICAC, pages 99--108, 2010. Google Scholar
Index Terms
- PIQL: success-tolerant query processing in the cloud
Recommendations
The case for PIQL: a performance insightful query language
SoCC '10: Proceedings of the 1st ACM symposium on Cloud computingLarge-scale, user-facing applications are increasingly moving from relational databases to distributed key/value stores for high-request-rate, low-latency workloads. Often, this move is motivated not only by key/value stores' ability to scale simply by ...
PIQL: a performance insightful query language
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of dataLarge-scale websites are increasingly moving from relational databases to distributed key-value stores for high request rate, low latency workloads. Often this move is motivated not only by key-value stores' ability to scale simply by adding more ...
Comments