ABSTRACT
While much of the research on transaction processing has focused on improving overall performance in terms of throughput and mean latency, surprisingly less attention has been given to performance predictability: how often individual transactions exhibit execution latency far from the mean. Performance predictability is increasingly important when transactions lie on the critical path of latency-sensitive applications, enterprise software, or interactive web services.
In this paper, we focus on understanding and mitigating the sources of performance unpredictability in today's transactional databases. We conduct the first quantitative study of major sources of variance in MySQL, Postgres (two of the largest and most popular open-source products on the market), and VoltDB (a non-conventional database). We carry out our study with a tool called TProfiler that, given the source code of a database system and programmer annotations indicating the start and end of a transaction, is able to identify the dominant sources of variance in transaction latency. Based on our findings, we investigate alternative algorithms, implementations, and tuning strategies to reduce latency variance without compromising mean latency or throughput. Most notably, we propose a new lock scheduling algorithm, called Variance-Aware Transaction Scheduling (VATS), and a lazy buffer pool replacement policy. In particular, our modified MySQL exhibits significantly lower variance and 99th percentile latencies by up to 5.6× and 6.3×, respectively. Our proposal has been welcomed by the open-source community, and our VATS algorithm has already been adopted as of MySQL's 5.7.17 release (and been made the default scheduling policy in MariaDB).
- 15.5.2 InnoDB Transaction Model. http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-model.html.Google Scholar
- Google Cloud SQL. http://code.google.com/apis/sql.Google Scholar
- MariaDB source repository. https://github.com/MariaDB/server/pull/248.Google Scholar
- Oracle database cloud service. http://cloud.oracle.com.Google Scholar
- Program2 Two-Phase Locking (2PL) vs. Timestamp Ordering (TSO) vs. A Real Protocol. http://cobweb.cs.uga.edu/shasha/course/csci8370/prog2/prog2.pdf.Google Scholar
- XtraDB Performance Improvements for I/O-Bound Highly-Concurrent Workloads. https://www.percona.com/doc/percona-server/5.7/performance/xtradb performance improvements for io-bound highly-concurrent workloads.html.Google Scholar
- SQL Server Locking and You! https://www.brentozar.com/archive/2011/06/sql-server-locking/, 2011.Google Scholar
- MT LRU flusher. https://blueprints.launchpad.net/percona-server/+spec/mt-lru, 2016.Google Scholar
- MySQL Transaction Lock Manager Source Code. https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/lock/lock0lock.cc, 2016.Google Scholar
- Parallel doublewrite buffer. https://blueprints.launchpad.net/percona-server/+spec/parallel-doublewrite, 2016.Google Scholar
- Percona Server 5.7: multi-threaded LRU flushing. https://www.percona.com/blog/2016/05/05/percona-server-5--7-multi-threaded-lru-flushing/, 2016.Google Scholar
- Postgre Transaction Lock Manager Source Code. https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/proc.c, 2016.Google Scholar
- R. K. Abbott et al. Scheduling real-time transactions: A performance evaluation. TODS, 1992. Google ScholarDigital Library
- R. Agrawal. A parallel logging algorithm for multiprocessor database machines. In Database Machines. 1985.Google ScholarCross Ref
- R. Agrawal and D. J. DeWitt. Recovery architectures for multiprocessor database machines. ACM, 1985.Google ScholarDigital Library
- M. Armbrust, K. Curtis, T. Kraska, A. Fox, M. J. Franklin, and D. A. Patterson. Piql: Success-tolerant query processing in the cloud. PVLDB, 5, 2011. Google ScholarDigital Library
- M. Armbrust, E. Liang, T. Kraska, A. Fox, M. J. Franklin, and D. A. Patterson. Generalized scale independence through incremental precomputation. In SIGMOD, 2013. Google ScholarDigital Library
- J. Arulraj, A. Pavlo, and S. R. Dulloor. Let's talk about storage; recovery methods for non-volatile memory database systems. In SIGMOD, 2015. Google ScholarDigital Library
- S. Arumugam, A. Dobra, C. M. Jermaine, N. Pansare, and L. Perez. The datapath system: a data-centric analytic processing engine for large data warehouses. In SIGMOD, 2010. Google ScholarDigital Library
- Attariyan et al. X-ray: Automating root-cause diagnosis of performance anomalies in production software. In OSDI, 2012. Google ScholarDigital Library
- P. D. Bailis. Coordination Avoidance in Distributed Databases. PhD thesis, UC Berkeley, 2015.Google Scholar
- Bector et al. V-shape property of optimal sequence of jobs about a common due date on a single machine. Computers & operations research, 1989.Google Scholar
- A. R. Bernat and B. P. Miller. Incremental call-path profiling. Concurrency and Computation: Practice and Experience, 2007.Google ScholarCross Ref
- Bhatia et al. Lightweight, high-resolution monitoring for troubleshooting production systems. In OSDI, 2008. Google ScholarDigital Library
- G. Candea, N. Polyzotis, and R. Vingralek. A scalable, predictable join operator for highly concurrent data warehouses. PVLDB, 2009. Google ScholarDigital Library
- S. Chaudhuri, H. Lee, and V. R. Narasayya. Variance aware optimization of parameterized queries. In SIGMOD, 2010. Google ScholarDigital Library
- Chen et al. Sequencing heuristic for bicriteria scheduling in a single machine problem. Journal of Information and Optimization Sciences, 2006.Google Scholar
- S. Chen. Flashlogging: exploiting flash devices for synchronous logging performance. In SIGMOD, 2009. Google ScholarDigital Library
- F. Chu et al. Least expected cost query optimization: An exercise in utility. In PODS, 1999. Google ScholarDigital Library
- B. F. Cooper et al. Benchmarking cloud serving systems with ycsb. In SoCC, 2010. Google ScholarDigital Library
- P. De et al. On the minimization of completion time variance with a bicriteria extension. Operations Research, 1992.Google Scholar
- D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux. Oltp-bench: An extensible testbed for bench-marking relational databases. PVLDB, 7, 2013. Google ScholarDigital Library
- S. Eilon and I. Chowdhury. Minimising waiting time variance in the single machine problem. Management Science, 1977. Google ScholarDigital Library
- D. Florescu and D. Kossmann. Rethinking cost and performance of database systems. Sigmod Record, 2009. Google ScholarDigital Library
- G. Giannikis et al. Shareddb: killing one thousand queries with one stone. PVLDB, 2012. Google ScholarDigital Library
- S. L. Graham et al. Gprof: A call graph execution profiler. In Sigplan Notices, 1982. Google ScholarDigital Library
- B. Gregg. DTrace pid Provider return. http://tinyurl.com/jzpphne, 2011.Google Scholar
- R. J. Hall. Call path profiling. In ICSE, 1992. Google ScholarDigital Library
- S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stone-braker. OLTP through the looking glass, and what we found there. In SIGMOD, 2008. Google ScholarDigital Library
- S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. Qpipe: a simultaneously pipelined relational query engine. In SIGMOD, 2005. Google ScholarDigital Library
- P. Helland, H. Sammer, J. Lyon, R. Carr, P. Garrett, and A. Reuter. Group commit timers and high volume transaction systems. In HPTS. 1989. Google ScholarDigital Library
- J. Huang, B. Mozafari, and T. Wenisch. Statistical analysis of latency through semantic profiling. In EuroSys, 2017. Google ScholarDigital Library
- Y.-K. Kim and S. H. Son. Supporting predictability in real-time database systems. In RTAS, 1996. Google ScholarDigital Library
- W. Kubiak. Completion time variance minimization on a single machine is difficult. Operations Research Letters, 1993. Google ScholarDigital Library
- W. Kubiak. New results on the completion time variance minimization. Discrete Applied Mathematics, 1995. Google ScholarDigital Library
- W. Kubiak et al. Fast fully polynomial approximation schemes for minimizing completion time variance. Eur. Journal of Operational Research, 2002.Google Scholar
- T. Lahiri, M.-A. Neimat, and S. Folkman. Oracle timesten: An in-memory database for enterprise applications. IEEE Data Eng. Bull., 2013.Google Scholar
- P. Massa and P. Avesani. An experimental study on epinions.com community. In NCAI, 2005.Google Scholar
- B. Mozafari, C. Curino, A. Jindal, and S. Madden. Performance and resource modeling in highly-concurrent OLTP workloads. In SIGMOD, 2013. Google ScholarDigital Library
- B. Mozafari, C. Curino, and S. Madden. DBSeer: Resource and performance prediction for building a next generation database cloud. In CIDR, 2013.Google Scholar
- B. Mozafari, E. Z. Y. Goh, and D. Y. Yoon. CliffGuard: A principled framework for finding robust database designs. In SIGMOD, 2015. Google ScholarDigital Library
- B. Mozafari, E. Z. Y. Goh, and D. Y. Yoon. Cliffguard: An extended report. Technical report, University of Michigan, Ann Arbor, 2015.Google ScholarDigital Library
- V. Narasayya, I. Menache, M. Singh, F. Li, M. Syamala, and S. Chaudhuri. Sharing buffer pool memory in multitenant relational database-as-a-service. PVLDB, 2015. Google ScholarDigital Library
- V. R. Narasayya, S. Das, M. Syamala, B. Chandramouli, and S. Chaudhuri. SQLVM: performance isolation in multi-tenant relational database-as-a-service. In CIDR, 2013. Google ScholarDigital Library
- P. O'Neil et al. A two-phase approach to predictably scheduling real-time transactions., 1996.Google Scholar
- S. Pelley et al. Storage management in the nvram era. PVLDB, 2013. Google ScholarDigital Library
- M. Pinedo. Scheduling: theory, algorithms, and systems. Springer Science, 2012. Google ScholarDigital Library
- L. Qiao, V. Raman, F. Reiss, P. Haas, and G. Lohman. Main-memory scan sharing for multi-core cpus. PVLDB, 2008. Google ScholarDigital Library
- V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. In ICDE, 2008. Google ScholarDigital Library
- M. Sadoghi et al. Making updates disk-i/o friendly using ssds. PVLDB, 2013. Google ScholarDigital Library
- J. M. Spivey. Fast, accurate call graph profiling.Software: Practice and Experience, 2004. Google ScholarDigital Library
- M. Stonebraker and A. Pavlo. The seats airline ticketingsystems benchmark.Google Scholar
- R. Strom and S. Yemini. Optimistic recovery in distribute systems. TODS, 1985. Google ScholarDigital Library
- Z. Szebenyi et al. Space-efficient time-series call-path profiling of parallel applications. In SC09, 2009. Google ScholarDigital Library
- P. Unterbrunner et al. Predictable performance for unpredictable workloads. PVLDB, 2009. Google ScholarDigital Library
- M. Wainwright. Chapter 2: Basic tail and concentration bounds. http://www.stat.berkeley.edu/~mjwain/stat210b/Chap2TailBounds Jan22 2015.pdf.Google Scholar
- T. Wang and R. Johnson. Scalable logging through emerging non-volatile memory. PVLDB, 2014. Google ScholarDigital Library
- A. Wolski. Tatp benchmark description, 2009.Google Scholar
- R. J. Yang et al. PTL: Partitioned logging for database storage on flash solid state drives. In WAIM. 2012.Google ScholarCross Ref
- N. Ye, X. Li, T. Farley, and X. Xu. Job scheduling methods for reducing waiting time variance. Computers & Operations Research, 2007. Google ScholarDigital Library
- D. Y. Yoon, B. Mozafari, and D. P. Brown. DBSeer: Pain-free database administration through workload intelligence. PVLDB, 2015. Google ScholarDigital Library
- D. Y. Yoon, N. Niu, and B. Mozafari. DBSherlock: A performance diagnostic tool for transactional databases. In SIGMOD, 2016. Google ScholarDigital Library
- X. Yu, G. Bezerra, A. Pavlo, S. Devadas, and M. Stone-braker. Staring into the abyss: An evaluation of concurrency control with one thousand cores. PVLDB, 2014 Google ScholarDigital Library
Index Terms
- A Top-Down Approach to Achieving Performance Predictability in Database Systems
Recommendations
Supporting predictability in real-time database systems
RTAS '96: Proceedings of the 2nd IEEE Real-Time Technology and Applications Symposium (RTAS '96)Real-time database systems (RTDBSs) have timing constraints in their specifications, such as read times, deadlines and other temporal constraints. In addition, RTDBSs must adapt to changes in the operating environment and guarantee the completion of ...
Performance issues in database systems
The performance of transaction processing systems is affected by contention for hardware as well as software resources (data objects). Software contention becomes prominent in database systems because concurrency control mechanism, which is used to ...
Database Concurrency Control in Multilevel Secure Database Management Systems
Concurrent execution of transactions in database management systems (DBMSs) may lead to contention for access to data, which in a multilevel secure DBMS (MLS/DBMS) may lead to insecurity. Security issues involved in database concurrency control for MLS/...
Comments