research-article

Staring into the abyss: an evaluation of concurrency control with one thousand cores

Authors:
Xiangyao Yu

MIT CSAIL

MIT CSAIL
View Profile

,
George Bezerra

MIT CSAIL

MIT CSAIL
View Profile

,
Andrew Pavlo

Carnegie Mellon University

Carnegie Mellon University
View Profile

,
Srinivas Devadas

MIT CSAIL

MIT CSAIL
View Profile

,
Michael Stonebraker

MIT CSAIL

MIT CSAIL
View Profile

Proceedings of the VLDB Endowment Volume 8 Issue 3pp 209–220https://doi.org/10.14778/2735508.2735511

Published:01 November 2014Publication History

Proceedings of the VLDB Endowment

Abstract

Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip. This unprecedented level of on-chip parallelism introduces a new dimension to scalability that current database management systems (DBMSs) were not designed for. In particular, as the number of cores increases, the problem of concurrency control becomes extremely challenging. With hundreds of threads running in parallel, the complexity of coordinating competing accesses to data will likely diminish the gains from increased core counts.

To better understand just how unprepared current DBMSs are for future CPU architectures, we performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips. We implemented seven concurrency control algorithms on a main-memory DBMS and using computer simulations scaled our system to 1024 cores. Our analysis shows that all algorithms fail to scale to this magnitude but for different reasons. In each case, we identify fundamental bottlenecks that are independent of the particular database implementation and argue that even state-of-the-art DBMSs suffer from these limitations. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from ground up and is tightly coupled with the hardware.

References

Intel brings supercomputing horsepower to big data analytics. http://intel.ly/18A03EM, November 2013.Google Scholar
A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB, pages 266--277, 1999. Google ScholarDigital Library
P. A. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Comput. Surv., 13(2): 185--221, 1981. Google ScholarDigital Library
P. A. Bernstein and N. Goodman. Multiversion concurrency control - theory and algorithms. ACM Trans. Database Syst., 8(4): 465--483, Dec. 1983. Google ScholarDigital Library
P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems, chapter 5. 1987. Google ScholarDigital Library
P. A. Bernstein, D. Shipman, and W. Wong. Formal aspects of serializability in database concurrency control. IEEE Transactions on Software Engineering, 5(3): 203--216, 1979. Google ScholarDigital Library
M. J. Carey, D. J. DeWitt, M. J. Franklin, N. E. Hall, M. L. McAuliffe, J. F. Naughton, D. T. Schuh, M. H. Solomon, C. Tan, O. G. Tsatalos, et al. Shoring up persistent applications, volume 23. ACM, 1994. Google ScholarDigital Library
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In SoCC'10, pages 143--154. Google ScholarDigital Library
J. C. Corbett and et al. Spanner: Google's Globally-Distributed Database. In OSDI, pages 251--264, 2012. Google ScholarDigital Library
C. Dennl, D. Ziener, and J. Teich. On-the-fly composition of fpga-based sql query accelerators using a partially reconfigurable module library. In FCCM, pages 45--52, 2012. Google ScholarDigital Library
C. Diaconu, C. Freedman, E. Ismert, P.-A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. Hekaton: SQL Server's memory-optimized OLTP engine. In SIGMOD, pages 1243--1254, 2013. Google ScholarDigital Library
K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger. The notions of consistency and predicate locks in a database system. Commun. ACM, 19(11): 624--633, Nov. 1976. Google ScholarDigital Library
J. Evans. jemalloc. http://canonware.com/jemalloc.Google Scholar
H. Garcia-Molina and K. Salem. Main memory database systems: An overview. IEEE Trans. on Knowl. and Data Eng., 4(6): 509--516, Dec. 1992. Google ScholarDigital Library
S. Ghemawat and P. Menage. TCMalloc: Thread-caching malloc. http://goog-perftools.sourceforge.net/doc/tcmalloc.html.Google Scholar
J. Gray. Concurrency Control and Recovery in Database Systems, chapter Notes on data base operating systems, pages 393--481. Springer-Verlag, 1978. Google ScholarDigital Library
J. Gray. The transaction concept: Virtues and limitations. In VLDB, pages 144--154, 1981. Google ScholarDigital Library
J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly generating billion-record synthetic databases. SIGMOD, pages 243--252, 1994. Google ScholarDigital Library
J. N. Gray, R. A. Lorie, G. R. Putzolu, and I. L. Traiger. Modelling in data base management systems. chapter Granularity of locks and degrees of consistency in a shared data base, pages 365--393. 1976.Google Scholar
T. Haerder and A. Reuter. Principles of transaction-oriented database recovery. ACM Comput. Surv., 15(4): 287--317, Dec. 1983. Google ScholarDigital Library
S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker. OLTP through the looking glass, and what we found there. In SIGMOD, pages 981--992, 2008. Google ScholarDigital Library
M. Heytens, S. Listgarten, M.-A. Neimat, and K. Wilkinson. Smallbase: A main-memory dbms for high-performance applications. Technical report, Hewlett-Packard Laboratories, 1995.Google Scholar
R. Johnson and I. Pandis. The bionic dbms is coming, but what will it look like? In CIDR, 2013.Google Scholar
R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsafi. Shore-MT: a scalable storage manager for the multicore era. EDBT, pages 24--35, 2009. Google ScholarDigital Library
R. Johnson, I. Pandis, R. Stoica, M. Athanassoulis, and A. Ailamaki. Aether: a scalable approach to logging. Proc. VLDB Endow., 3(1--2): 681--692, 2010. Google ScholarDigital Library
H. Jung, H. Han, A. D. Fekete, G. Heiser, and H. Y. Yeom. A scalable lock manager for multicores. In SIGMOD, pages 73--84, 2013. Google ScholarDigital Library
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow., 1(2): 1496--1499, 2008. Google ScholarDigital Library
H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Trans. Database Syst., 6(2): 213--226, June 1981. Google ScholarDigital Library
P.-A. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. VLDB, 5(4): 298--309, Dec. 2011. Google ScholarDigital Library
J. Miller, H. Kasture, G. Kurian, C. Gruenwald, N. Beckmann, C. Celio, J. Eastep, and A. Agarwal. Graphite: A distributed parallel simulator for multicores. In HPCA, pages 1--12, 2010.Google ScholarCross Ref
D. L. Mills. Internet time synchronization: the network time protocol. Communications, IEEE Transactions on, 39(10): 1482--1493, 1991.Google ScholarCross Ref
I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-oriented transaction execution. Proc. VLDB Endow., 3: 928--939, September 2010. Google ScholarDigital Library
I. Pandis, P. Tözün, R. Johnson, and A. Ailamaki. PLP: Page Latch-free Shared-everything OLTP. Proc. VLDB Endow., 4(10): 610--621, July 2011. Google ScholarDigital Library
A. Pavlo, C. Curino, and S. Zdonik. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In SIGMOD, pages 61--72, 2012. Google ScholarDigital Library
D. Porobic, I. Pandis, M. Branco, P. Tözün, and A. Ailamaki. OLTP on Hardware Islands. Proc. VLDB Endow., 5: 1447--1458, July 2012. Google ScholarDigital Library
K. Ren, A. Thomson, and D. J. Abadi. Lightweight locking for main memory database systems. In VLDB, pages 145--156, 2013. Google ScholarDigital Library
M. Rosenblum, E. Bugnion, S. A. Herrod, E. Witchel, and A. Gupta. The impact of architectural trends on operating system performance. In SOSP, pages 285--298, 1995. Google ScholarDigital Library
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). In VLDB, pages 1150--1160, 2007. Google ScholarDigital Library
S. S. Thakkar and M. Sweiger. Performance of an OLTP application on symmetry multiprocessor system. In ISCA, pages 228--238, 1990. Google ScholarDigital Library
The Transaction Processing Council. TPC-C Benchmark (Revision 5.9.0). http://www.tpc.org/tpcc/spec/tpcc_current.pdf, June 2007.Google Scholar
P. Tözün, B. Gold, and A. Ailamaki. OLTP in wonderland: where do cache misses come from in major OLTP components? In DaMoN, 2013.Google ScholarDigital Library
S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. In SOSP, 2013. Google ScholarDigital Library
A. Whitney, D. Shasha, and S. Apter. High Volume Transaction Processing Without Concurrency Control, Two Phase Commit, SQL or C++. In HPTS'97.Google Scholar
L. Wu, A. Lottarini, T. K. Paine, M. A. Kim, and K. A. Ross. Q100: the architecture and design of a database processing unit. In ASPLOS, 2014. Google ScholarDigital Library

Recommendations

Evaluation of Rodinia Codes on Intel Xeon Phi
ISMS '13: Proceedings of the 2013 4th International Conference on Intelligent Systems, Modelling and Simulation

High performance computing (HPC) is a niche area where various parallel benchmarks are constantly used to explore and evaluate the performance of Heterogeneous computing systems on the horizon. The Rodinia benchmark suite, a collection of parallel ...
Read More
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing

The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers ...
Read More
Vectorizing Unstructured Mesh Computations for Many-core Architectures
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Achieving optimal performance on the latest multi-core and many-core architectures depends more and more on making efficient use of the hardware's vector processing capabilities. While auto-vectorizing compilers do not require the use of vector ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 8, Issue 3
November 2014
144 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 November 2014
Published in pvldb Volume 8, Issue 3
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 113
  Total Citations
  View Citations
- 569
  Total Downloads
- Downloads (Last 12 months)33
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Staring into the abyss: an evaluation of concurrency control with one thousand cores

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Evaluation of Rodinia Codes on Intel Xeon Phi

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

Vectorizing Unstructured Mesh Computations for Many-core Architectures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Staring into the abyss: an evaluation of concurrency control with one thousand cores

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Evaluation of Rodinia Codes on Intel Xeon Phi

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

Vectorizing Unstructured Mesh Computations for Many-core Architectures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media