Top

Published in:

2019 | OriginalPaper | Chapter

Avoiding Scalability Collapse by Restricting Concurrency

Authors : Dave Dice, Alex Kogan

Published in: Euro-Par 2019: Parallel Processing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Saturated locks often degrade the performance of a multithreaded application, leading to a so-called scalability collapse problem. This problem arises when a growing number of threads circulating through a saturated lock causes the overall application performance to fade or even drop abruptly. This problem is particularly (but not solely) acute on oversubscribed systems (systems with more threads than available hardware cores).

In this paper, we introduce GCR (generic concurrency restriction), a mechanism that aims to avoid the scalability collapse. GCR, designed as a generic, lock-agnostic wrapper, intercepts lock acquisition calls, and decides when threads would be allowed to proceed with the acquisition of the underlying lock. Furthermore, we present GCR-NUMA, a non-uniform memory access (NUMA)-aware extension of GCR, that strives to ensure that threads allowed to acquire the lock are those that run on the same socket.

The extensive evaluation that includes more than two dozen locks, three machines and three benchmarks shows that GCR brings substantial speedup (in many cases, up to three orders of magnitude) in case of contention and growing thread counts, while introducing nearly negligible slowdown when the underlying lock is not contended. GCR-NUMA brings even larger performance gains starting at even lighter lock contention.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Enabling Resilience in Asynchronous Many-Task Programming Models

next chapter Graph Coloring Using GPUs

We also discuss other waiting policies and their limitations later in the paper.

For the clarity of exposition, we assume sequential consistency. Our actual implementation uses memory fences as well as volatile keywords and padding (to avoid false sharing) where necessarily.

Afek, Y., Dice, D., Morrison, A.: Cache index-aware memory allocation. In: Proceedings of ACM ISMM, pp. 55–64 (2011)

Boyd-Wickizer, S., Kaashoek, M., Morris, R., Zeldovich, N.: Non-scalable locks are dangerous. In: Proceedings of the Linux Symposium (2012)

Chabbi, M., Fagan, M., Mellor-Crummey, J.: High performance locks for multi-level NUMA systems. In: Proceedings of the ACM PPoPP (2015)

Chadha, G., Mahlke, S., Narayanasamy, S.: When less is more (LIMO): controlled parallelism for improved efficiency. In: Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES) (2012)

Craig, T.: Building FIFO and priority-queueing spin locks from atomic swap. Technical report TR 93–02-02, University of Washington, Department of Computer Science (1993)

David, T., Guerraoui, R., Trigonakis, V.: Everything you always wanted to know about synchronization but were afraid to ask. In: Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 33–48 (2013)

Dice, D.: Malthusian locks. In: Proceedings of ACM EuroSys, pp. 314–327 (2017)

Dice, D., Kogan, A.: Avoiding scalability collapse by restricting concurrency. CoRR (2019). https://arxiv.org/abs/1905.10818

Dice, D., Kogan, A.: Compact NUMA-aware locks. In: Proceedings of ACM EuroSys (2019)

10.

Dice, D., Marathe, V.J., Shavit, N.: Lock cohorting: a general technique for designing NUMA locks. ACM TOPC 1(2), 13 (2015)

11.

Eyerman, S., Eeckhout, L.: Modeling critical sections in Amdahl’s law and its implications for multicore design. In: Proceedings of ACM ISCA, pp. 362–370 (2010)CrossRef

12.

Kyoto Cabinet. http://fallabs.com/kyotocabinet

13.

Guiroux, H., Lachaize, R., Quéma, V.: Multicore locks: the case is not closed yet. In: Proceedings of USENIX ATC, pp. 649–662 (2016)

14.

He, B., Scherer, W.N., Scott, M.L.: Preemption adaptivity in time-published queue-based spin locks. In: Proceedings of High Performance Computing (HiPC), pp. 7–18 (2005)

15.

Heirman, W., Carlson, T., Van Craeynest, K., Hur, I., Jaleel, A., Eeckhout, L.: Undersubscribed threading on clustered cache architectures. In: Proceedings of IEEE HPCA (2014)

16.

Johnson, R., Athanassoulis, M., Stoica, R., Ailamaki, A.: A new look at the roles of spinning and blocking. In: Proceedings of the International Workshop on Data Management on New Hardware (DaMoN). ACM (2009)

17.

Johnson, R., Stoica, R., Ailamaki, A., Mowry, T.C.: Decoupling contention management from scheduling. In: Proceedings of ACM ASPLOS, pp. 117–128 (2010)CrossRef

18.

Leveldb. https://github.com/google/leveldb

19.

Lim, B.H., Agarwal, A.: Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Comput. Syst. 11, 253–294 (1993)CrossRef

20.

LiTL: Library for Transparent Lock interposition. https://github.com/multicore-locks/litl

21.

Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comp. Syst. 9(1), 21–65 (1991)CrossRef

22.

Pusukuri, K.K., Gupta, R., Bhuyan, L.N.: Thread reinforcer: dynamically determining number of threads via OS level monitoring. In: Proceedings of IEEE IISWC (2011)

23.

Radovic, Z., Hagersten, E.: Hierarchical backoff locks for nonuniform communication architectures. In: Proceedings of EEE HPCA, pp. 241–252 (2003)

24.

Raman, A., Kim, H., Oh, T., Lee, J.W., August, D.I.: Parallelism orchestration using DoPE: the degree of parallelism executive. In: Proceedings of ACM PLDI (2011)

25.

Yoo, R.M., Lee, H.H.S.: Adaptive transaction scheduling for transactional memory systems. In: Proceedings of ACM SPAA (2008)

Title: Avoiding Scalability Collapse by Restricting Concurrency
Authors: Dave Dice
Alex Kogan
Publisher: Springer International Publishing
Book: Euro-Par 2019: Parallel Processing
Print ISBN: 978-3-030-29399-4

Electronic ISBN: 978-3-030-29400-7

Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-29400-7_26

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner