Skip to main content
Top

2019 | OriginalPaper | Chapter

Avoiding Scalability Collapse by Restricting Concurrency

Authors : Dave Dice, Alex Kogan

Published in: Euro-Par 2019: Parallel Processing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Saturated locks often degrade the performance of a multithreaded application, leading to a so-called scalability collapse problem. This problem arises when a growing number of threads circulating through a saturated lock causes the overall application performance to fade or even drop abruptly. This problem is particularly (but not solely) acute on oversubscribed systems (systems with more threads than available hardware cores).
In this paper, we introduce GCR (generic concurrency restriction), a mechanism that aims to avoid the scalability collapse. GCR, designed as a generic, lock-agnostic wrapper, intercepts lock acquisition calls, and decides when threads would be allowed to proceed with the acquisition of the underlying lock. Furthermore, we present GCR-NUMA, a non-uniform memory access (NUMA)-aware extension of GCR, that strives to ensure that threads allowed to acquire the lock are those that run on the same socket.
The extensive evaluation that includes more than two dozen locks, three machines and three benchmarks shows that GCR brings substantial speedup (in many cases, up to three orders of magnitude) in case of contention and growing thread counts, while introducing nearly negligible slowdown when the underlying lock is not contended. GCR-NUMA brings even larger performance gains starting at even lighter lock contention.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
We also discuss other waiting policies and their limitations later in the paper.
 
2
For the clarity of exposition, we assume sequential consistency. Our actual implementation uses memory fences as well as volatile keywords and padding (to avoid false sharing) where necessarily.
 
Literature
1.
go back to reference Afek, Y., Dice, D., Morrison, A.: Cache index-aware memory allocation. In: Proceedings of ACM ISMM, pp. 55–64 (2011) Afek, Y., Dice, D., Morrison, A.: Cache index-aware memory allocation. In: Proceedings of ACM ISMM, pp. 55–64 (2011)
2.
go back to reference Boyd-Wickizer, S., Kaashoek, M., Morris, R., Zeldovich, N.: Non-scalable locks are dangerous. In: Proceedings of the Linux Symposium (2012) Boyd-Wickizer, S., Kaashoek, M., Morris, R., Zeldovich, N.: Non-scalable locks are dangerous. In: Proceedings of the Linux Symposium (2012)
3.
go back to reference Chabbi, M., Fagan, M., Mellor-Crummey, J.: High performance locks for multi-level NUMA systems. In: Proceedings of the ACM PPoPP (2015) Chabbi, M., Fagan, M., Mellor-Crummey, J.: High performance locks for multi-level NUMA systems. In: Proceedings of the ACM PPoPP (2015)
4.
go back to reference Chadha, G., Mahlke, S., Narayanasamy, S.: When less is more (LIMO): controlled parallelism for improved efficiency. In: Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES) (2012) Chadha, G., Mahlke, S., Narayanasamy, S.: When less is more (LIMO): controlled parallelism for improved efficiency. In: Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES) (2012)
5.
go back to reference Craig, T.: Building FIFO and priority-queueing spin locks from atomic swap. Technical report TR 93–02-02, University of Washington, Department of Computer Science (1993) Craig, T.: Building FIFO and priority-queueing spin locks from atomic swap. Technical report TR 93–02-02, University of Washington, Department of Computer Science (1993)
6.
go back to reference David, T., Guerraoui, R., Trigonakis, V.: Everything you always wanted to know about synchronization but were afraid to ask. In: Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 33–48 (2013) David, T., Guerraoui, R., Trigonakis, V.: Everything you always wanted to know about synchronization but were afraid to ask. In: Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 33–48 (2013)
7.
go back to reference Dice, D.: Malthusian locks. In: Proceedings of ACM EuroSys, pp. 314–327 (2017) Dice, D.: Malthusian locks. In: Proceedings of ACM EuroSys, pp. 314–327 (2017)
9.
go back to reference Dice, D., Kogan, A.: Compact NUMA-aware locks. In: Proceedings of ACM EuroSys (2019) Dice, D., Kogan, A.: Compact NUMA-aware locks. In: Proceedings of ACM EuroSys (2019)
10.
go back to reference Dice, D., Marathe, V.J., Shavit, N.: Lock cohorting: a general technique for designing NUMA locks. ACM TOPC 1(2), 13 (2015) Dice, D., Marathe, V.J., Shavit, N.: Lock cohorting: a general technique for designing NUMA locks. ACM TOPC 1(2), 13 (2015)
11.
go back to reference Eyerman, S., Eeckhout, L.: Modeling critical sections in Amdahl’s law and its implications for multicore design. In: Proceedings of ACM ISCA, pp. 362–370 (2010)CrossRef Eyerman, S., Eeckhout, L.: Modeling critical sections in Amdahl’s law and its implications for multicore design. In: Proceedings of ACM ISCA, pp. 362–370 (2010)CrossRef
13.
go back to reference Guiroux, H., Lachaize, R., Quéma, V.: Multicore locks: the case is not closed yet. In: Proceedings of USENIX ATC, pp. 649–662 (2016) Guiroux, H., Lachaize, R., Quéma, V.: Multicore locks: the case is not closed yet. In: Proceedings of USENIX ATC, pp. 649–662 (2016)
14.
go back to reference He, B., Scherer, W.N., Scott, M.L.: Preemption adaptivity in time-published queue-based spin locks. In: Proceedings of High Performance Computing (HiPC), pp. 7–18 (2005) He, B., Scherer, W.N., Scott, M.L.: Preemption adaptivity in time-published queue-based spin locks. In: Proceedings of High Performance Computing (HiPC), pp. 7–18 (2005)
15.
go back to reference Heirman, W., Carlson, T., Van Craeynest, K., Hur, I., Jaleel, A., Eeckhout, L.: Undersubscribed threading on clustered cache architectures. In: Proceedings of IEEE HPCA (2014) Heirman, W., Carlson, T., Van Craeynest, K., Hur, I., Jaleel, A., Eeckhout, L.: Undersubscribed threading on clustered cache architectures. In: Proceedings of IEEE HPCA (2014)
16.
go back to reference Johnson, R., Athanassoulis, M., Stoica, R., Ailamaki, A.: A new look at the roles of spinning and blocking. In: Proceedings of the International Workshop on Data Management on New Hardware (DaMoN). ACM (2009) Johnson, R., Athanassoulis, M., Stoica, R., Ailamaki, A.: A new look at the roles of spinning and blocking. In: Proceedings of the International Workshop on Data Management on New Hardware (DaMoN). ACM (2009)
17.
go back to reference Johnson, R., Stoica, R., Ailamaki, A., Mowry, T.C.: Decoupling contention management from scheduling. In: Proceedings of ACM ASPLOS, pp. 117–128 (2010)CrossRef Johnson, R., Stoica, R., Ailamaki, A., Mowry, T.C.: Decoupling contention management from scheduling. In: Proceedings of ACM ASPLOS, pp. 117–128 (2010)CrossRef
19.
go back to reference Lim, B.H., Agarwal, A.: Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Comput. Syst. 11, 253–294 (1993)CrossRef Lim, B.H., Agarwal, A.: Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Comput. Syst. 11, 253–294 (1993)CrossRef
21.
go back to reference Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comp. Syst. 9(1), 21–65 (1991)CrossRef Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comp. Syst. 9(1), 21–65 (1991)CrossRef
22.
go back to reference Pusukuri, K.K., Gupta, R., Bhuyan, L.N.: Thread reinforcer: dynamically determining number of threads via OS level monitoring. In: Proceedings of IEEE IISWC (2011) Pusukuri, K.K., Gupta, R., Bhuyan, L.N.: Thread reinforcer: dynamically determining number of threads via OS level monitoring. In: Proceedings of IEEE IISWC (2011)
23.
go back to reference Radovic, Z., Hagersten, E.: Hierarchical backoff locks for nonuniform communication architectures. In: Proceedings of EEE HPCA, pp. 241–252 (2003) Radovic, Z., Hagersten, E.: Hierarchical backoff locks for nonuniform communication architectures. In: Proceedings of EEE HPCA, pp. 241–252 (2003)
24.
go back to reference Raman, A., Kim, H., Oh, T., Lee, J.W., August, D.I.: Parallelism orchestration using DoPE: the degree of parallelism executive. In: Proceedings of ACM PLDI (2011) Raman, A., Kim, H., Oh, T., Lee, J.W., August, D.I.: Parallelism orchestration using DoPE: the degree of parallelism executive. In: Proceedings of ACM PLDI (2011)
25.
go back to reference Yoo, R.M., Lee, H.H.S.: Adaptive transaction scheduling for transactional memory systems. In: Proceedings of ACM SPAA (2008) Yoo, R.M., Lee, H.H.S.: Adaptive transaction scheduling for transactional memory systems. In: Proceedings of ACM SPAA (2008)
Metadata
Title
Avoiding Scalability Collapse by Restricting Concurrency
Authors
Dave Dice
Alex Kogan
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-29400-7_26

Premium Partner