Skip to main content
Erschienen in: Theory of Computing Systems 3/2014

01.10.2014

Highly-Efficient Wait-Free Synchronization

verfasst von: Panagiota Fatourou, Nikolaos D. Kallimanis

Erschienen in: Theory of Computing Systems | Ausgabe 3/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We study a simple technique, originally presented by Herlihy (ACM Trans. Program. Lang. Syst. 15(5):745–770, 1993), for executing concurrently, in a wait-free manner, blocks of code that have been programmed for sequential execution and require significant synchronization in order to be performed in parallel. We first present an implementation of this technique, called Sim, which employs a collect object. We describe a simple implementation of a collect object from a single shared object that supports atomic Add (or XOR) in addition to read; this implementation has step complexity O(1). By plugging in to Sim this implementation, Sim exhibits constant step complexity as well. This allows us to derive lower bounds on the step complexity of implementations of several shared objects, like Add, XOR, collect, and snapshot objects, from LL/SC objects.
We then present a practical version of Sim, called PSim, which is implemented in a real shared-memory machine. From a theoretical perspective, PSim has worse step complexity than Sim, its theoretical analog; in practice though, we experimentally show that PSim is highly-efficient: it outperforms several state-of-the-art lock-based and lock-free synchronization techniques, and this given that it is wait-free, i.e. that it satisfies a stronger progress condition than all the algorithms that it outperforms. We have used PSim to get highly-efficient wait-free implementations of stacks and queues.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
We sometimes abuse notation and use op to denote an operation or an instance of it, depending on the context.
 
2
For simplicity we sometimes say that a request req by a thread p i executes Attempt (or any other line of the pseudocode) meaning that the instance of SimApplyOp that is called by p i for req executes Attempt (or any code line in reference). Moreover, when we refer to the execution interval of some request req, we mean the execution interval of the instance of SimApplyOp that is invoked with parameter req.
 
3
For clarity of the proof, we consider each \(\mathit{req}_{j}^{i}\) as distinct.
 
4
For simplicity, we assume that d is a divisor of b, so that the d bits allocated to each thread are not split across two Add objects.
 
5
We remark that in the real code we use a pool of nC+1 structures, where C>1 is a small constant, for performance reasons. However, using a pool of just 2n+1 structures is enough to prove correctness. For code simplicity, Algorithm 2 uses 2n+2 such structures.
 
6
In the real code, Pool is implemented as a one-dimensional array, and S is an index indicating one of its elements.
 
7
This problem occurs when a thread p reads some value A from a shared variable and then some other thread p′ modifies the variable to the value B and back to A; when p begins execution again, it sees that the value of the variable has not changed and continues executing normally which might be incorrect.
 
8
We experimentally saw that MCS spin locks [27] have similar performance to CLH spin locks, so we present our results only for CLH locks.
 
9
The number of combining rounds determines how many times the combiner (in flat-combining) traverses the request list before it gives up serving other requests.
 
Literatur
1.
Zurück zum Zitat Afek, Y., Dauber, D., Touitou, D.: Wait-free made fast. In: Proceedings of the 27th ACM Symposium on Theory of Computing, pp. 538–547 (1995) Afek, Y., Dauber, D., Touitou, D.: Wait-free made fast. In: Proceedings of the 27th ACM Symposium on Theory of Computing, pp. 538–547 (1995)
2.
Zurück zum Zitat Afek, Y., Stupp, G., Touitou, D.: Long-lived adaptive collect with applications. In: Proceedings of the 40th Symposium on Foundations of Computer Science, pp. 262–272 (1999) Afek, Y., Stupp, G., Touitou, D.: Long-lived adaptive collect with applications. In: Proceedings of the 40th Symposium on Foundations of Computer Science, pp. 262–272 (1999)
3.
Zurück zum Zitat Amdahl, G.: Validity of the single processor approach to achieving large-scale computing capabilities. In: Proceedings of National Computer Conference of the American Federation of Information Processing Societies, pp. 483–485 (1967) Amdahl, G.: Validity of the single processor approach to achieving large-scale computing capabilities. In: Proceedings of National Computer Conference of the American Federation of Information Processing Societies, pp. 483–485 (1967)
4.
Zurück zum Zitat Anderson, J.H., Moir, M.: Universal constructions for multi-object operations. In: Proceedings of the 14th ACM Symposium on Principles of Distributed Computing, pp. 184–193 (1995) Anderson, J.H., Moir, M.: Universal constructions for multi-object operations. In: Proceedings of the 14th ACM Symposium on Principles of Distributed Computing, pp. 184–193 (1995)
5.
Zurück zum Zitat Anderson, J.H., Moir, M.: Universal constructions for large objects. IEEE Trans. Parallel Distrib. Syst. 10(12), 1317–1332 (1999) CrossRef Anderson, J.H., Moir, M.: Universal constructions for large objects. IEEE Trans. Parallel Distrib. Syst. 10(12), 1317–1332 (1999) CrossRef
6.
Zurück zum Zitat Attiya, H., Guerraoui, R., Ruppert, E.: Partial snapshot objects. In: Proceedings of the 20th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 336–343 (2008) Attiya, H., Guerraoui, R., Ruppert, E.: Partial snapshot objects. In: Proceedings of the 20th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 336–343 (2008)
7.
Zurück zum Zitat Barnes, G.: A method for implementing lock-free shared data structures. In: Proceedings of the 5th ACM Symposium on Parallel Algorithms and Architectures, pp. 261–270 (1993) Barnes, G.: A method for implementing lock-free shared data structures. In: Proceedings of the 5th ACM Symposium on Parallel Algorithms and Architectures, pp. 261–270 (1993)
8.
Zurück zum Zitat Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 117–128 (2000) Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 117–128 (2000)
9.
Zurück zum Zitat Chuong, P., Ellen, F., Ramachandran, V.: A universal construction for wait-free transaction friendly data structures. In: Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 335–344 (2010) Chuong, P., Ellen, F., Ramachandran, V.: A universal construction for wait-free transaction friendly data structures. In: Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 335–344 (2010)
10.
Zurück zum Zitat Conway, P., Kalyanasundharam, N., Donley, G., Lepak, K., Hughes, B.: Blade Computing with the AMD Opteron Processor (Magny-Cours). Hot chips 21 (2009) Conway, P., Kalyanasundharam, N., Donley, G., Lepak, K., Hughes, B.: Blade Computing with the AMD Opteron Processor (Magny-Cours). Hot chips 21 (2009)
11.
Zurück zum Zitat Craig, T.S.: Building FIFO and priority-queueing spin locks from atomic swap. Technical Report TR 93-02-02, Department of Computer Science, University of Washington (1993) Craig, T.S.: Building FIFO and priority-queueing spin locks from atomic swap. Technical Report TR 93-02-02, Department of Computer Science, University of Washington (1993)
12.
Zurück zum Zitat Fatourou, P., Kallimanis, N.D.: The RedBlue adaptive universal constructions. In: Proceedings of the 23rd International Symposium on Distributed Computing, pp. 127–141 (2009) Fatourou, P., Kallimanis, N.D.: The RedBlue adaptive universal constructions. In: Proceedings of the 23rd International Symposium on Distributed Computing, pp. 127–141 (2009)
13.
Zurück zum Zitat Fatourou, P., Kallimanis, N.D.: Fast Implementations of Shared Objects using Fetch&Add. Tech. Rep. TR 02-2010, Department of Computer Science, University of Ioannina (2010) Fatourou, P., Kallimanis, N.D.: Fast Implementations of Shared Objects using Fetch&Add. Tech. Rep. TR 02-2010, Department of Computer Science, University of Ioannina (2010)
14.
Zurück zum Zitat Fatourou, P., Kallimanis, N.D.: A Highly-Efficient Wait-Free Universal Construction. Tech. Rep. TR 01-2011, Department of Computer Science, University of Ioannina (2011) Fatourou, P., Kallimanis, N.D.: A Highly-Efficient Wait-Free Universal Construction. Tech. Rep. TR 01-2011, Department of Computer Science, University of Ioannina (2011)
15.
Zurück zum Zitat Fatourou, P., Kallimanis, N.D.: Revisiting the combining synchronization technique. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 257–266. ACM, New York (2012) Fatourou, P., Kallimanis, N.D.: Revisiting the combining synchronization technique. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 257–266. ACM, New York (2012)
17.
Zurück zum Zitat Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Flat combining and the synchronization-parallelism tradeoff. In: Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 355–364 (2010) Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Flat combining and the synchronization-parallelism tradeoff. In: Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 355–364 (2010)
18.
Zurück zum Zitat Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. In: Proceedings of the 16th ACM Symposium on Parallel Algorithms and Architectures, pp. 206–215 (2004) Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. In: Proceedings of the 16th ACM Symposium on Parallel Algorithms and Architectures, pp. 206–215 (2004)
19.
Zurück zum Zitat Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13, 124–149 (1991) CrossRef Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13, 124–149 (1991) CrossRef
20.
Zurück zum Zitat Herlihy, M.: A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst. 15(5), 745–770 (1993) CrossRef Herlihy, M.: A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst. 15(5), 745–770 (1993) CrossRef
21.
Zurück zum Zitat Herlihy, M., Luchangco, V., Moir, M.: Space- and time-adaptive nonblocking algorithms. Electron. Notes Theor. Comput. Sci. 78(0), 260–280 (2003) CrossRef Herlihy, M., Luchangco, V., Moir, M.: Space- and time-adaptive nonblocking algorithms. Electron. Notes Theor. Comput. Sci. 78(0), 260–280 (2003) CrossRef
22.
Zurück zum Zitat Herlihy, M., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990) CrossRef Herlihy, M., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990) CrossRef
23.
Zurück zum Zitat Imbs, D., Raynal, M.: Help when needed, but no more: efficient Read/Write partial snapshot. In: Proceedings of the 23rd International Symposium on Distributed Computing, pp. 142–156 (2009) Imbs, D., Raynal, M.: Help when needed, but no more: efficient Read/Write partial snapshot. In: Proceedings of the 23rd International Symposium on Distributed Computing, pp. 142–156 (2009)
24.
Zurück zum Zitat Jayanti, P.: A time complexity lower bound for randomized implementations of some shared objects. In: Proceedings of the 17th ACM Symposium on Principles of Distributed Computing, pp. 201–210 (1998) Jayanti, P.: A time complexity lower bound for randomized implementations of some shared objects. In: Proceedings of the 17th ACM Symposium on Principles of Distributed Computing, pp. 201–210 (1998)
26.
Zurück zum Zitat Magnusson, P.S., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of the 8th International Parallel Processing Symposium, pp. 165–171 (1994) CrossRef Magnusson, P.S., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of the 8th International Parallel Processing Symposium, pp. 165–171 (1994) CrossRef
27.
Zurück zum Zitat Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9(1), 21–65 (1991) CrossRef Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9(1), 21–65 (1991) CrossRef
28.
Zurück zum Zitat Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, pp. 267–275 (1996) Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, pp. 267–275 (1996)
29.
Zurück zum Zitat Oyama, Y., Taura, K., Yonezawa, A.: Executing parallel programs with synchronization bottlenecks efficiently. In: Proceedings of International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, pp. 182–204 (1999) Oyama, Y., Taura, K., Yonezawa, A.: Executing parallel programs with synchronization bottlenecks efficiently. In: Proceedings of International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, pp. 182–204 (1999)
30.
Zurück zum Zitat Shalev, O., Shavit, N.: Predictive log-synchronization. In: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, EuroSys ’06, pp. 305–315 (2006) CrossRef Shalev, O., Shavit, N.: Predictive log-synchronization. In: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, EuroSys ’06, pp. 305–315 (2006) CrossRef
31.
Zurück zum Zitat Shavit, N., Zemach, A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distrib. Comput. 60(11), 1355–1387 (2000) CrossRefMATH Shavit, N., Zemach, A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distrib. Comput. 60(11), 1355–1387 (2000) CrossRefMATH
32.
Zurück zum Zitat Taubenfeld, G.: Synchronization Algorithms and Concurrent Programming. Prentice Hall, New York (2006) Taubenfeld, G.: Synchronization Algorithms and Concurrent Programming. Prentice Hall, New York (2006)
33.
Zurück zum Zitat Treiber, R.K.: Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, (1986) Treiber, R.K.: Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, (1986)
34.
Zurück zum Zitat Yew, P.C., Tzeng, N.F., Lawrie, D.H.: Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. 100(4), 388–395 (1987) Yew, P.C., Tzeng, N.F., Lawrie, D.H.: Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. 100(4), 388–395 (1987)
Metadaten
Titel
Highly-Efficient Wait-Free Synchronization
verfasst von
Panagiota Fatourou
Nikolaos D. Kallimanis
Publikationsdatum
01.10.2014
Verlag
Springer US
Erschienen in
Theory of Computing Systems / Ausgabe 3/2014
Print ISSN: 1432-4350
Elektronische ISSN: 1433-0490
DOI
https://doi.org/10.1007/s00224-013-9491-y

Weitere Artikel der Ausgabe 3/2014

Theory of Computing Systems 3/2014 Zur Ausgabe