Skip to main content
Log in

Bounded-wait combining: constructing robust and high-throughput shared objects

  • Published:
Distributed Computing Aims and scope Submit manuscript

Abstract

Shared counters are among the most basic coordination structures in distributed computing. Known implementations of shared counters are either blocking, non-linearizable, or have a sequential bottleneck. We present the first counter algorithm that is both linearizable, non-blocking, and can provably achieve high throughput in k-synchronous executions—executions in which process speeds vary by at most a constant factor k. The algorithm is based on a novel variation of the software combining paradigm that we call bounded-wait combining (BWC). It can thus be used to obtain implementations, possessing the same properties, of any object that supports combinable operations, such as a stack or a queue. Unlike previous combining algorithms where processes may have to wait for each other indefinitely, in the BWC algorithm, a process only waits for other processes for a bounded period of time and then “takes destiny in its own hands”. In order to reason rigorously about the parallelism attainable by our algorithm, we define a novel metric for measuring the throughput of shared objects, which we believe is interesting in its own right. We use this metric to prove that our algorithm achieves throughput of Ω(N/ log N) in k-synchronous executions, where N is the number of processes that can participate in the algorithm. Our algorithm uses two tools that we believe may prove useful for obtaining highly parallel non-blocking implementation of additional objects. The first are “synchronous locks”, locks that are respected by processes only in k-synchronous executions and are disregarded otherwise; the second are “pseduo-transactions”—a weakening of regular transactions that allows higher parallelism.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Aspnes J., Herlihy M., Shavit N.: Counting networks. J. ACM 41(5), 1020–1048 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  2. Attiya H., Dagan A.: Improved implementations of binary universal operations. J. ACM 48(5), 1013–1037 (2001)

    Article  MathSciNet  Google Scholar 

  3. Attiya, H., Guerraoui, R., Kouznetsov, P.: Computing with reads and writes in the absence of step contention. In: Proceedings of the 19th International Symposium on Distributed Computing (DISC’05) (2005)

  4. Dwork C., Herlihy M., Waarts O.: Contention in shared memory algorithms. J. ACM 44(6), 779–805 (1997)

    Article  MATH  MathSciNet  Google Scholar 

  5. Ellen, F., Lev, Y., Luchangco, V., Moir, M.: Snzi: scalable nonzero indicators. In: Proceedings of the 27th Annual ACM Symposium on Principles of Distributed Computing (PODC) (2007)

  6. Fich, F.E., Hendler, D., Shavit, N.: Linear lower bounds on real-world implementations of concurrent objects. In: Proceedings of the 46th Annual Symposium on Foundations of Computer Science (FOCS) (2005)

  7. Goodman, J.R., Vernon, M.K., Woest, P.J.: Efficent synchronization primitives for large-scale cache-coherent multiprocessors. In: ASPLOS, pp. 64–75 (1989)

  8. Gottlieb, A., Grishman, R., Kruskal, C.P., McAuliffe, K.P., Rudolph, L., Snir, M.: The nyu ultracomputer designing a mimd, shared-memory parallel machine. In: ISCA ’98: 25 years of the international symposia on Computer architecture (selected papers), pp. 239–254, New York, NY, USA. ACM Press, New York (1998)

  9. Greenwald, M.: Two-handed emulation: how to build non-blocking implementations of complex data-structures using dcas. In: PODC ’02: Proceedings of the twenty-first annual symposium on Principles of distributed computing, pp. 260–269, New York, NY, USA. ACM Press, New York (2002)

  10. Ha, P.H., Papatriantafilou, M., Tsigas, P.: Self-tuning reactive distributed trees for counting and balancing. In: OPODIS, pp. 213–228 (2004)

  11. Hendler, D., Kutten, S.: Bounded-wait combining: Constructing robust and high-throughput shared objects. In: Proceedings of the 20th International Symposium on Distributed Computing (DISC’06), pp. xx–xx (2006)

  12. Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. In: SPAA ’04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, pp. 206–215, New York, NY, USA. ACM Press, New York (2004)

  13. Herlihy M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)

    Article  Google Scholar 

  14. Herlihy M.: A methodology for implementing highly concurrent objects. ACM Trans. Program. Lang. Syst. 15(5), 745–770 (1993)

    Article  Google Scholar 

  15. Herlihy M., Shavit N., Waarts O.: Linearizable counting networks. Distributed Comput. 9(4), 193–203 (1996)

    Article  Google Scholar 

  16. Herlihy M., Wing J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)

    Article  Google Scholar 

  17. Kruskal C.P., Rudolph L., Snir M.: Efficient synchronization on multiprocessors with shared memory. ACM Trans. Program. Lang. Syst. 10(4), 579–601 (1988)

    Article  MATH  Google Scholar 

  18. Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free fifo queues. In: SPAA’05: Proceedings of the 17th annual ACM symposium on Parallelism in algorithms and architectures, pp. 253–262, NY, USA. ACM Press, New York (2005)

  19. Shavit N., Zemach A.: Diffracting trees. ACM Trans. Comput. Syst. 14(4), 385–428 (1996)

    Article  Google Scholar 

  20. Shavit N., Zemach A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distributed Comput. 60(11), 1355–1387 (2000)

    Article  MATH  Google Scholar 

  21. Wattenhofer R., Widmayer P.: The counting pyramid: an adaptive distributed counting scheme. J. Parallel Distributed Comput. 64(4), 449–460 (2004)

    Article  MATH  Google Scholar 

  22. Yew P.-C., Tzeng N.-F., Lawrie D.H.: Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. 36(4), 388–395 (1987)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Danny Hendler.

Additional information

A preliminary version of this paper appeared in [11].

D. Hendler is supported in part by a grant from the Israel Science Foundation. S. Kutten is supported in part by a grant from the Israel Science Foundation.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hendler, D., Kutten, S. Bounded-wait combining: constructing robust and high-throughput shared objects. Distrib. Comput. 21, 405–431 (2009). https://doi.org/10.1007/s00446-009-0078-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00446-009-0078-4

Keywords

Navigation