research-article

Fast concurrent queues for x86 processors

Authors:
Adam Morrison

Tel Aviv University, Tel Aviv, Israel

Tel Aviv University, Tel Aviv, Israel
View Profile

,
Yehuda Afek

Tel Aviv University, Tel Aviv, Israel

Tel Aviv University, Tel Aviv, Israel
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 48 Issue 8August 2013pp 103–112https://doi.org/10.1145/2517327.2442527

Published:23 February 2013Publication History

ACM SIGPLAN Notices

Abstract

Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues.

This paper takes a different approach, showing how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct a nonblocking (lock-free) linearizable concurrent FIFO queue which, despite the F&A being a contended hot spot, outperforms combining-based implementations by 1.5x to 2.5x in all concurrency levels on an x86 server with four multicore processors, in both single-processor and multi-processor executions.

References

Power ISA Version 2.06. http://www.power.org/resources/downloads/PowerISA_V2.06B_V2_PUBLIC.pdf, January 2009.Google Scholar
G. E. Blelloch, P. B. Gibbons, and S. H. Vardhan. Combinable memory-block transactions. In SPAA 2008. Google ScholarDigital Library
G. E. Blelloch, P. Cheng, and P. B. Gibbons. Scalable room synchronizations. Theory of Computing Systems, 36, 2003.Google Scholar
R. Colvin and L. Groves. Formal verification of an array-based nonblocking queue. In ICECCS 2005. Google ScholarDigital Library
D. Dice, V. J. Marathe, and N. Shavit. Lock cohorting: a general technique for designing numa locks. In PPoPP 2012. Google ScholarDigital Library
J. Evans. Scalable memory allocation using jemalloc. http://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919, 2011.Google Scholar
P. Fatourou and N. D. Kallimanis. Revisiting the combining synchronization technique. In PPoPP 2012. Google ScholarDigital Library
P. Fatourou and N. D. Kallimanis. A highly-efficient wait-free universal construction. In SPAA 2011. Google ScholarDigital Library
E. Freudenthal and A. Gottlieb. Process coordination with fetch-and-increment. In ASPLOS 1991. Google ScholarDigital Library
A. Gottlieb, B. D. Lubachevsky, and L. Rudolph. Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors. TOPLAS, 5(2), Apr. 1983. Google ScholarDigital Library
D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In SPAA 2010. Google ScholarDigital Library
M. Herlihy. Wait-free synchronization. TOPLAS, 13:124--149, January 1991. Google ScholarDigital Library
M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008. Google ScholarDigital Library
M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. TOPLAS, 12:463--492, July 1990. Google ScholarDigital Library
M. Hoffman, O. Shalev, and N. Shavit. The baskets queue. In OPODIS 2007. Google ScholarDigital Library
A. Kogan and E. Petrank. Wait-free queues with multiple enqueuers and dequeuers. In PPoPP 2011. Google ScholarDigital Library
E. Ladan-Mozes and N. Shavit. An optimistic approach to lock-free FIFO queues. In DISC 2004.Google ScholarCross Ref
M. M. Michael. Hazard pointers: Safe memory reclamation for lockfree objects. IEEE TPDS, 15(6):491--504, June 2004. Google ScholarDigital Library
M. M. Michael and M. L. Scott. Simple, fast, and practical nonblocking and blocking concurrent queue algorithms. In PODC 1996. Google ScholarDigital Library
M. Moir, D. Nussbaum, O. Shalev, and N. Shavit. Using elimination to implement scalable and lock-free FIFO queues. In SPAA 2005. Google ScholarDigital Library
P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors. Communications of the ACM, 53(7):89--97, July 2010. Google ScholarDigital Library
N. Shafiei. Non-blocking array-based algorithms for stacks and queues. In ICDCN 2009. Google ScholarDigital Library
P. Tsigas and Y. Zhang. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems. In SPAA 2001. Google ScholarDigital Library

Index Terms

Fast concurrent queues for x86 processors

Recommendations

Fast concurrent queues for x86 processors
PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to ...
Read More
Practical, Fast and Simple Concurrent FIFO Queues Using Single Word Synchronization Primitives
Ada-Europe '08: Proceedings of the 13th Ada-Europe international conference on Reliable Software Technologies

We present an efficient and practical non-blocking implementation of a concurrent array-based FIFO queue that is suitable for preemptive multi threaded systems. It is well known that concurrent FIFO queues relying on mutual exclusion cause blocking, ...
Read More
On the Importance of Synchronization Primitives with Low Consensus Numbers
ICDCN '18: Proceedings of the 19th International Conference on Distributed Computing and Networking

The consensus number of a synchronization primitive is the maximum number of processes for which the primitive can solve consensus. This has been the traditional measure of power of a synchronization primitive. Thus, the compare-and-swap primitive, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 48, Issue 8
PPoPP '13
August 2013
309 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2517327
Issue’s Table of Contents
PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2013
332 pages
ISBN:9781450319225
DOI:10.1145/2442516
General Chairs:
Alex Nicolau
University of California, Irvine, USA
,
Xiaowei Shen
IBM Research, China
,
Program Chairs:
Saman Amarasinghe
Massachusetts Institute of Technology, USA
,
Richard Vuduc
Georgia Institute of Technology, USA
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 February 2013
Check for updates
Author Tags
concurrent queue
fetch-and-add
nonblocking algorithm
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 113
  Total Citations
  View Citations
- 1,013
  Total Downloads
- Downloads (Last 12 months)152
- Downloads (Last 6 weeks)22
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fast concurrent queues for x86 processors

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

Fast concurrent queues for x86 processors

Practical, Fast and Simple Concurrent FIFO Queues Using Single Word Synchronization Primitives

On the Importance of Synchronization Primitives with Low Consensus Numbers