skip to main content
research-article

Fast concurrent queues for x86 processors

Published:23 February 2013Publication History
Skip Abstract Section

Abstract

Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues.

This paper takes a different approach, showing how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct a nonblocking (lock-free) linearizable concurrent FIFO queue which, despite the F&A being a contended hot spot, outperforms combining-based implementations by 1.5x to 2.5x in all concurrency levels on an x86 server with four multicore processors, in both single-processor and multi-processor executions.

References

  1. Power ISA Version 2.06. http://www.power.org/resources/downloads/PowerISA_V2.06B_V2_PUBLIC.pdf, January 2009.Google ScholarGoogle Scholar
  2. G. E. Blelloch, P. B. Gibbons, and S. H. Vardhan. Combinable memory-block transactions. In SPAA 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. G. E. Blelloch, P. Cheng, and P. B. Gibbons. Scalable room synchronizations. Theory of Computing Systems, 36, 2003.Google ScholarGoogle Scholar
  4. R. Colvin and L. Groves. Formal verification of an array-based nonblocking queue. In ICECCS 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Dice, V. J. Marathe, and N. Shavit. Lock cohorting: a general technique for designing numa locks. In PPoPP 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Evans. Scalable memory allocation using jemalloc. http://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919, 2011.Google ScholarGoogle Scholar
  7. P. Fatourou and N. D. Kallimanis. Revisiting the combining synchronization technique. In PPoPP 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Fatourou and N. D. Kallimanis. A highly-efficient wait-free universal construction. In SPAA 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. E. Freudenthal and A. Gottlieb. Process coordination with fetch-and-increment. In ASPLOS 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Gottlieb, B. D. Lubachevsky, and L. Rudolph. Basic techniques for the efficient coordination of very large numbers of cooperating sequential processors. TOPLAS, 5(2), Apr. 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In SPAA 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Herlihy. Wait-free synchronization. TOPLAS, 13:124--149, January 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. TOPLAS, 12:463--492, July 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Hoffman, O. Shalev, and N. Shavit. The baskets queue. In OPODIS 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Kogan and E. Petrank. Wait-free queues with multiple enqueuers and dequeuers. In PPoPP 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. E. Ladan-Mozes and N. Shavit. An optimistic approach to lock-free FIFO queues. In DISC 2004.Google ScholarGoogle ScholarCross RefCross Ref
  18. M. M. Michael. Hazard pointers: Safe memory reclamation for lockfree objects. IEEE TPDS, 15(6):491--504, June 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. M. Michael and M. L. Scott. Simple, fast, and practical nonblocking and blocking concurrent queue algorithms. In PODC 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Moir, D. Nussbaum, O. Shalev, and N. Shavit. Using elimination to implement scalable and lock-free FIFO queues. In SPAA 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors. Communications of the ACM, 53(7):89--97, July 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. N. Shafiei. Non-blocking array-based algorithms for stacks and queues. In ICDCN 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. Tsigas and Y. Zhang. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems. In SPAA 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fast concurrent queues for x86 processors

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 48, Issue 8
          PPoPP '13
          August 2013
          309 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/2517327
          Issue’s Table of Contents
          • cover image ACM Conferences
            PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
            February 2013
            332 pages
            ISBN:9781450319225
            DOI:10.1145/2442516

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 23 February 2013

          Check for updates

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader