Abstract
Efficient, scalable memory allocation for multithreaded applications on multiprocessors is a significant goal of recent research. In the distributed computing literature it has been emphasized that lock-based synchronization and concurrency-control may limit the parallelism in multiprocessor systems. Thus, system services that employ such methods can hinder reaching the full potential of these systems. A natural research question is the pertinence and the impact of lock-free concurrency control in key services for multiprocessors, such as in the memory allocation service, which is the theme of this work. We show the design and implementation of NBmalloc, a lock-free memory allocator designed to enhance the parallelism in the system. The architecture of NBmalloc is inspired by Hoard, a well-known concurrent memory allocator, with modular design that preserves scalability and helps avoiding false-sharing and heap-blowup. Within our effort to design appropriate lock-free algorithms for NBmalloc, we propose and show a lock-free implementation of a new data structure, flat-set, supporting conventional “internal” set operations as well as “inter-object” operations, for moving items between flat-sets. The design of NBmalloc also involved a series of other algorithmic problems, which are discussed in the paper. Further, we present the implementation of NBmalloc and a study of its behaviour in a set of multiprocessor systems. The results show that the good properties of Hoard w.r.t. false-sharing and heap-blowup are preserved.
Article PDF
Similar content being viewed by others
References
Gidenstam, A., Papatriantafilou, M., Tsigas, P.: Allocating memory in a lock-free manner. In: Proc. of the 13th Annual European Symp. on Algorithms (ESA’05). LNCS, vol. 3669, pp. 329–342. Springer, Berlin (2005)
Berger, E.D.: Memory management for high-performance applications. Ph.D. Thesis, The University of Texas at Austin, Department of Computer Sciences (2002)
Berger, E., McKinley, K., Blumofe, R., Wilson, P.: Hoard: A scalable memory allocator for multithreaded applications. In: ASPLOS-IX: 9th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 117–128 (2000)
Michael, M.: Scalable lock-free dynamic memory allocation. In: Proc. of SIGPLAN 2004 Conf. on Programming Languages Design and Implementation. ACM SIGPLAN Notices. ACM, New York (2004)
Dice, D., Garthwaite, A.: Mostly lock-free malloc. In: ISMM’02 Proc. of the 3rd Int. Symp. on Memory Management. ACM SIGPLAN Notices, pp. 163–174. ACM, New York (2002)
Michael, M.M.: Safe memory reclamation for dynamic lock-free objects using atomic reads and writes. In: Proceedings of the 21st ACM Symposium on Principles of Distributed Computing, pp. 21–30. ACM, New York (2002)
Michael, M.M.: Hazard pointers: Safe memory reclamation for lock-free objects. IEEE Trans. Parallel Distrib. Syst. 15, 491–504 (2004)
Herlihy, M., Luchangco, V., Moir, M.: The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures. In: Proceedings of the 16th International Symposium on Distributed Computing (DISC’02). LNCS, vol. 2508, pp. 339–353. Springer, Berlin (2002)
Valois, J.D.: Lock-free data structures. Ph.D. Thesis, Rensselaer Polytechnic Institute, Department of Computer Science, Troy, New York (1995)
Michael, M.M., Scott, M.L.: Correction of a memory management method for lock-free data structures. Technical Report TR599, University of Rochester, Computer Science Department (1995)
Detlefs, D.L., Martin, P.A., Moir, M., Guy L. Steele, J.: Lock-free reference counting. In: Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing, pp. 190–199. ACM, New York (2001)
Herlihy, M., Luchangco, V., Martin, P., Moir, M.: Nonblocking memory management support for dynamic-sized data structures. ACM Trans. Comput. Syst. 23, 146–196 (2005)
Gidenstam, A., Papatriantafilou, M., Sundell, H., Tsigas, P.: Practical and efficient lock-free garbage collection based on reference counting. In: Proc. of the 8th International Symp. on Parallel Architectures, Algorithms, and Networks (I-SPAN), pp. 202–207. IEEE Comput. Soc., Los Alamitos (2005)
Schneider, S., Antonopoulos, C., Nikolopoulos, D.: Scalabel locality-conscious multithreaded memory allocation. In: Proceedings of the 2006 International Symposium on Memory Management (ISMM’06), pp. 84–94. ACM, New York (2006)
Massalin, H., Pu, C.: A lock-free multiprocessor OS kernel. Technical Report CUCS-005-91 (1991)
Massalin, H.: Synthesis: an efficient implementation of fundamental operating system services. Ph.D. Thesis, Columbia University (1992)
Greenwald, M., Cheriton, D.R.: The synergy between non-blocking synchronization and operating system structure. In: Operating Systems Design and Implementation, pp. 123–136 (1996)
Greenwald, M.B.: Non-blocking synchronization and system design. Ph.D. Thesis, Stanford University (1999)
SGI: The standard template library for C++ (2003). http://www.sgi.com/tech/stl/Allocators.html
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. Assoc. Comput. Mach. 46, 720–748 (1999)
Gloger, W.: Wolfram Gloger’s malloc homepage (2003). http://www.malloc.de/en/
Steensgaard, B.: Thread-specific heaps for multi-threaded programs. In: ISMM 2000 Proc. of the Second Int. Symp. on Memory Management. ACM SIGPLAN Notices, vol. 36(1). ACM, New York (2000)
Herlihy, M.P., Wing, J.M.: Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12, 463–492 (1990)
Barnes, G.: A method for implementing lock-free shared data structures. In: Proc. of the 5th Annual ACM Symp. on Parallel Algorithms and Architectures, SIGACT and SIGARCH, pp. 261–270 (1993). Extended abstract
Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Syst. 11, 124–149 (1991)
Rinard, M.C.: Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives. ACM Trans. Comput. Syst. 17, 337–371 (1999)
Valois, J.D.: Lock-free linked lists using compare-and-swap. In: Proc. of the 14th Annual ACM Symp. on Principles of Distributed Computing (PODC ’95), pp. 214–222. ACM, New York (1995)
Tsigas, P., Zhang, Y.: A simple, fast and scalable non-blocking concurrent fifo queue for shared memory multiprocessor systems. In: Proc. of the 13th Annual ACM Symp. on Parallel Algorithms and Architectures, pp. 134–143. ACM, New York (2001)
Harris, T.L.: A pragmatic implementation of non-blocking linked lists. In: Proc. of the 15th Int. Conf. on Distributed Computing, pp. 300–314. Springer, Berlin (2001)
Hoepman, J.H., Papatriantafilou, M., Tsigas, P.: Self-stabilization of wait-free shared memory objects. J. Parallel Distrib. Comput. 62, 766–791 (2002)
Moir, M.: Practical implementations of non-blocking synchronization primitives. In: Proc. of the 16th Annual ACM Symp. on Principles of Distributed Computing, pp. 219–228 (1997)
Papatriantafilou, M., Tsigas, P.: Wait-free consensus in “in-phase” multiprocessor systems. In: Symp. on Parallel and Distributed Processing (SPDP ’95), pp. 312–319. IEEE Comput. Soc., Los Alamitos (1995)
Papatriantafilou, M., Tsigas, P.: On self-stabilizing wait-free clock synchronization. Parallel Process. Lett. 7, 321–328 (1997)
Sundell, H., Tsigas, P.: Fast and lock-free concurrent priority queues for multi-thread systems. In: Proc. of the 17th IEEE/ACM Int. Parallel and Distributed Processing Symp. (IPDPS 03). IEEE Press, New York (2003)
Valois, J.D.: Implementing lock-free queues. In: Proc. of the Seventh Int. Conf. on Parallel and Distributed Computing Systems, pp. 64–69 (1994)
Michael, M.M.: High performance dynamic lock-free hash tables and list-based sets. In: Proc. of the 14th Annual ACM Symp. on Parallel Algorithms and Architectures (SPAA-02), pp. 73–82. ACM, New York (2002)
Jayanti, P.: A complete and constant time wait-free implementation of CAS from LL/SC and vice versa. In: Proceedings of the 12th International Symposium on Distributed Computing (DISC ’98). Lecture Notes in Computer Science, vol. 1499, pp. 216–230. Springer, Berlin (1998)
IBM: IBM System/370 Extended Architecture, Principles of Operation (1983). Publication No. SA22-7085
Larson, P.P.Å., Krishnan, M.: Memory allocation for long-running server applications. In: ISMM’98 Proc. of the 1st Int. Symp. on Memory Management. ACM SIGPLAN Notices, pp. 176–185. ACM, New York (1998)
Dechev, D., Pirkelbauer, P., Stroustrup, B.: Lock-free dynamically resizable arrays. In: Proc. of the 10th Int. Conf. on Principles of Distributed Systems OPODIS’06. LNCS, vol. 4305, pp. 142–156. Springer, Berlin (2006)
Author information
Authors and Affiliations
Corresponding author
Additional information
This work was supported by computational resources provided by the Swedish National Supercomputer Centre (NSC). This paper is an extended version of [1].
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Gidenstam, A., Papatriantafilou, M. & Tsigas, P. NBmalloc: Allocating Memory in a Lock-Free Manner. Algorithmica 58, 304–338 (2010). https://doi.org/10.1007/s00453-008-9268-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00453-008-9268-x