skip to main content
10.1145/2814270.2814312acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

Optimizing hash-array mapped tries for fast and lean immutable JVM collections

Published:23 October 2015Publication History

ABSTRACT

The data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs). HAMTs already feature efficient lookup, insert, and delete operations, however due to their tree-based nature their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts. This particularly prohibits their application in programs which process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general purpose design increases cache locality and features a canonical representation. It outperforms Scala’s and Clojure’s data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3–6.7x) and equality checking (3–25.4x).

Skip Supplemental Material Section

Supplemental Material

References

  1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. Bagwell. Fast And Space Efficient Trie Searches. Technical Report LAMP-REPORT-2000-001, Ecole polytechnique fédérale de Lausanne, 2000.Google ScholarGoogle Scholar
  3. P. Bagwell. Ideal Hash Trees. Technical Report LAMPREPORT-2001-001, Ecole polytechnique fédérale de Lausanne, 2001.Google ScholarGoogle Scholar
  4. P. Bagwell. Fast Functional Lists, Hash-Lists, Deques, and Variable Length Arrays. Technical Report LAMP-REPORT- 2002-003, Ecole polytechnique fédérale de Lausanne, 2002.Google ScholarGoogle Scholar
  5. P. Bagwell and T. Rompf. RRB-Trees: Efficient Immutable Vectors. Technical Report EPFL-REPORT-169879, Ecole polytechnique fédérale de Lausanne, 2011.Google ScholarGoogle Scholar
  6. A. Biboudis, N. Palladinos, G. Fourtounis, and Y. Smaragdakis. Streams a la carte: Extensible Pipelines with Object Algebras. In ECOOP ’15: Proceedings of the 29th European conference on Object-Oriented Programming. Schloss Dagstuhl, 2015.Google ScholarGoogle Scholar
  7. T. J. Biggerstaff. A perspective of generative reuse. Annals of Software Engineering, 5(1):169–226, 1998. ISSN 1022-7091. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K. D. Cooper, T. J. Harvey, and K. Kennedy. A Simple, Fast Dominance Algorithm. Technical Report TR-06-33870, Rice University, 2006.Google ScholarGoogle Scholar
  9. K. Czarnecki and U. W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. De La Briandais. File Searching Using Variable Length Keys. In Papers presented at the the March 3-5, 1959, western joint computer conference, pages 295–298. ACM Press, 1959. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making data structures persistent. In Proceedings of the 18th annual ACM symposium on Theory of computing. ACM, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Ebert, D. Bildhauer, H. Schwarz, and V. Riediger. Using difference information to reuse software cases. Softwaretechnik-Trends, 2007.Google ScholarGoogle Scholar
  13. E. Fredkin. Trie Memory. Communications of the ACM, 3(9): 490–499, 1960. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Gil and Y. Shimron. Smaller Footprint for Java Collections. In ECOOP ’12: Proceedings of the 26th European conference on Object-Oriented Programming. Springer-Verlag, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. E. Hajiyev, M. Verbaere, and O. de Moor. Codequest: Scalable source code queries with datalog. In ECOOP ’06: Proceedings of the 20th European Conference on Object-Oriented Programming. Springer-Verlag, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Hills and P. Klint. PHP AiR: Analyzing PHP systems with Rascal. In Proceedings of IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering. IEEE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  17. A. Igarashi and M. Viroli. On Variance-Based Subtyping for Parametric Types. In ECOOP ’02: Proceedings of the 16th European conference on Object-Oriented Programming. Springer-Verlag, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Klint, T. van der Storm, and J. Vinju. Rascal: A Domain Specific Language for Source Code Analysis and Manipulation. In Proceedings of Ninth IEEE International Working Conference on Source Code Analysis and Manipulation. IEEE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. R. E. Ladner, R. Fortna, and B.-H. Nguyen. A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation. In Experimental Algorithmics. Springer-Verlag, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. McIlroy. Mass-Produced Software Components. In P. Naur and B. Randell, editors, Proceedings of NATO Software Engineering Conference, pages 138–155. NATO Scientific Affairs Division, 1968.Google ScholarGoogle Scholar
  21. N. Mitchell and G. Sevitsky. The causes of bloat, the limits of health. In OOPSLA ’07: Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. Olsson and S. Nilsson. TRASH A dynamic LC-trie and hash data structure. In High Performance Switching and Routing, 2007. HPSR ’07. Workshop on. IEEE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  24. A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. Concurrent tries with efficient non-blocking snapshots. In PPoPP ’12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. P. Rademaker. Binary relational querying for structural source code analysis. Universiteit Utrecht. Master’s thesis, 2008.Google ScholarGoogle Scholar
  26. I. Ramakrishnan, P. Rao, K. Sagonas, T. Swift, and D. S. Warren. Efficient Tabling Mechanisms for Logic Programs. In Proceedings of the 12th International Conference on Logic Programming. Elsevier, 1995.Google ScholarGoogle Scholar
  27. N. Sarnak and R. E. Tarjan. Planar point location using persistent search trees. Communications of the ACM, 29(7): 669–679, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. N. Shavit and D. Touitou. Software transactional memory. ACM Press, 1995.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. J. Steindorfer and J. J. Vinju. Code Specialization for Memory Efficient Hash Tries (Short Paper). In GPCE ’14: Proceedings of Generative Programming Concepts & Experiences. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. N. Stucki, T. Rompf, V. Ureche, and P. Bagwell. RRB Vector: A Practical General Purpose Immutable Sequence. In ICFP ’15: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. V. Ureche, C. Talau, and M. Odersky. Miniboxing: improving the speed to code size tradeoff in parametric polymorphism translations. In OOPSLA ’13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer, and H. Mössenböck. An object storage model for the truffle language implementation framework. In PPPJ ’14: Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java platform. ACM, 2014. Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Optimizing hash-array mapped tries for fast and lean immutable JVM collections

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
      October 2015
      953 pages
      ISBN:9781450336895
      DOI:10.1145/2814270
      • cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 50, Issue 10
        OOPSLA '15
        October 2015
        953 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2858965
        • Editor:
        • Andy Gill
        Issue’s Table of Contents

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 October 2015

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate268of1,244submissions,22%

      Upcoming Conference

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader