ABSTRACT
Collection data structures in standard libraries of programming languages are designed to excel for the average case by carefully balancing memory footprint and runtime performance. These implicit design decisions and hard-coded trade-offs do constrain users from using an optimal variant for a given problem. Although a wide range of specialized collections is available for the Java Virtual Machine (JVM), they introduce yet another dependency and complicate user adoption by requiring specific Application Program Interfaces (APIs) incompatible with the standard library.
A product line for collection data structures would relieve library designers from optimizing for the general case. Furthermore, a product line allows evolving the potentially large code base of a collection family efficiently. The challenge is to find a small core framework for collection data structures which covers all variations without exhaustively listing them, while supporting good performance at the same time.
We claim that the concept of Array Mapped Tries (AMTs) embodies a high degree of commonality in the sub-domain of immutable collection data structures. AMTs are flexible enough to cover most of the variability, while minimizing code bloat in the generator and the generated code. We implemented a Data Structure Code Generator (DSCG) that emits immutable collections based on an AMT skeleton foundation. The generated data structures outperform competitive hand-optimized implementations, and the generator still allows for customization towards specific workloads.
- V. Alves, D. Schneider, M. Becker, N. Bencomo, and P. Grace. “Comparitive Study of Variability Management in Software Product Lines and Runtime Adaptable Systems”. In: Proceedings of VaMoS ’09. Universität Duisburg-Essen, 2009.Google Scholar
- P. Bagwell. Fast And Space Efficient Trie Searches. Tech. rep. LAMP-REPORT-2000-001. Ecole polytechnique fédérale de Lausanne, 2000.Google Scholar
- P. Bagwell. Ideal Hash Trees. Tech. rep. LAMPREPORT-2001-001. Ecole polytechnique fédérale de Lausanne, 2001.Google Scholar
- P. Bagwell and T. Rompf. RRB-Trees: Efficient Immutable Vectors. Tech. rep. EPFL-REPORT-169879.Google Scholar
- Ecole polytechnique fédérale de Lausanne, 2011.Google Scholar
- T. J. Biggerstaff. “A Perspective of Generative Reuse”. In: Annals of Software Engineering 5.1 (1998). Google ScholarDigital Library
- R. Bird. “Two Dimensional Pattern Matching”. In: Information Processing Letters 6.5 (1977).Google ScholarCross Ref
- R. de la Briandais. “File Searching Using Variable Length Keys”. In: Proceedings of IRE-AIEE-ACM ’59 (Western). ACM, 1959. Google ScholarDigital Library
- P. Clements and L. Northrop. Software Product Lines: Practices and Patterns. Addison-Wesley, 2001. Google ScholarDigital Library
- K. Czarnecki and U. W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press, 2000. Google ScholarDigital Library
- A. van Deursen and P. Klint. “Domain-Specific Language Design Requires Feature Descriptions”. In: Journal of Computing and Information Technology 10.1 (2002).Google ScholarCross Ref
- J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. “Making Data Structures Persistent”. In: Proceedings of STOC ’86. ACM, 1986. Google ScholarDigital Library
- E. Fredkin. “Trie Memory”. In: Communications of the ACM 3.9 (1960). Google ScholarDigital Library
- S. Hallsteinsen, M. Hinchey, S. Park, and K. Schmid. “Dynamic Software Product Lines”. In: Computer 41.4 (2008). Google ScholarDigital Library
- P. Hawkins, A. Aiken, K. Fisher, M. Rinard, and M. Sagiv. “Concurrent Data Representation Synthesis”. In: Proceedings of PLDI ’12. ACM, 2012. Google ScholarDigital Library
- P. Hawkins, A. Aiken, K. Fisher, M. Rinard, and M. Sagiv. “Data Representation Synthesis”. In: Proceedings of PLDI ’11. ACM, 2011. Google ScholarDigital Library
- P. Helland. “Immutability Changes Everything”. In: Communications of the ACM 59.1 (2015). Google ScholarDigital Library
- P. Klint, T. van der Storm, and J. Vinju. “Rascal: A Domain Specific Language for Source Code Analysis and Manipulation”. In: Proceedings of SCAM ’09. IEEE, 2009. Google ScholarDigital Library
- C. Loncaric, E. Torlak, and M. D. Ernst. “Fast Synthesis of Fast Collections”. In: Proceedings of PLDI ’16. ACM, 2016. Google ScholarDigital Library
- D. McIlroy. “Mass-Produced Software Components”. In: Proceedings of NATO Software Engineering Conference. Scientific Affairs Division, NATO, 1969.Google Scholar
- C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999. Google ScholarDigital Library
- A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. “Concurrent Tries with Efficient Non-blocking Snapshots”. In: Proc. of PPoPP ’12. ACM, 2012. Google ScholarDigital Library
- E. Schonberg, J. T. Schwartz, and M. Sharir. “Automatic Data Structure Selection in SETL”. In: Proceedings of POPL ’79. ACM, 1979. Google ScholarDigital Library
- O. Shacham, M. Vechev, and E. Yahav. “Chameleon: Adaptive Selection of Collections”. In: Proceedings of PLDI ’09. ACM, 2009. Google ScholarDigital Library
- Y. Smaragdakis and D. Batory. “DiSTiL: A Transformation Library for Data Structures”. In: Proceedings of DSL’97. USENIX Association, 1997. Google ScholarDigital Library
- M. J. Steindorfer. Towards a Feature Model of Trie-Based Collections. 2016.Google Scholar
- DOI : 10. 5281 / zenodo. 59739.Google Scholar
- M. J. Steindorfer and J. J. Vinju. “Code Specialization for Memory Efficient Hash Tries (Short Paper)”. In: Proceedings of GPCE ’14. ACM, 2014. Google ScholarDigital Library
- M. J. Steindorfer and J. J. Vinju. “Fast and Lean Immutable Multi-Maps on the JVM based on Heterogeneous Hash-Array Mapped Tries”. In: ArXiv e-prints (2016). arXiv: 1608.01036 {cs.DS}.Google Scholar
- M. J. Steindorfer and J. J. Vinju. “Optimizing Hasharray Mapped Tries for Fast and Lean Immutable JVM Collections”. In: Proceedings of OOPSLA ’15. ACM, 2015. Google ScholarDigital Library
- M. J. Steindorfer and J. J. Vinju. “Performance Modeling of Maximal Sharing”. In: Proceedings of ICPE ’16. ACM, 2016. Google ScholarDigital Library
- N. Stucki, T. Rompf, V. Ureche, and P. Bagwell. “RRB Vector: A Practical General Purpose Immutable Sequence”. In: Proceedings of ICFP ’15. ACM, 2015. Google ScholarDigital Library
- V. Ureche, C. Talau, and M. Odersky. “Miniboxing: Improving the Speed to Code Size Tradeoff in Parametric Polymorphism Translations”. In: Proceedings of OOPSLA ’13. ACM, 2013. Google ScholarDigital Library
- Introduction Related Work A Stable Data Type Independent Encoding Intermediate Generator Abstractions ConclusionGoogle Scholar
Index Terms
- Towards a software product line of trie-based collections
Recommendations
Optimizing hash-array mapped tries for fast and lean immutable JVM collections
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and ApplicationsThe data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or ...
Towards a software product line of trie-based collections
GPCE '16Collection data structures in standard libraries of programming languages are designed to excel for the average case by carefully balancing memory footprint and runtime performance. These implicit design decisions and hard-coded trade-offs do constrain ...
Optimizing hash-array mapped tries for fast and lean immutable JVM collections
OOPSLA '15The data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or ...
Comments