Abstract
The cache hierarchy prevalent in todays high performance processors has to be taken into account in order to design algorithms that perform well in practice. This paper advocates the adaption of external memory algorithms to this purpose. This idea and the practical issues involved are exemplified by engineering a fast priority queue suited to external memory and cached memory that is based on <i>k</i>-way merging. It improves previous external memory algorithms by constant factors crucial for transferring it to cached memory. Running in the cache hierarchy of a workstation the algorithm is at least two times faster than an optimized implementation of binary heaps and 4-ary heaps for large inputs.
Supplemental Material
Available for Download
The software suite accompanying the article; this is a small Unix tar file, which includes the source code, a Makefile, and the test files used in the article.
- ARGE, L. 1995. The buffer tree: A new technique for optimal I/O-algorithms. In 4th WADS, Number 955 in LNCS (1995), pp. 334-345. Springer.]] Google ScholarDigital Library
- BARVE, R. D., GROVE, E. F., AND VITTER, J. S. 1997. Simple randomized mergesort on parallel disks. Parallel Computing 23, 4, 601-631.]] Google ScholarDigital Library
- BRENGEL, K., CRAUSER, A., MEYER, U., AND FERRAGINA, P. 1999. An experimental study of prioritty queues in external memory. In 3rd International Workshop on Algorithmic Engineering (WAE) (1999), pp. 345-359. full paper in ACM Journal of Experimental Algorithmics.]] Google ScholarDigital Library
- BRODAL, G. S. AND KATAJAINEN, J. 1998. Worst-case efficient external-memory priority queues. In 6th Scandinavian Workshop on Algorithm Theory, Number 1432 in LNCS (1998), pp. 107-118. Springer Verlag, Berlin.]] Google ScholarDigital Library
- BROWN, R. 1988. Calendar queues: A fast O(1) priority queue implementation for the simulation event set problem. Communications of the ACM 31, 10, 1220-1227.]] Google ScholarDigital Library
- CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R. L. 1990. Introduction to Algorithms. McGraw-Hill.]] Google ScholarDigital Library
- FADEL, R., JAKOBSEN, K. V., KATAJAINEN, J., AND TEUHOLA, J. 1997. External heaps combined with effective buffering. In 4th Australasian Theory Symposium, Volume 19-2 of Australian Computer Science Communications (1997), pp. 72-78. Springer.]]Google Scholar
- FISCHER, M. J. AND PATERSON, M. S. 1994. Fishspear: A priority queue algorithm. Journal of the ACM 41, 1, 3-30.]] Google ScholarDigital Library
- HENNESSY, J. L. AND PATTERSON, D. A. 1996. Computer Architecture a Quantitative Approach . Morgan Kaufmann.]] Google ScholarDigital Library
- Intel Corporation. 1997. Intel Architecture Software Developer's Manual. Volume I: Basic Architecture. P.O. Box 5937, Denver, CO, 80217-9808, http://www.intel.com: Intel Corporation. Ordering Number 243190.]]Google Scholar
- JONES, D. 1986. An empirical comparison of priority-queue and event set implementations. Communications of the ACM 29, 4, 300-311.]] Google ScholarDigital Library
- KELLER, J. 1996. The 21264: A superscalar alpha processor with out-of-order execution. In Microprocessor Forum (October 1996).]]Google Scholar
- KNUTH, D. E. 1973. The Art of Computer Programming--Sorting and Searching, Volume 3. Addison Wesley.]]Google Scholar
- LAMARCA, A. AND LADNER, R. E. 1996. The influence of caches on the performance of heaps. ACM Journal of Experimental Algorithmics 1, 4.]] Google ScholarDigital Library
- LAMARCA, A. AND LADNER, R. E. 1997. The influence of caches on the performance of sorting. In 8th Symposium on Discrete Algorithm (1997), pp. 370-379.]] Google ScholarDigital Library
- MIPS Technologies, Inc. 1998. R10000 Microprocessor User's Manual (2.0 ed.). MIPS Technologies, Inc. http://www.mips.com.]]Google Scholar
- NEUMANN, J. V. 1945. First draft of a report on the EDVAC. Technical report, University of Pennsylvania.]] Google ScholarDigital Library
- SANDERS, P. 1999. Accessing multiple sequences through set associative caches. In ICALP, Number 1644 in LNCS (1999), pp. 655-664.]] Google ScholarDigital Library
- Sun Microsystems. 1997. UltraSPARC-IIi User's Manual. Sun Microsystems.]]Google Scholar
- VENGROFF, D. E. 1995. TPIE User Manual and Reference. Duke University. http://www. cs.duke.edu/~dev/tpie_home_page.html.]]Google Scholar
- VITTER, J. S. 1998. External memory algorithms. In 6th European Symposium on Algorithms , Number 1461 in LNCS (1998), pp. 1-25. Springer.]] Google ScholarDigital Library
- VITTER, J. S. AND SHRIVER, E. A. M. 1994. Algorithms for parallel memory I: Two level memories. Algorithmica 12, 2-3, 110-147.]]Google Scholar
- WEGENER, I. 1993. BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT, beating, on an average, QUICKSORT (if n is not very small). Theoretical Computer Science 118, 1 (Sept.), 81-98.]] Google ScholarDigital Library
- WEGNER, L. M. AND TEUHOLA, J. I. 1989. The external heapsort. IEEE Transactions on Software Engineering 15, 7 (July), 9-925.]] Google ScholarDigital Library
Index Terms
- Fast priority queues for cached memory
Recommendations
Fast Priority Queues for Cached Memory
ALENEX '99: Selected papers from the International Workshop on Algorithm Engineering and ExperimentationThe cache hierarchy prevalent in todays high performance processors has to be taken into account in order to design algorithms which perform well in practice. We advocates the approach to adapt external memory algorithms to this purpose. We exemplify ...
Revisiting priority queues for image analysis
Many algorithms in image analysis require a priority queue, a data structure that holds pointers to pixels in the image, and which allows efficiently finding the pixel in the queue with the highest priority. However, very few articles describing such ...
An efficient design for fast memory registration in RDMA
Remote Direct Memory Access (RDMA) improves network bandwidth and reduces latency by eliminating unnecessary copies from network interface card to application buffers, but the communication buffer management to reduce memory registration and ...
Comments