ABSTRACT
Modern CPUs have instructions that allow basic operations to be performed on several data elements in parallel. These instructions are called SIMD instructions, since they apply a single instruction to multiple data elements. SIMD technology was initially built into commodity processors in order to accelerate the performance of multimedia applications. SIMD instructions provide new opportunities for database engine design and implementation. We study various kinds of operations in a database context, and show how the inner loop of the operations can be accelerated using SIMD instructions. The use of SIMD instructions has two immediate performance benefits: It allows a degree of parallelism, so that many operands can be processed at once. It also often leads to the elimination of conditional branch instructions, reducing branch mispredictions.We consider the most important database operations, including sequential scans, aggregation, index operations, and joins. We present techniques for implementing these using SIMD instructions. We show that there are significant benefits in redesigning traditional query processing algorithms so that they can make better use of SIMD technology. Our study shows that using a SIMD parallelism of four, the CPU time for the new algorithms is from 10% to more than four times less than for the traditional algorithms. Superlinear speedups are obtained as a result of the elimination of branch misprediction effects.
- http://lava.cs.virginia.edu/bpred.html.Google Scholar
- Sybase IQ. http://www.sybase.com/products/archivedproducts/sybaseiq.Google Scholar
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. In Proceedings of VLDB Conference, 2001. Google ScholarDigital Library
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In Proceedings of VLDB conference, 1999. Google ScholarDigital Library
- P. Boncz, S. Manegold, and M. Kersten. Database architecture optimized for the new bottleneck: Memory access. In Proceedings of VLDB Conference, 1999. Google ScholarDigital Library
- S. Chen, P. B. Gibbons, and T. C. Mowry. Improving index performance through prefetching. In Proceedings of ACM SIGMOD Conference, 2001. Google ScholarDigital Library
- D. Comer. The ubiquitous B-tree. ACM Computing Surveys, 11(2):121-137, 1979. Google ScholarDigital Library
- D. J. DeWitt, J. Naughton, and D. Schneider. A comparison of non-equijoin algorithms. In Proceedings of VLDB Conference, 1991. Google ScholarDigital Library
- M. Dietzfelbinger, T. Hagerup, J. Katajainen, and M. Penttonen. A reliable randomized algorithm for the closest-pair problem. J. Algorithms, 25:19-51, 1997. Google ScholarDigital Library
- R. Finkel and J. Bentley. Quad trees: A data structure for retrieval on composite keys. Acta Informatica, 4:1-9, 1974.Google ScholarDigital Library
- G. Graefe and P. Larson. B-tree indexes and CPU caches. In Proceedings of ICDE Conference, 2001. Google ScholarDigital Library
- A. Guttman. R-trees: A dynamic index structure for spatial searching. ACM SIGMOD on Management of Data, pages 47-54, 1984. Google ScholarDigital Library
- InfoCharger Engine. Optimization for decision support solutions. 1998. (available from http://www.tandem.com/prod_des/ifchegpd/ifchegpd.htm).Google Scholar
- Intel Inc. Intel architecture software developer's manual. 2001.Google Scholar
- Intel Inc. Intel C++ compiler user's manual. 2001.Google Scholar
- Intel Inc. Intel IA64 architecture software developer's manual. 2001.Google Scholar
- K. Kim, S. K. Cha, and K. Kwon. Optimizing multidimensional index trees for main memory access. In Proceedings of ACM SIGMOD Conference, 2001. Google ScholarDigital Library
- S. Manegold, P. Boncz, and M. Kersten. What happens during a join? Dissecting CPU and memory optimization effects. In Proceedings of VLDB Conference, 2000. Google ScholarDigital Library
- P. O'Neil and D. Quass. Improved query performance with variant indexes. In Proceedings of ACM SIGMOD Conference, 1997. Google ScholarDigital Library
- D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach, Second Edition. Morgan Kaugmann, 1996. Google ScholarDigital Library
- J. Rao and K. A. Ross. Cache conscious indexing for decision-support in main memory. In Proceedings of VLDB Conference, 1999. Google ScholarDigital Library
- J. Rao and K. A. Ross. Making B+ trees cache conscious in main memory. In Proceedings of ACM SIGMOD Conference, 2000. Google ScholarDigital Library
- J. Robinson. The K-D-B tree: A search structure for large multidimensional dynamic indexes. ACM SIGMOD on Management of Data, pages 10-18, 1981. Google ScholarDigital Library
- K. A. Ross. Conjunctive selection conditions in main memory. In ACM Symposium on Principles of Database Systems, 2002. Google ScholarDigital Library
- N. Slingerland and A. J. Smith. Multimedia extensions for general purpose microprocessors: A survey. Technical report CSD-00-1124, University of California at Berkeley, 2000. Google ScholarDigital Library
- H. R. Strong, G. Markovsky, and A. K. Chandra. Search within a page. JACM, 26(3), 1979. Google ScholarDigital Library
- SUN Microsystem Inc. VIS instruction set user's manual. 2000.Google Scholar
- Times Ten Performance Software Inc. TimesTen 4.3 and Front-Tier 2.3 product descriptions, 2002. Available at http://www.timesten.com.Google Scholar
Index Terms
- Implementing database operations using SIMD instructions
Recommendations
Retargetable code optimization with SIMD instructions
CODES+ISSS '06: Proceedings of the 4th international conference on Hardware/software codesign and system synthesisRetargetable C compilers are nowadays widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. One frequent concern about retargetable compilers, though, is their lack of machine-...
Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions
Three-dimensional (3D) graphics applications have become very important workloads running on today's computer systems. A cost-effective graphics solution is to perform geometry processing of 3D graphics on the host CPU and have specialized hardware ...
Automatic generation of custom SIMD instructions for superword level parallelism
DATE '14: Proceedings of the conference on Design, Automation & Test in EuropeApplication specific instruction-set processors (ASIPs) have drawn significant attention from System-on-a-Chip (SoC) community due to the capability of fine grain flexibility and customizability. In order to maximize the benefit of ASIP, automatic ...
Comments