SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units

Authors:
Thomas Willhalm (Intel GmbH, Munich, Germany),
Nicolae Popovici (Intel GmbH, Munich, Germany),
Yazan Boshmaf (SAP AG, Dietmar-Hopp-Allee, Walldorf, Germany),
Hasso Plattner (University of Potsdam, Potsdam, Germany),
Alexander Zeier (University of Potsdam, Potsdam, Germany),
Jan Schaffner (University of Potsdam, Potsdam, Germany)

Proceedings of the VLDB Endowment, Volume 2, Issue 1, pp. 385–394. https://doi.org/10.14778/1687627.1687671

Published: 01 August 2009

Abstract

The availability of large amounts of system memory, even on standard servers, has generated a lot of interest in main-memory database engines. In data warehouse systems, highly compressed column-oriented data structures are quite prominent. To scale with the data volume and the system load, many of these systems are highly distributed with a shared-nothing approach. The fundamental operation in all of these systems is a full table scan over one or multiple compressed columns. Recent research has proposed different techniques to speed up table scans, such as intelligent compression or the use of additional hardware such as graphics cards or FPGAs. In this paper, we show that utilizing the embedded Vector Processing Units (VPUs) found in standard superscalar processors can speed up main-memory full table scans by a multiple. This is achieved without changing the hardware architecture and thereby without additional power consumption. Moreover, as on-chip VPUs directly access the system's RAM, no additional costly copy operations are needed to use the new SIMD-scan approach in standard main-memory database engines. Therefore, we propose this scan approach as the standard scan operator for compressed column-oriented main-memory storage. We then discuss how well our solution scales with the number of processor cores and, consequently, to what degree it can be applied in multi-threaded environments. To verify the feasibility of our approach, we implemented the proposed techniques on a modern Intel multi-core processor using Intel® Streaming SIMD Extensions (Intel® SSE). In addition, we integrated the new SIMD-scan approach into SAP® NetWeaver® Business Warehouse Accelerator. We conclude by describing the performance benefits of using our approach for processing and scanning compressed data using VPUs in column-oriented main-memory database systems.
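
To illustrate the general idea of scanning a column with an on-chip vector unit, the following is a minimal sketch in C using SSE2 intrinsics. It is not the paper's implementation: it assumes the column values are already unpacked into 32-bit codes (the bit-level decompression that the abstract refers to is not shown), and the function names `simd_scan_eq` and `scalar_scan_eq` are hypothetical. The point is only that one SSE comparison evaluates the predicate on four values at once.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE header) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: count entries equal to `key`, one value at a time. */
size_t scalar_scan_eq(const uint32_t *col, size_t n, uint32_t key)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += (col[i] == key);
    return count;
}

/* Hypothetical SIMD variant: compare four 32-bit codes per instruction.
 * Assumes the column is already unpacked to 32-bit values. */
size_t simd_scan_eq(const uint32_t *col, size_t n, uint32_t key)
{
    const __m128i vkey = _mm_set1_epi32((int)key);  /* broadcast the key */
    size_t count = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Unaligned load of four consecutive codes. */
        __m128i v  = _mm_loadu_si128((const __m128i *)(col + i));
        /* Each lane becomes all ones where the code equals the key. */
        __m128i eq = _mm_cmpeq_epi32(v, vkey);
        /* Collapse the four lane results into a 4-bit mask. */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(eq));
        /* Add up the set bits of the 4-bit mask. */
        count += (size_t)((mask & 1) + ((mask >> 1) & 1) +
                          ((mask >> 2) & 1) + ((mask >> 3) & 1));
    }

    /* Scalar tail for the remaining 0 to 3 values. */
    for (; i < n; ++i)
        count += (col[i] == key);

    return count;
}
```

Both functions return the same count for any input; the SIMD variant simply issues one comparison per four values instead of one per value. Because the vector unit reads the column directly from main memory, no copy to a separate device is involved, which matches the argument made in the abstract.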

Published in
Proceedings of the VLDB Endowment, Volume 2, Issue 1
August 2009
1293 pages
ISSN: 2150-8097
Editors: Serge Abiteboul, Tova Milo, Jignesh Patel, Philippe Rigaux
Publisher: VLDB Endowment