The VLDB Journal

12-09-2017 | Special Issue Paper

Compressed linear algebra for large-scale machine learning

Authors: Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald

Published in: The VLDB Journal | Issue 5/2018

Abstract

Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable fast matrix-vector operations on in-memory data. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Therefore, we initiate work—inspired by database compression and sparse matrix formats—on value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operations performance close to the uncompressed case and good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements up to \(9.2\mathrm{x}\).
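The idea of running matrix-vector multiplication directly on a compressed representation can be illustrated with a minimal sketch. It assumes a simplified value-based encoding in which each column maps every distinct nonzero value to the list of rows where it occurs. The function names `compress_columns` and `compressed_matvec` are hypothetical, and the encoding is only a stand-in: the abstract does not specify the actual column compression schemes.

```python
from collections import defaultdict


def compress_columns(rows):
    """Encode each column as {distinct nonzero value: [row indexes]}.

    Assumption: a simplified value-based encoding for illustration only.
    """
    n_cols = len(rows[0])
    columns = [defaultdict(list) for _ in range(n_cols)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value != 0:  # zeros stay implicit, as in sparse formats
                columns[j][value].append(i)
    return [dict(col) for col in columns]


def compressed_matvec(columns, n_rows, v):
    """Compute q = X v by scanning the compressed columns directly."""
    q = [0.0] * n_rows
    for j, col in enumerate(columns):
        for value, offsets in col.items():
            scaled = value * v[j]  # one multiplication per distinct value
            for i in offsets:
                q[i] += scaled
    return q


def dense_matvec(rows, v):
    """Uncompressed reference implementation used for comparison."""
    return [sum(x * w for x, w in zip(row, v)) for row in rows]


X = [
    [7.0, 0.0, 3.0],
    [7.0, 2.0, 3.0],
    [0.0, 2.0, 3.0],
    [7.0, 0.0, 9.0],
]
v = [1.0, 2.0, 0.5]

cols = compress_columns(X)
result = compressed_matvec(cols, len(X), v)
print(result)  # matches dense_matvec(X, v)
```

In this sketch, a column with few distinct values needs only one multiplication per distinct value, followed by additions over the row lists, so the matrix is never decompressed.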

Dummy coding transforms a categorical feature having d possible values into d Boolean features, each indicating the rows in which a given value occurs. The larger the value of d, the greater the sparsity (from adding \(d-1\) zeros per row).
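As a small illustration of dummy coding, the following sketch expands one categorical column into d Boolean columns and counts the zeros per row. The helper name `dummy_code` and the sample values are assumptions made for this example.

```python
def dummy_code(values):
    """Expand one categorical feature into d Boolean (0/1) features."""
    categories = sorted(set(values))  # the d possible values
    index = {c: k for k, c in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per row
        encoded.append(row)
    return categories, encoded


colors = ["red", "green", "blue", "green"]  # hypothetical feature, d = 3
categories, encoded = dummy_code(colors)
zeros_per_row = [row.count(0) for row in encoded]
print(categories, encoded, zeros_per_row)
```

Each encoded row holds a single 1 and d − 1 zeros, which is why sparsity grows with d.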

The results with native BLAS libraries would be similar because memory bandwidth and I/O are the bottlenecks.

For consistency with previously published results [32], we use Snappy, which was the default codec in Spark 1.x. However, we also include LZ4, which is the default in Spark 2.x.

For Mnist with its original 10 classes, we created the labels with \(\mathbf {y} \leftarrow (\mathbf {y}==7)\) (i.e., class 7 against the rest), whereas for ImageNet with its 1000 classes, we created the labels with \(\mathbf {y}\leftarrow (\mathbf {y}_0 > (\max (\mathbf {y}_0) - (\max (\mathbf {y}_0)-\min (\mathbf {y}_0))/2))\), where we derived \(\mathbf {y}_0 = \mathbf {X}\mathbf {w}\) from the data \(\mathbf {X}\) and a random model \(\mathbf {w}\).
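The two labeling rules can be sketched as follows. The helper names `one_vs_rest` and `midrange_labels` and the tiny inputs are hypothetical; in the setup described above, the model \(\mathbf{w}\) is random, whereas the example below fixes it so the result is reproducible.

```python
def one_vs_rest(y, positive_class):
    """y <- (y == positive_class): 1 for the chosen class, 0 for the rest."""
    return [1 if label == positive_class else 0 for label in y]


def midrange_labels(X, w):
    """y <- (y0 > max(y0) - (max(y0) - min(y0)) / 2), with y0 = X w."""
    y0 = [sum(x * wi for x, wi in zip(row, w)) for row in X]
    hi, lo = max(y0), min(y0)
    threshold = hi - (hi - lo) / 2  # midpoint of the score range
    return [1 if score > threshold else 0 for score in y0]


# Class 7 against the rest, on a hypothetical label vector.
mnist_like = one_vs_rest([7, 3, 7, 0], positive_class=7)

# Assumption: w is fixed here for reproducibility instead of being random.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [2.0, -1.0]
imagenet_like = midrange_labels(X, w)
print(mnist_like, imagenet_like)
```

The second rule thresholds the scores at the midpoint between their minimum and maximum, which yields binary labels regardless of the original number of classes.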

We enabled code generation for cell-wise operations only because SystemML 0.14 does not yet support operator fusion, i.e., code generation, for compressed matrices.

References

1. Abadi, D.J., et al.: Integrating compression and execution in column-oriented database systems. In: SIGMOD (2006)
2. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. In: CoRR (2016)
3. Adler, M., Mitzenmacher, M.: Towards compressing web graphs. In: DCC (2001)
4. Alexandrov, A., et al.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)
5. American Statistical Association (ASA). Airline on-time performance dataset. http://stat-computing.org/dataexpo/2009/the-data.html
6. Ashari, A., et al.: An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs. In: ICS (2014)
7. Ashari, A., et al.: On optimizing machine learning workloads via kernel fusion. In: PPoPP (2015)
8. Bandyopadhyay, B., et al.: Topological graph sketching for incremental and scalable analytics. In: CIKM (2016)
9. Bassiouni, M.A.: Data compression in scientific and statistical databases. Trans. Softw. Eng. (TSE) 11(10), 1047–1058 (1985)
10. Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC (2009)
11. Bergstra, J., et al.: Theano: a CPU and GPU math expression compiler. In: SciPy (2010)
12. Beyer, K.S., et al.: On synopses for distinct-value estimation under multiset operations. In: SIGMOD (2007)
13. Bhattacharjee, B., et al.: Efficient index compression in DB2 LUW. PVLDB 2(2), 1462–1473 (2009)
14. Bhattacherjee, S., et al.: PStore: an efficient storage framework for managing scientific data. In: SSDBM (2014)
15. Binnig, C., et al.: Dictionary-based order-preserving string compression for main memory column stores. In: SIGMOD (2009)
16. Boehm, M., et al.: SystemML: declarative machine learning on Spark. PVLDB 9(13), 1425–1436 (2016)
17. Boehm, M., et al.: Declarative machine learning—a classification of basic properties and types. In: CoRR (2016)
18. Bolosky, W.J., Scott, M.L.: False sharing and its effect on shared memory performance. In: SEDMS (1993)
19. Bottou, L.: The infinite MNIST dataset. http://leon.bottou.org/projects/infimnist
20. Buehrer, G., Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities. In: WSDM (2008)
21. Charikar, M., et al.: Towards estimation error guarantees for distinct values. In: SIGMOD (2000)
22. Chen, L., et al.: Towards linear algebra over normalized data. PVLDB 10(11), 1214–1225 (2017)
23. Chitta, R., et al.: Approximate kernel k-means: solution to large scale kernel clustering. In: KDD (2011)
24. Cohen, J., et al.: MAD skills: new analysis practices for big data. PVLDB 2(2), 1481–1492 (2009)
25. Constantinescu, C., Lu, M.: Quick estimation of data compression and de-duplication for large storage systems. In: CCP (2011)
26. Cormack, G.V.: Data compression on a database system. Commun. ACM 28(12), 1336–1342 (1985)
27. Damme, P., et al.: Lightweight data compression algorithms: an experimental survey. In: EDBT (2017)
28. Das, S., et al.: Ricardo: integrating R and Hadoop. In: SIGMOD (2010)
29. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI (2004)
30. Elgamal, T., et al.: sPCA: scalable principal component analysis for big data on distributed platforms. In: SIGMOD (2015)
31. Elgamal, T., et al.: SPOOF: sum-product optimization and operator fusion for large-scale machine learning. In: CIDR (2017)
32. Elgohary, A., et al.: Compressed linear algebra for large-scale machine learning. PVLDB 9(12), 960–971 (2016)
33. Fan, W., et al.: Query preserving graph compression. In: SIGMOD (2012)
34. Ghoting, A., et al.: SystemML: declarative machine learning on MapReduce. In: ICDE (2011)
35. Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–264 (1953)
36. Graefe, G., Shapiro, L.D.: Data compression and database performance. In: Applied Computing (1991)
37. Haas, P.J., Stokes, L.: Estimating the number of classes in a finite population. J. Am. Stat. Assoc. 93(444), 1475–1487 (1998)
38. Halko, N., et al.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
39. Harnik, D., et al.: Estimation of deduplication ratios in large data sets. In: MSST (2012)
40. Harnik, D., et al.: To zip or not to zip: effective resource usage for real-time compression. In: FAST (2013)
41. Huang, B., et al.: Cumulon: optimizing statistical data analysis in the cloud. In: SIGMOD (2013)
42. Huang, B., et al.: Resource elasticity for large-scale machine learning. In: SIGMOD (2015)
43. Idreos, S., et al.: Estimating the compression fraction of an index using sampling. In: ICDE (2010)
44. Intel. MKL: Math Kernel Library. https://software.intel.com/en-us/intel-mkl/
45. Johnson, D.S., et al.: Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput. 3(4), 299–325 (1974)
46. Johnson, N.L., et al.: Univariate Discrete Distributions, 2nd edn. Wiley, New York (1992)
47. Kang, D., et al.: NoScope: Optimizing deep CNN-based queries over video streams at scale. PVLDB 10(11), 1586–1597 (2017)
48. Karakasis, V., et al.: An extended compression format for the optimization of sparse matrix-vector multiplication. Trans. Parallel Distrib. Syst. (TPDS) 24(10), 1930–1940 (2013)
49. Kernert, D., et al.: SLACID—sparse linear algebra in a column-oriented in-memory database system. In: SSDBM (2014)
50. Kim, M.: TensorDB and tensor-relational model (TRM) for efficient tensor-relational operations. Ph.D. Thesis, ASU (2014)
51. Kimura, H., et al.: Compression aware physical database design. PVLDB 4(10), 657–668 (2011)
52. Kourtis, K., et al.: Optimizing sparse matrix-vector multiplication using index and value compression. In: CF (2008)
53. Kumar, A., et al.: Demonstration of Santoku: optimizing machine learning over normalized data. PVLDB 8(12), 1864–1867 (2015)
54. Kumar, A., et al.: Learning generalized linear models over normalized data. In: SIGMOD (2015)
55. Lang, H., et al.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD (2016)
56. Larson, P., et al.: SQL server column store indexes. In: SIGMOD (2011)
57. LeCun, Y., et al.: Deep learning. Nature 521, 436–444 (2015)
58. Li, F., et al.: When Lempel–Ziv–Welch meets machine learning: a case study of accelerating machine learning using coding. In: CoRR (2017)
59. Lichman, M.: UCI machine learning repository: higgs, covertype, US Census (1990). https://archive.ics.uci.edu/ml/
60. Luo, S., et al.: Scalable linear algebra on a relational database system. In: ICDE (2017)
61. Maccioni, A., Abadi, D.J.: Scalable pattern matching over compressed graphs via dedensification. In: KDD (2016)
62. Maneth, S., Peternek, F.: A survey on methods and systems for graph compression. In: CoRR (2015)
63. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)
64. NVIDIA. cuSPARSE: CUDA Sparse Matrix Library. https://docs.nvidia.com/cuda/cusparse/
65. Olteanu, D., Schleich, M.: F: Regression models over factorized views. PVLDB 9(13), 1573–1576 (2016)
66. O’Neil, P.E.: Model 204 architecture and performance. In: High Performance Transaction Systems (1989)
67. Or, A., Rosen, J.: Unified memory management in Spark 1.6, SPARK-10000 design document (2015)
68. Oracle. Data Warehousing Guide, 11g Release 1 (2007)
69. Papadopoulos, S., et al.: The TileDB array data storage manager. PVLDB 10(4), 349–360 (2016)
70. Qin, C., Rusu, F.: Speculative approximations for terascale analytics. In: CoRR (2015)
71. Raman, V., Swart, G.: How to wring a table dry: entropy compression of relations and querying of compressed relations. In: VLDB (2006)
72. Raman, V., et al.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)
73. Raskhodnikova, S., et al.: Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM J. Comput. 39(3), 813–842 (2009)
74. Rendle, S.: Scaling factorization machines to relational data. PVLDB 6(5), 337–348 (2013)
75. Rohrmann, T., et al.: Gilbert: declarative sparse linear algebra on massively parallel dataflow systems. In: BTW (2017)
76. Saad, Y.: SPARSKIT: a basic tool kit for sparse matrix computations—Version 2 (1994)
77. Satuluri, V., et al.: Local graph sparsification for scalable clustering. In: SIGMOD (2011)
78. Schelter, S., et al.: Samsara: declarative machine learning on distributed dataflow systems. In: NIPS Workshop MLSystems (2016)
79. Schlegel, B., et al.: Memory-efficient frequent-itemset mining. In: EDBT (2011)
80. Schleich, M., et al.: Learning linear regression models over factorized joins. In: SIGMOD (2016)
81. Stonebraker, M., et al.: C-store: a column-oriented DBMS. In: VLDB (2005)
82. Stonebraker, M., et al.: The Architecture of SciDB. In: SSDBM (2011)
83. Sybase. IQ 15.4 System Administration Guide (2013)
84. Tabei, Y., et al.: Scalable partial least squares regression on grammar-compressed data matrices. In: KDD (2016)
85. Tepper, M., Sapiro, G.: Compressed nonnegative matrix factorization is fast and accurate. IEEE Trans. Signal Process. 64(9), 2269–2283 (2016)
86. Tian, Y., et al.: Scalable and numerically stable descriptive statistics in SystemML. In: ICDE (2012)
87. Valiant, G., Valiant, P.: Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In: STOC (2011)
88. Wang, W., et al.: Database meets deep learning: challenges and opportunities. SIGMOD Rec. 45(2), 17–22 (2016)
89. Westmann, T., et al.: The implementation and performance of compressed databases. SIGMOD Rec. 29(3), 55–67 (2000)
90. Willhalm, T., et al.: SIMD-Scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2(1), 385–394 (2009)
91. Williams, S., et al.: Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In: SC (2007)
92. Wu, K., et al.: Optimizing bitmap indices with efficient compression. TODS 31(1), 1–38 (2006)
93. Yu, L., et al.: Exploiting matrix dependency for efficient distributed matrix computation. In: SIGMOD (2015)
94. Zadeh, R.B., et al.: Matrix computations and optimization in Apache Spark. In: KDD (2016)
95. Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI (2012)
96. Zhang, C., et al.: Materialization optimizations for feature selection workloads. In: SIGMOD (2014)
97. Zukowski, M., et al.: Super-scalar RAM-CPU cache compression. In: ICDE (2006)

Title: Compressed linear algebra for large-scale machine learning
Authors: Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald
Publication date: 12-09-2017
Publisher: Springer Berlin Heidelberg
Published in: The VLDB Journal / Issue 5/2018
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI: https://doi.org/10.1007/s00778-017-0478-1