Published in: The VLDB Journal 5/2018

12-09-2017 | Special Issue Paper

Compressed linear algebra for large-scale machine learning

Authors: Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald



Abstract

Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable fast matrix-vector operations on in-memory data. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Therefore, we initiate work—inspired by database compression and sparse matrix formats—on value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operations performance close to the uncompressed case and good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements up to \(9.2\mathrm{x}\).
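To make the idea of executing linear algebra directly on a compressed representation concrete, here is a minimal sketch in plain Python. It assumes a deliberately simplified value-based encoding in which each column stores its distinct non-zero values together with the row offsets where they occur; the function names (`compress_column`, `compress`, `matvec`, `vecmat`) are hypothetical and this is not the authors' implementation or any specific encoding scheme from the paper.

```python
from collections import defaultdict


def compress_column(column):
    """Map each distinct non-zero value to the list of row offsets where it occurs."""
    offsets = defaultdict(list)
    for row, value in enumerate(column):
        if value != 0:
            offsets[value].append(row)
    return dict(offsets)


def compress(matrix):
    """Compress a dense row-major matrix (list of rows) column by column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    columns = [
        compress_column([matrix[i][j] for i in range(n_rows)])
        for j in range(n_cols)
    ]
    return n_rows, columns


def matvec(compressed, v):
    """Compute q = X v without decompressing X."""
    n_rows, columns = compressed
    q = [0.0] * n_rows
    for j, groups in enumerate(columns):
        for value, rows in groups.items():
            contribution = value * v[j]  # one multiply per distinct value
            for i in rows:               # scatter-add along the offset list
                q[i] += contribution
    return q


def vecmat(compressed, w):
    """Compute w^T X without decompressing X."""
    _, columns = compressed
    return [
        sum(value * sum(w[i] for i in rows) for value, rows in groups.items())
        for groups in columns
    ]


# Small illustrative matrix with few distinct values per column.
X = [
    [7.0, 0.0, 3.0],
    [7.0, 2.0, 3.0],
    [0.0, 2.0, 3.0],
    [7.0, 0.0, 9.0],
]
C = compress(X)
q = matvec(C, [1.0, 2.0, 0.5])
r = vecmat(C, [1.0, 1.0, 1.0, 1.0])
```

In this sketch, a column with few distinct values needs one multiplication per distinct value rather than one per row, which hints at why value-based compression can keep operation performance close to the uncompressed case.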


Footnotes
1. Dummy coding transforms a categorical feature having \(d\) possible values into \(d\) Boolean features, each indicating the rows in which a given value occurs. The larger the value of \(d\), the greater the sparsity (from adding \(d-1\) zeros per row).
2. The results with native BLAS libraries would be similar because memory bandwidth and I/O are the bottlenecks.
3. For consistency with previously published results [32], we use Snappy, which was the default codec in Spark 1.x. However, we also include LZ4, which is the default in Spark 2.x.
4. For Mnist with its original 10 classes, we created the labels with \(\mathbf{y} \leftarrow (\mathbf{y}==7)\) (i.e., class 7 against the rest), whereas for ImageNet with its 1000 classes, we created the labels with \(\mathbf{y} \leftarrow (\mathbf{y}_0 > (\max(\mathbf{y}_0) - (\max(\mathbf{y}_0)-\min(\mathbf{y}_0))/2))\), where we derived \(\mathbf{y}_0 = \mathbf{X}\mathbf{w}\) from the data \(\mathbf{X}\) and a random model \(\mathbf{w}\).
5. We enabled code generation for cell-wise operations only because SystemML 0.14 does not yet support operator fusion, i.e., code generation, for compressed matrices.
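
The dummy coding in footnote 1 and the label constructions in footnote 4 can be illustrated with a short sketch in plain Python. The helper names (`dummy_code`, `one_vs_rest`, `binarize_midrange`) and the sample inputs are hypothetical and chosen only for illustration; the scores passed to `binarize_midrange` stand in for \(\mathbf{y}_0 = \mathbf{X}\mathbf{w}\).

```python
def dummy_code(values):
    """Turn one categorical feature with d distinct values into d 0/1 features."""
    categories = sorted(set(values))
    index = {c: k for k, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)  # d - 1 of these entries stay zero
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def one_vs_rest(y, positive=7):
    """y <- (y == positive): one class against the rest."""
    return [1 if label == positive else 0 for label in y]


def binarize_midrange(y0):
    """y <- (y0 > max(y0) - (max(y0) - min(y0)) / 2): split at the midrange."""
    hi, lo = max(y0), min(y0)
    threshold = hi - (hi - lo) / 2
    return [1 if score > threshold else 0 for score in y0]


categories, coded = dummy_code(["red", "green", "red", "blue"])
labels_a = one_vs_rest([7, 1, 7, 3])
labels_b = binarize_midrange([0.0, 4.0, 1.0, 3.0])
```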
 
Literature
1. Abadi, D.J., et al.: Integrating compression and execution in column-oriented database systems. In: SIGMOD (2006)
2. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. In: CoRR (2016)
3. Adler, M., Mitzenmacher, M.: Towards compressing web graphs. In: DCC (2001)
4. Alexandrov, A., et al.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)
6. Ashari, A., et al.: An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs. In: ICS (2014)
7. Ashari, A., et al.: On optimizing machine learning workloads via kernel fusion. In: PPoPP (2015)
8. Bandyopadhyay, B., et al.: Topological graph sketching for incremental and scalable analytics. In: CIKM (2016)
9. Bassiouni, M.A.: Data compression in scientific and statistical databases. Trans. Softw. Eng. (TSE) 11(10), 1047–1058 (1985)
10. Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC (2009)
11. Bergstra, J., et al.: Theano: a CPU and GPU math expression compiler. In: SciPy (2010)
12. Beyer, K.S., et al.: On synopses for distinct-value estimation under multiset operations. In: SIGMOD (2007)
13. Bhattacharjee, B., et al.: Efficient index compression in DB2 LUW. PVLDB 2(2), 1462–1473 (2009)
14. Bhattacherjee, S., et al.: PStore: an efficient storage framework for managing scientific data. In: SSDBM (2014)
15. Binnig, C., et al.: Dictionary-based order-preserving string compression for main memory column stores. In: SIGMOD (2009)
16. Boehm, M., et al.: SystemML: declarative machine learning on spark. PVLDB 9(13), 1425–1436 (2016)
17. Boehm, M., et al.: Declarative machine learning—a classification of basic properties and types. In: CoRR (2016)
18. Bolosky, W.J., Scott, M.L.: False sharing and its effect on shared memory performance. In: SEDMS (1993)
20. Buehrer, G., Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities. In: WSDM (2008)
21. Charikar, M., et al.: Towards estimation error guarantees for distinct values. In: SIGMOD (2000)
22. Chen, L., et al.: Towards linear algebra over normalized data. PVLDB 10(11), 1214–1225 (2017)
23. Chitta, R., et al.: Approximate kernel k-means: solution to large scale kernel clustering. In: KDD (2011)
24. Cohen, J., et al.: MAD skills: new analysis practices for big data. PVLDB 2(2), 1481–1492 (2009)
25. Constantinescu, C., Lu, M.: Quick estimation of data compression and de-duplication for large storage systems. In: CCP (2011)
26. Cormack, G.V.: Data compression on a database system. Commun. ACM 28(12), 1336–1342 (1985)
27. Damme, P., et al.: Lightweight data compression algorithms: an experimental survey. In: EDBT (2017)
28. Das, S., et al.: Ricardo: integrating R and hadoop. In: SIGMOD (2010)
29. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI (2004)
30. Elgamal, T., et al.: sPCA: scalable principal component analysis for big data on distributed platforms. In: SIGMOD (2015)
31. Elgamal, T., et al.: SPOOF: sum-product optimization and operator fusion for large-scale machine learning. In: CIDR (2017)
32. Elgohary, A., et al.: Compressed linear algebra for large-scale machine learning. PVLDB 9(12), 960–971 (2016)
33. Fan, W., et al.: Query preserving graph compression. In: SIGMOD (2012)
34. Ghoting, A., et al.: SystemML: declarative machine learning on MapReduce. In: ICDE (2011)
35. Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–264 (1953)
36. Graefe, G., Shapiro, L.D.: Data compression and database performance. In: Applied Computing (1991)
37. Haas, P.J., Stokes, L.: Estimating the number of classes in a finite population. J. Am. Stat. Assoc. 93(444), 1475–1487 (1998)
38. Halko, N., et al.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
39. Harnik, D., et al.: Estimation of deduplication ratios in large data sets. In: MSST (2012)
40. Harnik, D., et al.: To zip or not to zip: effective resource usage for real-time compression. In: FAST (2013)
41. Huang, B., et al.: Cumulon: optimizing statistical data analysis in the cloud. In: SIGMOD (2013)
42. Huang, B., et al.: Resource elasticity for large-scale machine learning. In: SIGMOD (2015)
43. Idreos, S., et al.: Estimating the compression fraction of an index using sampling. In: ICDE (2010)
45. Johnson, D.S., et al.: Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput. 3(4), 299–325 (1974)
46. Johnson, N.L., et al.: Univariate Discrete Distributions, 2nd edn. Wiley, New York (1992)
47. Kang, D., et al.: NoScope: Optimizing deep CNN-based queries over video streams at scale. PVLDB 10(11), 1586–1597 (2017)
48. Karakasis, V., et al.: An extended compression format for the optimization of sparse matrix-vector multiplication. Trans. Parallel Distrib. Syst. (TPDS) 24(10), 1930–1940 (2013)
49. Kernert, D., et al.: SLACID—sparse linear algebra in a column-oriented in-memory database system. In: SSDBM (2014)
50. Kim, M.: TensorDB and tensor-relational model (TRM) for efficient tensor-relational operations. Ph.D. Thesis, ASU (2014)
51. Kimura, H., et al.: Compression aware physical database design. PVLDB 4(10), 657–668 (2011)
52. Kourtis, K., et al.: Optimizing sparse matrix-vector multiplication using index and value compression. In: CF (2008)
53. Kumar, A., et al.: Demonstration of Santoku: optimizing machine learning over normalized data. PVLDB 8(12), 1864–1867 (2015)
54. Kumar, A., et al.: Learning generalized linear models over normalized data. In: SIGMOD (2015)
55. Lang, H., et al.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD (2016)
56. Larson, P., et al.: SQL server column store indexes. In: SIGMOD (2011)
58. Li, F., et al.: When Lempel–Ziv–Welch meets machine learning: a case study of accelerating machine learning using coding. In: CoRR (2017)
60. Luo, S., et al.: Scalable linear algebra on a relational database system. In: ICDE (2017)
61. Maccioni, A., Abadi, D.J.: Scalable pattern matching over compressed graphs via dedensification. In: KDD (2016)
62. Maneth, S., Peternek, F.: A survey on methods and systems for graph compression. In: CoRR (2015)
63. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)
65. Olteanu, D., Schleich, M.: F: Regression models over factorized views. PVLDB 9(13), 1573–1576 (2016)
66. O’Neil, P.E.: Model 204 architecture and performance. In: High Performance Transaction Systems (1989)
67. Or, A., Rosen, J.: Unified memory management in spark 1.6, SPARK-10000 design document (2015)
68. Oracle: Data Warehousing Guide, 11g Release 1 (2007)
69. Papadopoulos, S., et al.: The TileDB array data storage manager. PVLDB 10(4), 349–360 (2016)
70. Qin, C., Rusu, F.: Speculative approximations for terascale analytics. In: CoRR (2015)
71. Raman, V., Swart, G.: How to wring a table dry: entropy compression of relations and querying of compressed relations. In: VLDB (2006)
72. Raman, V., et al.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)
73. Raskhodnikova, S., et al.: Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM J. Comput. 39(3), 813–842 (2009)
74. Rendle, S.: Scaling factorization machines to relational data. PVLDB 6(5), 337–348 (2013)
75. Rohrmann, T., et al.: Gilbert: declarative sparse linear algebra on massively parallel dataflow systems. In: BTW (2017)
76. Saad, Y.: SPARSKIT: a basic tool kit for sparse matrix computations—Version 2 (1994)
77. Satuluri, V., et al.: Local graph sparsification for scalable clustering. In: SIGMOD (2011)
78. Schelter, S., et al.: Samsara: declarative machine learning on distributed dataflow systems. In: NIPS Workshop MLSystems (2016)
79. Schlegel, B., et al.: Memory-efficient frequent-itemset mining. In: EDBT (2011)
80. Schleich, M., et al.: Learning linear regression models over factorized joins. In: SIGMOD (2016)
81. Stonebraker, M., et al.: C-store: a column-oriented DBMS. In: VLDB (2005)
82. Stonebraker, M., et al.: The Architecture of SciDB. In: SSDBM (2011)
83. Sybase: IQ 15.4 System Administration Guide (2013)
84. Tabei, Y., et al.: Scalable partial least squares regression on grammar-compressed data matrices. In: KDD (2016)
85. Tepper, M., Sapiro, G.: Compressed nonnegative matrix factorization is fast and accurate. IEEE Trans. Signal Process. 64(9), 2269–2283 (2016)
86. Tian, Y., et al.: Scalable and numerically stable descriptive statistics in SystemML. In: ICDE (2012)
87. Valiant, G., Valiant, P.: Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In: STOC (2011)
88. Wang, W., et al.: Database meets deep learning: challenges and opportunities. SIGMOD Rec. 45(2), 17–22 (2016)
89. Westmann, T., et al.: The implementation and performance of compressed databases. SIGMOD Rec. 29(3), 55–67 (2000)
90. Willhalm, T., et al.: SIMD-Scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2(1), 385–394 (2009)
91. Williams, S., et al.: Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In: SC (2007)
92. Wu, K., et al.: Optimizing bitmap indices with efficient compression. TODS 31(1), 1–38 (2006)
93. Yu, L., et al.: Exploiting matrix dependency for efficient distributed matrix computation. In: SIGMOD (2015)
94. Zadeh, R.B., et al.: Matrix computations and optimization in apache spark. In: KDD (2016)
95. Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI (2012)
96. Zhang, C., et al.: Materialization optimizations for feature selection workloads. In: SIGMOD (2014)
97. Zukowski, M., et al.: Super-scalar RAM-CPU cache compression. In: ICDE (2006)
Metadata
Title
Compressed linear algebra for large-scale machine learning
Authors
Ahmed Elgohary
Matthias Boehm
Peter J. Haas
Frederick R. Reiss
Berthold Reinwald
Publication date
12-09-2017
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 5/2018
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-017-0478-1
