Published in: The VLDB Journal, Issue 5/2018

12 September 2017 | Special Issue Paper

Compressed linear algebra for large-scale machine learning

Authors: Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald

Abstract

Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable fast matrix-vector operations on in-memory data. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Therefore, we initiate work—inspired by database compression and sparse matrix formats—on value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operations performance close to the uncompressed case and good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements up to \(9.2\mathrm{x}\).
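The idea of executing a matrix-vector multiplication directly on a compressed representation can be illustrated with a minimal sketch. It assumes a deliberately simplified value-based encoding in which each column stores its distinct nonzero values together with the row offsets where they occur. The function names (`compress_columns`, `matvec_compressed`, `matvec_dense`) and the toy data are hypothetical and chosen for illustration only; this is not the encoding or the implementation described in the paper.

```python
from collections import defaultdict


def compress_columns(matrix):
    """Encode each column as {distinct nonzero value: [row offsets]}."""
    n_cols = len(matrix[0]) if matrix else 0
    columns = []
    for j in range(n_cols):
        offsets = defaultdict(list)
        for i, row in enumerate(matrix):
            if row[j] != 0:
                offsets[row[j]].append(i)
        columns.append(dict(offsets))
    return columns


def matvec_compressed(columns, n_rows, v):
    """Compute X @ v on the compressed columns without decompressing X."""
    result = [0.0] * n_rows
    for j, column in enumerate(columns):
        for value, rows in column.items():
            # Scale each distinct value once, then add it to every row it occurs in.
            scaled = value * v[j]
            for i in rows:
                result[i] += scaled
    return result


def matvec_dense(matrix, v):
    """Uncompressed reference implementation for comparison."""
    return [sum(x * w for x, w in zip(row, v)) for row in matrix]


# Toy matrix with few distinct values per column (assumed example data).
X = [
    [7.0, 0.0, 3.0],
    [7.0, 2.0, 3.0],
    [0.0, 2.0, 3.0],
    [7.0, 0.0, 5.0],
]
v = [1.0, 2.0, 0.5]

compressed = compress_columns(X)
y_compressed = matvec_compressed(compressed, len(X), v)
y_dense = matvec_dense(X, v)
print(y_compressed == y_dense)
```

In this sketch, the number of multiplications per column depends on the number of distinct values rather than the number of rows, which is one reason value-based encodings can pay off on columns with low cardinality.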

Footnotes
1
Dummy coding transforms a categorical feature having d possible values into d Boolean features, each indicating the rows in which a given value occurs. The larger the value of d, the greater the sparsity (from adding \(d-1\) zeros per row).
 
2
The results with native BLAS libraries would be similar because memory bandwidth and I/O are the bottlenecks.
 
3
For consistency with previously published results [32], we use Snappy, which was the default codec in Spark 1.x. However, we also include LZ4, which is the default in Spark 2.x.
 
4
For Mnist with its original 10 classes, we created the labels with \(\mathbf {y} \leftarrow (\mathbf {y}==7)\) (i.e., class 7 against the rest), whereas for ImageNet with its 1000 classes, we created the labels with \(\mathbf {y}\leftarrow (\mathbf {y}_0 > (\max (\mathbf {y}_0) - (\max (\mathbf {y}_0)-\min (\mathbf {y}_0))/2))\), where we derived \(\mathbf {y}_0 = \mathbf {X}\mathbf {w}\) from the data \(\mathbf {X}\) and a random model \(\mathbf {w}\).
 
5
We enabled code generation for cell-wise operations only because SystemML 0.14 does not yet support operator fusion, i.e., code generation, for compressed matrices.
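
The data-preparation steps in footnotes 1 and 4 can be sketched in a few lines. The following is a minimal illustration in plain Python, not the authors' scripts: the function names (`dummy_code`, `one_vs_rest`, `midrange_labels`, `matvec`), the toy inputs, and the random seed are all assumptions made for the example.

```python
import random


def dummy_code(values):
    """Footnote 1: map a categorical feature with d values to d 0/1 features."""
    categories = sorted(set(values))
    index = {c: k for k, c in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


def one_vs_rest(y, target):
    """Footnote 4, first case: y <- (y == target)."""
    return [1 if label == target else 0 for label in y]


def midrange_labels(y0):
    """Footnote 4, second case: y <- (y0 > max(y0) - (max(y0) - min(y0)) / 2)."""
    threshold = max(y0) - (max(y0) - min(y0)) / 2
    return [1 if score > threshold else 0 for score in y0]


def matvec(X, w):
    """Plain matrix-vector product y0 = X w."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


# Dummy coding: each encoded row contains exactly d - 1 zeros.
categories, encoded = dummy_code(["red", "green", "blue", "green"])
zeros_per_row = [row.count(0) for row in encoded]

# Class 7 against the rest, on assumed toy labels.
labels_a = one_vs_rest([3, 7, 7, 0, 9], target=7)

# Labels derived from random data X and a random model w (seed is arbitrary).
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(6)]
w = [rng.random() for _ in range(4)]
y0 = matvec(X, w)
labels_b = midrange_labels(y0)
print(categories, zeros_per_row, labels_a, labels_b)
```

The threshold in `midrange_labels` is the midpoint between the smallest and largest score, so the row with the maximum score is always labeled 1 and the row with the minimum score is always labeled 0 whenever the two differ.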
 
References
1.
Abadi, D.J., et al.: Integrating compression and execution in column-oriented database systems. In: SIGMOD (2006)
2.
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. In: CoRR (2016)
3.
Adler, M., Mitzenmacher, M.: Towards compressing web graphs. In: DCC (2001)
4.
Alexandrov, A., et al.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)
6.
Ashari, A., et al.: An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs. In: ICS (2014)
7.
Ashari, A., et al.: On optimizing machine learning workloads via kernel fusion. In: PPoPP (2015)
8.
Bandyopadhyay, B., et al.: Topological graph sketching for incremental and scalable analytics. In: CIKM (2016)
9.
Bassiouni, M.A.: Data compression in scientific and statistical databases. Trans. Softw. Eng. (TSE) 11(10), 1047–1058 (1985)
10.
Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC (2009)
11.
Bergstra, J., et al.: Theano: a CPU and GPU math expression compiler. In: SciPy (2010)
12.
Beyer, K.S., et al.: On synopses for distinct-value estimation under multiset operations. In: SIGMOD (2007)
13.
Bhattacharjee, B., et al.: Efficient index compression in DB2 LUW. PVLDB 2(2), 1462–1473 (2009)
14.
Bhattacherjee, S., et al.: PStore: an efficient storage framework for managing scientific data. In: SSDBM (2014)
15.
Binnig, C., et al.: Dictionary-based order-preserving string compression for main memory column stores. In: SIGMOD (2009)
16.
Boehm, M., et al.: SystemML: declarative machine learning on spark. PVLDB 9(13), 1425–1436 (2016)
17.
Boehm, M., et al.: Declarative machine learning - a classification of basic properties and types. In: CoRR (2016)
18.
Bolosky, W.J., Scott, M.L.: False sharing and its effect on shared memory performance. In: SEDMS (1993)
20.
Buehrer, G., Chellapilla, K.: A scalable pattern mining approach to web graph compression with communities. In: WSDM (2008)
21.
Charikar, M., et al.: Towards estimation error guarantees for distinct values. In: SIGMOD (2000)
22.
Chen, L., et al.: Towards linear algebra over normalized data. PVLDB 10(11), 1214–1225 (2017)
23.
Chitta, R., et al.: Approximate kernel k-means: solution to large scale kernel clustering. In: KDD (2011)
24.
Cohen, J., et al.: MAD skills: new analysis practices for big data. PVLDB 2(2), 1481–1492 (2009)
25.
Constantinescu, C., Lu, M.: Quick estimation of data compression and de-duplication for large storage systems. In: CCP (2011)
26.
Cormack, G.V.: Data compression on a database system. Commun. ACM 28(12), 1336–1342 (1985)
27.
Damme, P., et al.: Lightweight data compression algorithms: an experimental survey. In: EDBT (2017)
28.
Das, S., et al.: Ricardo: integrating R and hadoop. In: SIGMOD (2010)
29.
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI (2004)
30.
Elgamal, T., et al.: sPCA: scalable principal component analysis for big data on distributed platforms. In: SIGMOD (2015)
31.
Elgamal, T., et al.: SPOOF: sum-product optimization and operator fusion for large-scale machine learning. In: CIDR (2017)
32.
Elgohary, A., et al.: Compressed linear algebra for large-scale machine learning. PVLDB 9(12), 960–971 (2016)
33.
Fan, W., et al.: Query preserving graph compression. In: SIGMOD (2012)
34.
Ghoting, A., et al.: SystemML: declarative machine learning on MapReduce. In: ICDE (2011)
35.
Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40, 237–264 (1953)
36.
Graefe, G., Shapiro, L.D.: Data compression and database performance. In: Applied Computing (1991)
37.
Haas, P.J., Stokes, L.: Estimating the number of classes in a finite population. J. Am. Stat. Assoc. 93(444), 1475–1487 (1998)
38.
Halko, N., et al.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
39.
Harnik, D., et al.: Estimation of deduplication ratios in large data sets. In: MSST (2012)
40.
Harnik, D., et al.: To zip or not to zip: effective resource usage for real-time compression. In: FAST (2013)
41.
Huang, B., et al.: Cumulon: optimizing statistical data analysis in the cloud. In: SIGMOD (2013)
42.
Huang, B., et al.: Resource elasticity for large-scale machine learning. In: SIGMOD (2015)
43.
Idreos, S., et al.: Estimating the compression fraction of an index using sampling. In: ICDE (2010)
45.
Johnson, D.S., et al.: Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput. 3(4), 299–325 (1974)
46.
Johnson, N.L., et al.: Univariate Discrete Distributions, 2nd edn. Wiley, New York (1992)
47.
Kang, D., et al.: NoScope: Optimizing deep CNN-based queries over video streams at scale. PVLDB 10(11), 1586–1597 (2017)
48.
Karakasis, V., et al.: An extended compression format for the optimization of sparse matrix-vector multiplication. Trans. Parallel Distrib. Syst. (TPDS) 24(10), 1930–1940 (2013)
49.
Kernert, D., et al.: SLACID - sparse linear algebra in a column-oriented in-memory database system. In: SSDBM (2014)
50.
Kim, M.: TensorDB and tensor-relational model (TRM) for efficient tensor-relational operations. Ph.D. Thesis, ASU (2014)
51.
Kimura, H., et al.: Compression aware physical database design. PVLDB 4(10), 657–668 (2011)
52.
Kourtis, K., et al.: Optimizing sparse matrix-vector multiplication using index and value compression. In: CF (2008)
53.
Kumar, A., et al.: Demonstration of Santoku: optimizing machine learning over normalized data. PVLDB 8(12), 1864–1867 (2015)
54.
Kumar, A., et al.: Learning generalized linear models over normalized data. In: SIGMOD (2015)
55.
Lang, H., et al.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD (2016)
56.
Larson, P., et al.: SQL server column store indexes. In: SIGMOD (2011)
58.
Li, F., et al.: When Lempel–Ziv–Welch meets machine learning: a case study of accelerating machine learning using coding. In: CoRR (2017)
60.
Luo, S., et al.: Scalable linear algebra on a relational database system. In: ICDE (2017)
61.
Maccioni, A., Abadi, D.J.: Scalable pattern matching over compressed graphs via dedensification. In: KDD (2016)
62.
Maneth, S., Peternek, F.: A survey on methods and systems for graph compression. In: CoRR (2015)
63.
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (1999)
65.
Olteanu, D., Schleich, M.: F: Regression models over factorized views. PVLDB 9(13), 1573–1576 (2016)
66.
O’Neil, P.E.: Model 204 architecture and performance. In: High Performance Transaction Systems (1989)
67.
Or, A., Rosen, J.: Unified memory management in spark 1.6, SPARK-10000 design document (2015)
68.
Oracle: Data Warehousing Guide, 11g Release 1 (2007)
69.
Papadopoulos, S., et al.: The TileDB array data storage manager. PVLDB 10(4), 349–360 (2016)
70.
Qin, C., Rusu, F.: Speculative approximations for terascale analytics. In: CoRR (2015)
71.
Raman, V., Swart, G.: How to wring a table dry: entropy compression of relations and querying of compressed relations. In: VLDB (2006)
72.
Raman, V., et al.: DB2 with BLU acceleration: so much more than just a column store. PVLDB 6(11), 1080–1091 (2013)
73.
Raskhodnikova, S., et al.: Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM J. Comput. 39(3), 813–842 (2009)
74.
Rendle, S.: Scaling factorization machines to relational data. PVLDB 6(5), 337–348 (2013)
75.
Rohrmann, T., et al.: Gilbert: declarative sparse linear algebra on massively parallel dataflow systems. In: BTW (2017)
76.
Saad, Y.: SPARSKIT: a basic tool kit for sparse matrix computations - Version 2 (1994)
77.
Satuluri, V., et al.: Local graph sparsification for scalable clustering. In: SIGMOD (2011)
78.
Schelter, S., et al.: Samsara: declarative machine learning on distributed dataflow systems. In: NIPS Workshop MLSystems (2016)
79.
Schlegel, B., et al.: Memory-efficient frequent-itemset mining. In: EDBT (2011)
80.
Schleich, M., et al.: Learning linear regression models over factorized joins. In: SIGMOD (2016)
81.
Stonebraker, M., et al.: C-store: a column-oriented DBMS. In: VLDB (2005)
82.
Stonebraker, M., et al.: The Architecture of SciDB. In: SSDBM (2011)
83.
Sybase: IQ 15.4 System Administration Guide (2013)
84.
Tabei, Y., et al.: Scalable partial least squares regression on grammar-compressed data matrices. In: KDD (2016)
85.
Tepper, M., Sapiro, G.: Compressed nonnegative matrix factorization is fast and accurate. IEEE Trans. Signal Process. 64(9), 2269–2283 (2016)
86.
Tian, Y., et al.: Scalable and numerically stable descriptive statistics in SystemML. In: ICDE (2012)
87.
Valiant, G., Valiant, P.: Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In: STOC (2011)
88.
Wang, W., et al.: Database meets deep learning: challenges and opportunities. SIGMOD Rec. 45(2), 17–22 (2016)
89.
Westmann, T., et al.: The implementation and performance of compressed databases. SIGMOD Rec. 29(3), 55–67 (2000)
90.
Willhalm, T., et al.: SIMD-Scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2(1), 385–394 (2009)
91.
Williams, S., et al.: Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In: SC (2007)
92.
Wu, K., et al.: Optimizing bitmap indices with efficient compression. TODS 31(1), 1–38 (2006)
93.
Yu, L., et al.: Exploiting matrix dependency for efficient distributed matrix computation. In: SIGMOD (2015)
94.
Zadeh, R.B., et al.: Matrix computations and optimization in apache spark. In: KDD (2016)
95.
Zaharia, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI (2012)
96.
Zhang, C., et al.: Materialization optimizations for feature selection workloads. In: SIGMOD (2014)
97.
Zukowski, M., et al.: Super-scalar RAM-CPU cache compression. In: ICDE (2006)
Metadata
Title
Compressed linear algebra for large-scale machine learning
Authors
Ahmed Elgohary
Matthias Boehm
Peter J. Haas
Frederick R. Reiss
Berthold Reinwald
Publication date
12 September 2017
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal, Issue 5/2018
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-017-0478-1
