nach oben

The VLDB Journal

Erschienen in:

18.03.2019 | Regular Paper

Fast and scalable method for distributed Boolean tensor factorization

verfasst von: Namyong Park, Sejoon Oh, U Kang

Erschienen in: The VLDB Journal | Ausgabe 4/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

How can we analyze tensors that are composed of 0’s and 1’s? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationship, membership, or occurrences of events such as subject–relation–object tuples in knowledge base data (e.g., ‘Seoul’-‘is the capital of’-‘South Korea’). Boolean tensor factorization (BTF) is a useful tool for analyzing binary tensors to discover latent factors from them. Furthermore, BTF is known to produce more interpretable and sparser results than normal factorization methods. Although several BTF algorithms exist, they do not scale up for large-scale Boolean tensors. In this paper, we propose DBTF, a distributed method for Boolean CP (DBTF-CP) and Tucker (DBTF-TK) factorizations running on the Apache Spark framework. By distributed data generation with minimal network transfer, exploiting the characteristics of Boolean operations, and with careful partitioning, DBTF successfully tackles the high computational costs and minimizes the intermediate data. Experimental results show that DBTF-CP decomposes up to 16³–32³ \(\times \) larger tensors than existing methods in 82–180 \(\times \) less time, and DBTF-TK decomposes up to 8³–16³ \(\times \) larger tensors than existing methods in 86–129 \(\times \) less time. Furthermore, both DBTF-CP and DBTF-TK exhibit near-linear scalability in terms of tensor dimensionality, density, rank, and machines.

Vorheriger Artikel Scalable computational geometry in MapReduce

Nächster Artikel Prescriptive analytics: a survey of emerging trends and technologies

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

http://socialnetworks.mpi-sws.org/data-wosn2009.html.

http://www.informatik.uni-trier.de/~ley/db/.

http://www.caida.org/data/passive/ddos-20070804_dataset.xml.

http://people.mpi-inf.mpg.de/~pmiettin/src/walknmerge.zip.

http://people.mpi-inf.mpg.de/~pmiettin/src/DBP-progs/.

Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)MathSciNetCrossRefMATH

Erdős, D., Miettinen, P.: Walk ’n’ merge: a scalable algorithm for Boolean tensor factorization. In: ICDM, pp. 1037–1042 (2013)

Miettinen, P.: Boolean tensor factorizations. In: ICDM (2011)

Erdős, D., Miettinen, P.: Discovering facts with Boolean tensor tucker decomposition. In: CIKM, pp. 1569–1572 (2013)

Metzler, S., Miettinen, P.: Clustering Boolean tensors. DMKD 29(5), 1343–1373 (2015)MathSciNetMATH

Belohlávek, R., Glodeanu, C.V., Vychodil, V.: Optimal factorization of three-way binary data using triadic concepts. Order 30(2), 437–454 (2013)MathSciNetCrossRefMATH

Leenen, I., Van Mechelen, I., De Boeck, P., Rosenberg, S.: INDCLAS: a three-way hierarchical classes model. Psychometrika 64(1), 9–24 (1999)CrossRefMATH

Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI, pp. 15–28 (2012)

Park, N., Oh, S., Kang, U.: Fast and scalable distributed Boolean tensor factorization. In: ICDE, pp. 1071–1082 (2017)

10.

Cerf, L., Besson, J., Robardet, C., Boulicaut, J.: Closed patterns meet n-ary relations. TKDD 3(1), 3 (2009)CrossRef

11.

Ji, L., Tan, K., Tung, A.K.H.: Mining frequent closed cubes in 3D datasets. In: VLDB, pp. 811–822 (2006)

12.

Kang, U., Papalexakis, E.E., Harpale, A., Faloutsos, C.: Gigatensor: scaling tensor analysis up by 100 times—algorithms and discoveries. In: KDD, pp. 316–324 (2012)

13.

Jeon, B., Jeon, I., Sael, L., Kang, U.: Scout: scalable coupled matrix-tensor factorization—algorithm and discoveries. In: ICDE, pp. 811–822 (2016)

14.

Jeon, I., Papalexakis, E.E., Faloutsos, C., Sael, L., Kang, U.: Mining billion-scale tensors: algorithms and discoveries. VLDB J. 25(4), 519–544 (2016)CrossRef

15.

Sael, L., Jeon, I., Kang, U.: Scalable tensor mining. Big Data Res. 2(2), 82–86 (2015). (visions on Big Data) CrossRef

16.

Park, N., Jeon, B., Lee, J., Kang, U.: Bigtensor: mining billion-scale tensor made easy. In: CIKM, pp. 2457–2460 (2016)

17.

Beutel, A., Talukdar, P.P., Kumar, A., Faloutsos, C., Papalexakis, E.E., Xing, E.P.: Flexifact: scalable flexible factorization of coupled tensors on hadoop. In: SDM, pp. 109–117 (2014)

18.

Papalexakis, E.E., Faloutsos, C., Sidiropoulos, N.D.: Parcube: sparse parallelizable tensor decompositions. In: ECML PKDD, pp. 521–536 (2012)

19.

Li, J., Choi, J., Perros, I., Sun, J., Vuduc, R.: Model-driven sparse CP decomposition for higher-order tensors. In: IPDPS (2017)

20.

Smith, S., Park, J., Karypis, G.: An exploration of optimization algorithms for high performance tensor completion. In: SC (2016)

21.

Karlsson, L., Kressner, D., Uschmajew, A.: Parallel algorithms for tensor completion in the CP format. Parallel Comput. 57, 222–234 (2016)MathSciNetCrossRef

22.

Shin, K., Sael, L., Kang, U.: Fully scalable methods for distributed tensor factorization. TKDE 29(1), 100–113 (2017)

23.

Lathauwer, L.D., Moor, B.D., Vandewalle, J.: On the best rank-1 and rank-(\(R_{1},R_{2},\ldots, R_{N}\)) approximation of higher-order tensors. SIMAX 21(4), 1324–1342 (2000)CrossRefMATH

24.

Kolda, T.G., Sun, J.: Scalable tensor decompositions for multi-aspect data mining. In: ICDM, pp. 363–372 (2008)

25.

Oh, J., Shin, K., Papalexakis, E.E., Faloutsos, C., Yu, H.: S-hot: scalable high-order tucker decomposition. In: WSDM (2017)

26.

Smith, S., Karypis, G.: Accelerating the tucker decomposition with compressed sparse tensors. In: Europar (2017)

27.

Kaya, O., Uçar, B.: High performance parallel algorithms for the tucker decomposition of sparse tensors. In: ICPP (2016)

28.

Oh, S., Park, N., Sael, L., Kang, U.: Scalable tucker factorization for sparse tensors—algorithms and discoveries. In: ICDE (2018)

29.

Chakaravarthy, V.T., Choi, J.W., Joseph, D.J., Liu, X., Murali, P., Sabharwal, Y., Sreedhar, D.: On optimizing distributed tucker decomposition for dense tensors (2017). CoRR arXiv:1707.05594

30.

Choi, J.H., Vishwanathan, S.: Dfacto: distributed factorization of tensors. In: NIPS (2014)

31.

Kaya, O., Uçar, B.: Scalable sparse tensor decompositions in distributed memory systems. In: SC, pp. 1–11 (2015)

32.

Austin, W., Ballard, G., Kolda, T.G.: Parallel tensor compression for large-scale scientific data. In: IPDPS (2016)

33.

Smith, S., Karypis, G.: A medium-grained algorithm for distributed sparse tensor factorization. In: IPDPS (2016)

34.

Acer, S., Torun, T., Aykanat, C.: Improving medium-grain partitioning for scalable sparse tensor decomposition. IEEE Trans. Parallel Distrib. Syst. 29, 2814–2825 (2018)CrossRef

35.

Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: OSDI, pp. 137–150 (2004)

36.

Apache hadoop. http://hadoop.apache.org/

37.

Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system. In: ICDM, pp. 229–238 (2009)

38.

Park, H.-M., Park, N., Myaeng, S.-H., Kang, U.: Partition aware connected component computation in distributed systems. In: ICDM (2016)

39.

Park, H.-M., Myaeng, S.-H., Kang, U.: Pte: enumerating trillion triangles on distributed systems. In: KDD, pp. 1115–1124 (2016)

40.

Kang, U., Tong, H., Sun, J., Lin, C., Faloutsos, C.: GBASE: a scalable and general graph management system. In: KDD

41.

Kalavri, V., Vlassov, V.: Mapreduce: limitations, optimizations and open issues. In: TrustCom, pp. 1031–1038 (2013)

42.

Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: HotCloud (2010)

43.

Lulli, A., Ricci, L., Carlini, E., Dazzi, P., Lucchese, C.: Cracker: crumbling large graphs into connected components. In: ISCC, pp. 574–581 (2015)

44.

Wiewiórka, M.S., Messina, A., Pacholewska, A., Maffioletti, S., Gawrysiak, P., Okoniewski, M.J.: Sparkseq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision. Bioinformatics 30(18), 2652–2653 (2014)CrossRef

45.

Gu, R., Tang, Y., Wang, Z., Wang, S., Yin, X., Yuan, C., Huang, Y.: Efficient large scale distributed matrix computation with spark. In: IEEE BigData, pp. 2327–2336 (2015)

46.

Zadeh, R.B., Meng, X., Ulanov, A., Yavuz, B., Pu, L., Venkataraman, S., Sparks, E.R., Staple, A., Zaharia, M.: Matrix computations and optimization in apache spark. In: KDD, pp. 31–38 (2016)

47.

Kim, H., Park, J., Jang, J., Yoon, S.: Deepspark: spark-based deep learning supporting asynchronous updates and caffe compatibility (2016). CoRR arXiv:1602.08191

48.

Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The discrete basis problem. TKDE 20(10), 1348–1362 (2008)

Titel: Fast and scalable method for distributed Boolean tensor factorization
verfasst von: Namyong Park
Sejoon Oh
U Kang
Publikationsdatum: 18.03.2019
Verlag: Springer Berlin Heidelberg
Erschienen in: The VLDB Journal / Ausgabe 4/2019
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI: https://doi.org/10.1007/s00778-019-00538-z

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 4/2019

Scalable computational geometry in MapReduce

A framework for efficient multi-attribute movement data analysis

Fast diversified coherent core search on multi-layer graphs

Prescriptive analytics: a survey of emerging trends and technologies

Parametric schema inference for massive JSON datasets

Compact representations of temporal databases

Premium Partner