Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 1/2016

01.01.2016

Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient CP and tucker decompositions

verfasst von: Mijung Kim, K. Selçuk Candan

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 1/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

For many multi-dimensional data applications, tensor operations as well as relational operations both need to be supported throughout the data lifecycle. Tensor based representations (including two widely used tensor decompositions, CP and Tucker decompositions) are proven to be effective in multi-aspect data analysis and tensor decomposition is an important tool for capturing high-order structures in multi-dimensional data. Although tensor decomposition is shown to be effective for multi-dimensional data analysis, the cost of tensor decomposition is often very high. Since the number of modes of the tensor data is one of the main factors contributing to the costs of the tensor operations, in this paper, we focus on reducing the modality of the input tensors to tackle the computational cost of the tensor decomposition process. We propose a novel decomposition-by-normalization scheme that first normalizes the given relation into smaller tensors based on the functional dependencies of the relation, decomposes these smaller tensors, and then recombines the sub-results to obtain the overall decomposition. The decomposition and recombination steps of the decomposition-by-normalization scheme fit naturally in settings with multiple cores. This leads to a highly efficient, effective, and parallelized decomposition-by-normalization algorithm for both dense and sparse tensors for CP and Tucker decompositions. Experimental results confirm the efficiency and effectiveness of the proposed decomposition-by-normalization scheme compared to the conventional nonnegative CP decomposition and Tucker decomposition approaches.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Note that the cost increases linearly in the size of the input relation (Huhtala et al. 1999).
Table 10
Different attribute sets, join attributes (\(X\)), supports of \(X\) (the lowest of all the supports of \(X \rightarrow *\)), and execution times for FDs discovery for D1-D18 where \(A_n\) is the \(n\)th attribute of each data set
Data set
Attributes
Join attr. (\(X\))
Support of \(X\) (%)
Exec. time for FDs
D1
\(\{A_{11},A_{12},A_3,A_9,A_{10}\}\)
\(A_3\)
\(97\)
0.024s
D2
\(\{A_2,A_3,A_9,A_8,A_4\}\)
\(A_3\)
\(80\)
0.022s
D3
\(\{A_1,A_3,A_{12},A_{15},A_{10}\}\)
\(A_3\)
\(80\)
0.025s
D4
\(\{A_3,A_7,A_{15},A_8,A_{13}\}\)
\(A_3\)
\(75\)
0.023s
D5
\(\{A_3,A_9,A_{15},A_{12},A_1\}\)
\(A_3\)
\(80\)
0.023s
D6
\(\{A_1,A_4,A_7,A_{11},A_6\}\)
\(A_1\)
\(96\)
0.004s
D7
\(\{A_4,A_1,A_{10},A_8,A_9\}\)
\(A_1\)
\(96\)
0.003s
D8
\(\{A_6,A_5,A_7,A_8,A_1\}\)
\(A_1\)
\(96\)
0.002s
D9
\(\{A_{11},A_9,A_6,A_3,A_1\}\)
\(A_1\)
\(98\)
0.003s
D10
\(\{A_5,A_4,A_1,A_{10},A_8\}\)
\(A_1\)
\(96\)
0.003s
D11
\(\{A_8,A_{17},A_{19},A_3,A_2\}\)
\(A_8\)
\(99\)
0.007s
D12
\(\{A_{53},A_2,A_{21},A_3,A_4\}\)
\(A_{53}\)
\(98\)
0.006s
D13
\(\{A_{13},A_{48},A_{17},A_{14},A_2\}\)
\(A_{13}\)
\(98\)
0.005s
D14
\(\{A_4,A_9,A_{18},A_{17},A_2\}\)
\(A_2\)
\(88\)
0.004s
D15
\(\{A_{34},A_{24},A_{33},A_{25},A_{11}\}\)
\(A_{34}\)
\(80\)
0.002s
D16
\(\{A_{11},A_{12},A_{3},A_{9},A_{10}\}\)
\(A_{3}\)
\(98\)
0.024s
D17
4-mode
\(\{A_{11},A_{12},A_{3},A_{13}\}\)
\(A_{3}\)
\(96\)
0.024s
5-mode
\(\{A_{11},A_{12},A_{3},A_{13},A_{1}\}\)
6-mode
\(\{A_{11},A_{12},A_{3},A_{13},A_{1},A_{14}\}\)
D18
4-mode
\(\{A_{49},A_{50},A_{51},A_{54}\}\)
\(A_{50}\)
\(95\)
0.007s
5-mode
\(\{A_{8},A_{49},A_{50},A_{51},A_{54}\}\)
6-mode
\(\{A_{8},A_{49},A_{50},A_{51},A_{54},A_{58}\}\)
 
2
Note that the fit is low in this experiment due to the extremely tight target rank (10) used for the decomposition for a very high dimensional tensor. The fit obtained by the conventional tensor decomposition technique, NNCP-CP-ALS, on the same tensor with the same rank is also similarly low, 0.0028.
 
Literatur
Zurück zum Zitat Allen GI (2012) Sparse higher-order principal components analysis. In: Proceedings of the 15th international conference on artificial intelligence and statistics (AISTATS) Allen GI (2012) Sparse higher-order principal components analysis. In: Proceedings of the 15th international conference on artificial intelligence and statistics (AISTATS)
Zurück zum Zitat Antikainen J, Havel J, Josth JR, Herout A, Zemcik P, Hauta-Kasari M (2011) Nonnegative tensor factorization accelerated using gpgpu. IEEE Trans Parallel Distrib Syst 22(7):1135–1141CrossRef Antikainen J, Havel J, Josth JR, Herout A, Zemcik P, Hauta-Kasari M (2011) Nonnegative tensor factorization accelerated using gpgpu. IEEE Trans Parallel Distrib Syst 22(7):1135–1141CrossRef
Zurück zum Zitat Bader BW, Kolda TG (2006) Efficient matlab computations with sparse and factored tensors. Technical Report SAND2006-7592, Sandia National Laboratories Bader BW, Kolda TG (2006) Efficient matlab computations with sparse and factored tensors. Technical Report SAND2006-7592, Sandia National Laboratories
Zurück zum Zitat Carroll J, Chang JJ (1970) Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika 35:283–319MATHCrossRef Carroll J, Chang JJ (1970) Analysis of individual differences in multidimensional scaling via an n-way generalization of eckart-young decomposition. Psychometrika 35:283–319MATHCrossRef
Zurück zum Zitat Chu W, Ghahramani Z (2009) Probabilistic models for incomplete multi-dimensional arrays. In: Proceedings of the 12th international conference on artificial intelligence and statistics Chu W, Ghahramani Z (2009) Probabilistic models for incomplete multi-dimensional arrays. In: Proceedings of the 12th international conference on artificial intelligence and statistics
Zurück zum Zitat Elmasri R, Navathe SB (1994) Fundamentals of database systems, 2nd edn. Benjamin-Cummings, Redwood CityMATH Elmasri R, Navathe SB (1994) Fundamentals of database systems, 2nd edn. Benjamin-Cummings, Redwood CityMATH
Zurück zum Zitat Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, New YorkMATH Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, New YorkMATH
Zurück zum Zitat Harshman RA (1970) Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics 16(1):84 Harshman RA (1970) Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics 16(1):84
Zurück zum Zitat Huhtala Y, Kärkkäinen J, Porkka P, Toivonen H (1999) Tane: an efficient algorithm for discovering functional and approximate dependencies. Comput J 42(2):100–111MATHCrossRef Huhtala Y, Kärkkäinen J, Porkka P, Toivonen H (1999) Tane: an efficient algorithm for discovering functional and approximate dependencies. Comput J 42(2):100–111MATHCrossRef
Zurück zum Zitat Ilyas IF, Markl V, Haas PJ, Brown P, Aboulnaga (2004) A Cords: automatic discovery of correlations and soft functional dependencies. In: SIGMOD conference, pp. 647–658 Ilyas IF, Markl V, Haas PJ, Brown P, Aboulnaga (2004) A Cords: automatic discovery of correlations and soft functional dependencies. In: SIGMOD conference, pp. 647–658
Zurück zum Zitat Karmarker N, Karp RM (1983) The differencing method of set partitioning. Technical report, Berkeley Karmarker N, Karp RM (1983) The differencing method of set partitioning. Technical report, Berkeley
Zurück zum Zitat Kim M, Candan KS (2011) Approximate tensor decomposition within a tensor-relational algebraic framework. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp. 1737–1742 doi:10.1145/2063576.2063827 Kim M, Candan KS (2011) Approximate tensor decomposition within a tensor-relational algebraic framework. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp. 1737–1742 doi:10.​1145/​2063576.​2063827
Zurück zum Zitat Kolda T, Sun J (2008) Scalable tensor decompositions for multi-aspect data mining. In: Proceedings of the 8th IEEE international conference on data mining, pp. 363–372. doi:10.1109/ICDM.2008.89 Kolda T, Sun J (2008) Scalable tensor decompositions for multi-aspect data mining. In: Proceedings of the 8th IEEE international conference on data mining, pp. 363–372. doi:10.​1109/​ICDM.​2008.​89
Zurück zum Zitat Kolda TG, Bader BW, Kenny JP (2005) Higher-order web link analysis using multilinear algebra. In: Proceedings of the 5th IEEE international conference on data mining, pp. 242–249. doi:10.1109/ICDM.2005.77 Kolda TG, Bader BW, Kenny JP (2005) Higher-order web link analysis using multilinear algebra. In: Proceedings of the 5th IEEE international conference on data mining, pp. 242–249. doi:10.​1109/​ICDM.​2005.​77
Zurück zum Zitat Kruskal JB (1977) Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebr Appl 18(2):95–138MATHMathSciNetCrossRef Kruskal JB (1977) Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebr Appl 18(2):95–138MATHMathSciNetCrossRef
Zurück zum Zitat Lopes S, Petit JM, Lakhal L (2000) Efficient discovery of functional dependencies and armstrong relations. In: Proceedings of the 7th international conference on extending database technology: advances in database technology, EDBT ’00. Springer, London, pp. 350–364 Lopes S, Petit JM, Lakhal L (2000) Efficient discovery of functional dependencies and armstrong relations. In: Proceedings of the 7th international conference on extending database technology: advances in database technology, EDBT ’00. Springer, London, pp. 350–364
Zurück zum Zitat Mahoney MW, Maggioni M, Drineas P (2006) Tensor-cur decompositions for tensor-based data. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 327–336. doi:10.1145/1150402.1150440 Mahoney MW, Maggioni M, Drineas P (2006) Tensor-cur decompositions for tensor-based data. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 327–336. doi:10.​1145/​1150402.​1150440
Zurück zum Zitat Mangasarian OL, Wolberg WH (1990) Cancer diagnosis via linear programming. SIAM News 23(5):1–18 Mangasarian OL, Wolberg WH (1990) Cancer diagnosis via linear programming. SIAM News 23(5):1–18
Zurück zum Zitat Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850CrossRef Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850CrossRef
Zurück zum Zitat Sun J, Papadimitriou S, Lin CY, Cao N, Liu S, Qian W (2009) Multivis: content-based social network exploration through multi-way visual analysis. In: Proceedings SDM, vol 9. SIAM, pp. 1063–1074 Sun J, Papadimitriou S, Lin CY, Cao N, Liu S, Qian W (2009) Multivis: content-based social network exploration through multi-way visual analysis. In: Proceedings SDM, vol 9. SIAM, pp. 1063–1074
Zurück zum Zitat Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp. 990–998 Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp. 990–998
Zurück zum Zitat Tsourakakis CE (2010) Mach: fast randomized tensor decompositions. In: Proceedings of the 10th SIAM International Conference on Data Mining, pp. 689–700 Tsourakakis CE (2010) Mach: fast randomized tensor decompositions. In: Proceedings of the 10th SIAM International Conference on Data Mining, pp. 689–700
Zurück zum Zitat Wyss C, Giannella C, Robertson EL (2001) Fastfds: a heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances: extended abstract. In: Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery, DaWaK ’01. Springer, London, pp 101–110 Wyss C, Giannella C, Robertson EL (2001) Fastfds: a heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances: extended abstract. In: Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery, DaWaK ’01. Springer, London, pp 101–110
Zurück zum Zitat Xu Z, Yan F, Qi A (2012) Infinite tucker decomposition: nonparametric bayesian models for multiway data analysis. In: ICML. icml.cc/Omnipress Xu Z, Yan F, Qi A (2012) Infinite tucker decomposition: nonparametric bayesian models for multiway data analysis. In: ICML. icml.cc/Omnipress
Zurück zum Zitat Zhang Q, Berry M, Lamb B, Samuel T, Allen G, Nabrzyski J, Seidel E, van Albada G, Dongarra J, Sloot P (2009) A parallel nonnegative tensor factorization algorithm for mining global climate data, vol 5545. Springer, Berlin/Heidelberg, pp. 405–415 Zhang Q, Berry M, Lamb B, Samuel T, Allen G, Nabrzyski J, Seidel E, van Albada G, Dongarra J, Sloot P (2009) A parallel nonnegative tensor factorization algorithm for mining global climate data, vol 5545. Springer, Berlin/Heidelberg, pp. 405–415
Zurück zum Zitat Zhou G, He Z, Zhang Y, Zhao Q, Cichocki A (2009) Canonical polyadic decomposition: from 3-way to n-way. In: Eighth international conference on computational intelligence and security (CIS), pp 391–395. doi:10.1109/CIS.2012.94 Zhou G, He Z, Zhang Y, Zhao Q, Cichocki A (2009) Canonical polyadic decomposition: from 3-way to n-way. In: Eighth international conference on computational intelligence and security (CIS), pp 391–395. doi:10.​1109/​CIS.​2012.​94
Metadaten
Titel
Decomposition-by-normalization (DBN): leveraging approximate functional dependencies for efficient CP and tucker decompositions
verfasst von
Mijung Kim
K. Selçuk Candan
Publikationsdatum
01.01.2016
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 1/2016
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-015-0401-6

Weitere Artikel der Ausgabe 1/2016

Data Mining and Knowledge Discovery 1/2016 Zur Ausgabe

Premium Partner