Skip to main content

2019 | OriginalPaper | Buchkapitel

Text Mining with Constrained Tensor Decomposition

verfasst von : Elaheh Sobhani, Pierre Comon, Christian Jutten, Massoud Babaie-Zadeh

Erschienen in: Machine Learning, Optimization, and Data Science

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Text mining, as a special case of data mining, refers to the estimation of knowledge or parameters necessary for certain purposes, such as unsupervised clustering by observing various documents. In this context, the topic of a document can be seen as a hidden variable, and words are multi-view variables related to each other by a topic. The main goal in this paper is to estimate the probability of topics, and conditional probability of words given topics. To this end, we use non negative Canonical Polyadic (CP) decomposition of a third order moment tensor of observed words. Our computer simulations show that the proposed algorithm has better performance compared to a previously proposed algorithm, which utilizes the Robust tensor power method after whitening by second order moment. Moreover, as our cost function includes the non negativity constraint on estimated probabilities, we never obtain negative values in our estimated probabilities, whereas it is often the case with the power method combined with deflation. In addition, our algorithm is capable of handling over-complete cases, where the number of hidden variables is larger than that of multi-view variables, contrary to deflation-based techniques. Further, the method proposed therein supports a larger over-completeness compared to modified versions of the tensor power method, which has been customized to handle over-complete case.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
mode-k product between tensors.
 
2
Recall that there exist several definitions of tensor eigenvectors [19, 23]. The definition used in [2] – and hence here – is \(\varvec{\mathcal {T}}\mathop {\bullet }\varvec{v}\mathop {\bullet }\varvec{v}=\lambda \varvec{v}\), which has the undesirable property that \(\lambda \) depends on the norm of \(\varvec{v}\). In fact, if \((\lambda ,\varvec{v})\) is an eigenpair, then so is \((\alpha \lambda ,\alpha \varvec{v})\) for any nonzero \(\alpha \). This is analyzed in e.g. [23].
 
3
With this goal, the matlab function cumsum() can be used.
 
4
Results on other particular realizations may be found in [25].
 
Literatur
1.
Zurück zum Zitat Anandkumar, A., Foster, D.P., Hsu, D., Kakade, S.M., Liu, Y.: A spectral algorithm for latent Dirichlet allocation. NIPS 25, 1–9 (2012)MATH Anandkumar, A., Foster, D.P., Hsu, D., Kakade, S.M., Liu, Y.: A spectral algorithm for latent Dirichlet allocation. NIPS 25, 1–9 (2012)MATH
2.
Zurück zum Zitat Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. 15, 2773–2832 (2014)MathSciNetMATH Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. 15, 2773–2832 (2014)MathSciNetMATH
3.
Zurück zum Zitat Anandkumar, A., Ge, R., Janzamin, M.: Analyzing tensor power method dynamics in overcomplete regime. J. Mach. Learn. Res. 18, 1–40 (2017)MathSciNetMATH Anandkumar, A., Ge, R., Janzamin, M.: Analyzing tensor power method dynamics in overcomplete regime. J. Mach. Learn. Res. 18, 1–40 (2017)MathSciNetMATH
4.
Zurück zum Zitat Anandkumar, A., Hsu, D., Janzamin, M., Kakade, S.: When are overcomplete topic models identifiable? Uniqueness of tensor Tucker decompositions with structured sparsity. J. Mach. Learn. Res. 16, 2643–2694 (2015)MathSciNetMATH Anandkumar, A., Hsu, D., Janzamin, M., Kakade, S.: When are overcomplete topic models identifiable? Uniqueness of tensor Tucker decompositions with structured sparsity. J. Mach. Learn. Res. 16, 2643–2694 (2015)MathSciNetMATH
5.
Zurück zum Zitat Anandkumar, A., Hsu, D., Kakade, S.M.: A method of moments for mixture models and hidden Markov models. In: 25th Conference on Learning Theory, vol. 23, pp. 33.1–33.34 (2012) Anandkumar, A., Hsu, D., Kakade, S.M.: A method of moments for mixture models and hidden Markov models. In: 25th Conference on Learning Theory, vol. 23, pp. 33.1–33.34 (2012)
6.
Zurück zum Zitat Atrey, P.K., Hossain, M.A., El Saddik, A., Kankanhalli, M.S.: Multimodal fusion for multimedia analysis: a survey. Multimed. Syst. 16(6), 345–379 (2010)CrossRef Atrey, P.K., Hossain, M.A., El Saddik, A., Kankanhalli, M.S.: Multimodal fusion for multimedia analysis: a survey. Multimed. Syst. 16(6), 345–379 (2010)CrossRef
7.
Zurück zum Zitat Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)MathSciNetCrossRef Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)MathSciNetCrossRef
8.
Zurück zum Zitat Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)CrossRef Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)CrossRef
9.
Zurück zum Zitat Chang, K.W., Yih, W.T., Meek, C.: Multi-relational latent semantic analysis. In: Empirical Methods in Natural Language Proceedings, Seattle, pp. 1602–1612, October 2013 Chang, K.W., Yih, W.T., Meek, C.: Multi-relational latent semantic analysis. In: Empirical Methods in Natural Language Proceedings, Seattle, pp. 1602–1612, October 2013
10.
Zurück zum Zitat Comon, P., Jutten, C. (eds.): Handbook of Blind Source Separation. Academic Press, Oxford, Burlington (2010) Comon, P., Jutten, C. (eds.): Handbook of Blind Source Separation. Academic Press, Oxford, Burlington (2010)
11.
Zurück zum Zitat Comon, P.: Independent component analysis, a new concept? Sig. Proc. 36(3), 287–314 (1994)CrossRef Comon, P.: Independent component analysis, a new concept? Sig. Proc. 36(3), 287–314 (1994)CrossRef
12.
Zurück zum Zitat Comon, P.: Tensors: a brief introduction. IEEE Sig. Proc. Mag. 31(3), 44–53 (2014)CrossRef Comon, P.: Tensors: a brief introduction. IEEE Sig. Proc. Mag. 31(3), 44–53 (2014)CrossRef
13.
Zurück zum Zitat Comon, P., Luciani, X., De Almeida, A.L.: Tensor decompositions, alternating least squares and other tales. J. Chemom. 23(7–8), 393–405 (2009)CrossRef Comon, P., Luciani, X., De Almeida, A.L.: Tensor decompositions, alternating least squares and other tales. J. Chemom. 23(7–8), 393–405 (2009)CrossRef
14.
Zurück zum Zitat Cullum, J., Willoughby, R.: Lanczos Algorithms, vol. I. Birkhauser, Basel (1985)MATH Cullum, J., Willoughby, R.: Lanczos Algorithms, vol. I. Birkhauser, Basel (1985)MATH
15.
Zurück zum Zitat De Lathauwer, L., Comon, P., et al.: Higher-order power method, application in Independent Component Analysis. In: NOLTA, Las Vegas, pp. 91–96, December 1995 De Lathauwer, L., Comon, P., et al.: Higher-order power method, application in Independent Component Analysis. In: NOLTA, Las Vegas, pp. 91–96, December 1995
16.
Zurück zum Zitat De Lathauwer, L., De Moor, B., Vandewalle, J.: On the best rank-1 and rank-(r 1, r 2, ..., rn) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 21(4), 1324–1342 (2000) De Lathauwer, L., De Moor, B., Vandewalle, J.: On the best rank-1 and rank-(r 1, r 2, ..., rn) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 21(4), 1324–1342 (2000)
17.
Zurück zum Zitat Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)MATH Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)MATH
18.
Zurück zum Zitat Huang, K., Sidiropoulos, N., et al.: A flexible and efficient algorithmic framework for constrained matrix and tensor factorization. IEEE Trans. Sig. Proc. 64(19), 5052–5065 (2016)MathSciNetCrossRef Huang, K., Sidiropoulos, N., et al.: A flexible and efficient algorithmic framework for constrained matrix and tensor factorization. IEEE Trans. Sig. Proc. 64(19), 5052–5065 (2016)MathSciNetCrossRef
19.
Zurück zum Zitat Lim, L.H.: Singular values and eigenvalues of tensors: a variational approach. In: IEEE SAM Workshop, Puerto Vallarta, Mexico, 13–15 December 2005 Lim, L.H.: Singular values and eigenvalues of tensors: a variational approach. In: IEEE SAM Workshop, Puerto Vallarta, Mexico, 13–15 December 2005
20.
Zurück zum Zitat Lim, L.H., Comon, P.: Nonnegative approximations of nonnegative tensors. J. Chemom. 23, 432–441 (2009)CrossRef Lim, L.H., Comon, P.: Nonnegative approximations of nonnegative tensors. J. Chemom. 23, 432–441 (2009)CrossRef
21.
Zurück zum Zitat Murphy, K.P.: Machine Learning. A Probabilistic Perspective. MIT Press, London (2012)MATH Murphy, K.P.: Machine Learning. A Probabilistic Perspective. MIT Press, London (2012)MATH
22.
Zurück zum Zitat Pearson, K.: Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 185, 71–110 (1894)CrossRef Pearson, K.: Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 185, 71–110 (1894)CrossRef
23.
Zurück zum Zitat Qi, L., Luo, Z.: Tensor Analysis. SIAM, Spectral Theory and Special Tensors (2017) Qi, L., Luo, Z.: Tensor Analysis. SIAM, Spectral Theory and Special Tensors (2017)
24.
Zurück zum Zitat Ramage, D., Manning, C.D., Dumais, S.: Partially labeled topic models for interpretable text mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 457–465. ACM (2011) Ramage, D., Manning, C.D., Dumais, S.: Partially labeled topic models for interpretable text mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 457–465. ACM (2011)
25.
Zurück zum Zitat Sobhani, E., Comon, P., Babaie-Zadeh, M.: Data mining with tensor decompositions. In: Gretsi. Lille, 26–29 August 2019 Sobhani, E., Comon, P., Babaie-Zadeh, M.: Data mining with tensor decompositions. In: Gretsi. Lille, 26–29 August 2019
26.
Zurück zum Zitat Sobhani, E., Sadeghi, M., Babaie-Zadeh, M., Jutten, C.: A robust ellipse fitting algorithm based on sparsity of outliers. In: 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1195–1199. IEEE (2017) Sobhani, E., Sadeghi, M., Babaie-Zadeh, M., Jutten, C.: A robust ellipse fitting algorithm based on sparsity of outliers. In: 2017 25th European Signal Processing Conference (EUSIPCO), pp. 1195–1199. IEEE (2017)
27.
Zurück zum Zitat Whittaker, J.: Graphical Models in Applied Multivariate Statistics. Wiley, Hoboken (2009)MATH Whittaker, J.: Graphical Models in Applied Multivariate Statistics. Wiley, Hoboken (2009)MATH
Metadaten
Titel
Text Mining with Constrained Tensor Decomposition
verfasst von
Elaheh Sobhani
Pierre Comon
Christian Jutten
Massoud Babaie-Zadeh
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-37599-7_19

Premium Partner