skip to main content
research-article

Incremental tensor analysis: Theory and applications

Published:27 October 2008Publication History
Skip Abstract Section

Abstract

How do we find patterns in author-keyword associations, evolving over time? Or in data cubes (tensors), with product-branchcustomer sales information? And more generally, how to summarize high-order data cubes (tensors)? How to incrementally update these patterns over time? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks, and many more settings. However, they have only two orders (i.e., matrices, like author and keyword in the previous example).

We propose to envision such higher-order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce a general framework, incremental tensor analysis (ITA), which efficiently computes a compact summary for high-order and high-dimensional data, and also reveals the hidden correlations. Three variants of ITA are presented: (1) dynamic tensor analysis (DTA); (2) streaming tensor analysis (STA); and (3) window-based tensor analysis (WTA). In paricular, we explore several fundamental design trade-offs such as space efficiency, computational cost, approximation accuracy, time dependency, and model complexity.

We implement all our methods and apply them in several real settings, such as network anomaly detection, multiway latent semantic indexing on citation networks, and correlation study on sensor measurements. Our empirical studies show that the proposed methods are fast and accurate and that they find interesting patterns and outliers on the real datasets.

References

  1. Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2003. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 81--92. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agrawal, R., Imieliński, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 207--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Ding, C. and Ye, J. 2005. Two-Dimensional singular value decomposition (2dsvd) for 2d maps and images. In SDM.Google ScholarGoogle Scholar
  5. Domingos, P. and Hulten, G. 2000. Mining high-speed data streams. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 71--80. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Domingos, P. and Richardson, M. 2001. Mining the network value of customers. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 57--66. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Drineas, P. and Mahoney, M. W. 2005. A randomized algorithm for a tensor-based generalization of the SVD. Tech. Rep. YALEU/DCS/TR-1327, Yale University.Google ScholarGoogle Scholar
  8. Enron Dataset. 2004. U.C. Berkeley Enron email analysis. http://bailando.sims.berkeley.edu/enron_email.html.Google ScholarGoogle Scholar
  9. Foltz, P. W. and Dumais, S. T. 1992. Personalized information delivery: An analysis of information filtering methods. Commun. ACM 35, 12, 51--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Ganti, V., Gehrke, J., and Ramakrishnan, R. 2002. Mining data streams under block evolution. SIGKDD Explor. 3, 2, 1--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Guha, S., Gunopulos, D., and Koudas, N. 2003. Correlating synchronous and asynchronous data streams. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 529--534. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O'Callaghan, L. 2003. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng. 15, 3, 515--528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Haykin, S. 1991. Adaptive Filter Theory, 2nd ed. Prentice-Hall, Upper Saddle River, NJ. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. He, X., Cai, D., and Niyogi, P. 2005. Tensor subspace analysis. In Proceeding of the Conference on Advance, in Neural Information Processing Systems (NIPS).Google ScholarGoogle Scholar
  15. Indyk, P., Koudas, N., and Muthukrishnan, S. 2000. Identifying representative trends in massive time series data sets using sketches. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, San Francisco, CA, 363--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jolliffe, I. 2002. Principal Component Analysis. Springer.Google ScholarGoogle Scholar
  17. Kannan, R., Vempala, S., and Veta, A. 2000. On clusterings-good, bad and spectral. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society, Washington, DC, 367. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Karypis, G. and Kumar, V. 1999. Parallel multilevel series k-way partitioning scheme for irregular graphs. SIAM Rev. 41, 2, 278--300. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5, 604--632. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Kolda, T. G. 2006. Multilinear operators for higher-order decompositions. Tech. Rep. SAND2006-2081, Sandia National Laboratory/April.Google ScholarGoogle Scholar
  21. Kolda, T. G. and Bader, B. W. 2008 Tensor decompositions and applications. SIAM Rev. (in revision). http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/SAND2007-6702.pdf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Kolda, T. G., Bader, B. W., and Kenny, J. P. 2005. Higher-Order Web link analysis using multilinear algebra. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, Washington, DC, 242--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Korn, F., Labrinidis, A., Kotidis, Y., and Faloutsos, C. 2000. Quantifiable data-mining using ratio rules. The VLDB. 8, 3-4, 254--266. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Kroonenberg, P. and Leeuw, J. D. 1980. Principal component analysis of three-mode data by means of alternating least square algorithms. Psychometrika 45.Google ScholarGoogle Scholar
  25. Kroonenberg, P. M. 1983. Three-Mode Principal Component Analysis. Theory and Applications. DWO Press.Google ScholarGoogle Scholar
  26. Lathauwer, L. D., Moor, B. D., and Vandewalle, J. 2006. On the best rank-1 and rank-(r1,r2,. . .,rn) approximation of higher-order tensors. SIAM. Matrix Anal. Appl. 21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Mahoney, M. W., Maggioni, M., and Drineas, P. 2006. Tensor-Cur decompositions for tensor-based data. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 327--336. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Papadimitriou, C. H., Tamaki, H., Raghavan, P., and Vempala, S. 1998. Latent semantic indexing: A probabilistic analysis. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM, New York, 159--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Papadimitriou, S., Brockwell, A., and Faloutsos, C. 2003. Adaptive, hands-off stream mining. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB). Endowment, 560--571. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Papadimitriou, S., Sun, J., and Faloutsos, C. 2005. Streaming pattern discovery in multiple time-series. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). Endowment, 697--708. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Sakurai, Y., Papadimitriou, S., and Faloutsos, C. 2005. Braid: Stream mining through group lag correlations. In Proceeding of the ACM/International Conference on Management of Data, 599--610. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Sanguansat, P., Asdornwised, W., Jitapunkul, S., and Marukatat, S. 2006. Two-Dimensional linear discriminant analysis of principle component vectors for face recognition. IEICE - Trans. Inf. Syst. E89-D, 7, 2164--2170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Shashua, A. and Levin, A. 2001. Linear image coding for regression and classification using the tensor-rank principle. In Proceedings of the IEEE Conference on Computer vision and Patteroo Recongnistion CVPR (1). IEEE Computer Society, 42--49.Google ScholarGoogle Scholar
  34. Smilde, A., Bro, R., and Geladi, P. 2004. Multi-Way Analysis: Applications in the Chemical Sciences. Wiley, West Sussex, UK.Google ScholarGoogle Scholar
  35. Sun, J., Papadimitriou, S., and Yu, P. S. 2006a. Window-Based tensor analysis on high-dimensional and multi-aspect streams. In Proceedings of the 6th International Conference on Data Mining (ICDM). IEEE Computer Society, Washington, DC, 1076--1080. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Sun, J., Tao, D., and Faloutsos, C. 2006a. Beyond streams and graphs: Dynamic tensor analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 374--383. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Tao, D., Li, X., Wu, X., and Maybank, S. J. 2007. General tensor discriminant analysis and Gabor features for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29, 10, 1700--1715. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Tucker, L. R. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279--311.Google ScholarGoogle ScholarCross RefCross Ref
  39. Universität Trier. 2008. http://www.informatik.uni-trier.de/~ley/db/.Google ScholarGoogle Scholar
  40. Van Loan, C. F. 2000. The ubiquitous Kronecker product. J. Comput. Appl. Math. 123, 85--100. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Vasilescu, M. A. O. and Terzopoulos, D. 2002. Multilinear analysis of image ensembles: Tensorfaces. In Proceedings of the 7th European Conference on Computer Vision-Part I (ECCV). Springer, London, UK, 447--460. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Wang, H., Fan, W., Yu, P. S., and Han, J. 2003. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 226--235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Xu, D., Yan, S., Zhang, L., Zhang, H.-J., Liu, Z., and Shum, H.-Y. 2005. Concurrent subspaces analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattem Recognition (CVPR). Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Ye, J. 2005. Generalized low rank approximations of matrices. Mach. Learn. 61, 1-3, 167--191. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Yi, B.-K., Sidiropoulos, N., Johnson, T., Jagadish, H., Faloutsos, C., and Biliris, A. 2000. Online data-mining for co-evolving time sequences. In Proceedings of the 16th International Conference on Data Engineering (ICDE). IEEE Computer Society, Washington, DC, 13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Zhao, L. and Zaki, M. J. 2005. Tricluster: An effective algorithm for mining coherent clusters in 3D microarray data. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 694--705. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Zhu, Y. and Shasha, D. 2002. Statstream: Statistical monitoring of thousands of data streams in real time. In Proceedings of the International Conference on Voy Large Databases (VLDB). 358--369. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Incremental tensor analysis: Theory and applications

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM Transactions on Knowledge Discovery from Data
            ACM Transactions on Knowledge Discovery from Data  Volume 2, Issue 3
            October 2008
            124 pages
            ISSN:1556-4681
            EISSN:1556-472X
            DOI:10.1145/1409620
            Issue’s Table of Contents

            Copyright © 2008 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 27 October 2008
            • Accepted: 1 May 2008
            • Revised: 1 March 2008
            • Received: 1 October 2007
            Published in tkdd Volume 2, Issue 3

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader