Abstract
How do we find patterns in author-keyword associations, evolving over time? Or in data cubes (tensors), with product-branchcustomer sales information? And more generally, how to summarize high-order data cubes (tensors)? How to incrementally update these patterns over time? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks, and many more settings. However, they have only two orders (i.e., matrices, like author and keyword in the previous example).
We propose to envision such higher-order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce a general framework, incremental tensor analysis (ITA), which efficiently computes a compact summary for high-order and high-dimensional data, and also reveals the hidden correlations. Three variants of ITA are presented: (1) dynamic tensor analysis (DTA); (2) streaming tensor analysis (STA); and (3) window-based tensor analysis (WTA). In paricular, we explore several fundamental design trade-offs such as space efficiency, computational cost, approximation accuracy, time dependency, and model complexity.
We implement all our methods and apply them in several real settings, such as network anomaly detection, multiway latent semantic indexing on citation networks, and correlation study on sensor measurements. Our empirical studies show that the proposed methods are fast and accurate and that they find interesting patterns and outliers on the real datasets.
- Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2003. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 81--92. Google ScholarDigital Library
- Agrawal, R., Imieliński, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 207--216. Google ScholarDigital Library
- Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117. Google ScholarDigital Library
- Ding, C. and Ye, J. 2005. Two-Dimensional singular value decomposition (2dsvd) for 2d maps and images. In SDM.Google Scholar
- Domingos, P. and Hulten, G. 2000. Mining high-speed data streams. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 71--80. Google ScholarDigital Library
- Domingos, P. and Richardson, M. 2001. Mining the network value of customers. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 57--66. Google ScholarDigital Library
- Drineas, P. and Mahoney, M. W. 2005. A randomized algorithm for a tensor-based generalization of the SVD. Tech. Rep. YALEU/DCS/TR-1327, Yale University.Google Scholar
- Enron Dataset. 2004. U.C. Berkeley Enron email analysis. http://bailando.sims.berkeley.edu/enron_email.html.Google Scholar
- Foltz, P. W. and Dumais, S. T. 1992. Personalized information delivery: An analysis of information filtering methods. Commun. ACM 35, 12, 51--60. Google ScholarDigital Library
- Ganti, V., Gehrke, J., and Ramakrishnan, R. 2002. Mining data streams under block evolution. SIGKDD Explor. 3, 2, 1--10. Google ScholarDigital Library
- Guha, S., Gunopulos, D., and Koudas, N. 2003. Correlating synchronous and asynchronous data streams. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 529--534. Google ScholarDigital Library
- Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O'Callaghan, L. 2003. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng. 15, 3, 515--528. Google ScholarDigital Library
- Haykin, S. 1991. Adaptive Filter Theory, 2nd ed. Prentice-Hall, Upper Saddle River, NJ. Google ScholarDigital Library
- He, X., Cai, D., and Niyogi, P. 2005. Tensor subspace analysis. In Proceeding of the Conference on Advance, in Neural Information Processing Systems (NIPS).Google Scholar
- Indyk, P., Koudas, N., and Muthukrishnan, S. 2000. Identifying representative trends in massive time series data sets using sketches. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, San Francisco, CA, 363--372. Google ScholarDigital Library
- Jolliffe, I. 2002. Principal Component Analysis. Springer.Google Scholar
- Kannan, R., Vempala, S., and Veta, A. 2000. On clusterings-good, bad and spectral. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society, Washington, DC, 367. Google ScholarDigital Library
- Karypis, G. and Kumar, V. 1999. Parallel multilevel series k-way partitioning scheme for irregular graphs. SIAM Rev. 41, 2, 278--300. Google ScholarDigital Library
- Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5, 604--632. Google ScholarDigital Library
- Kolda, T. G. 2006. Multilinear operators for higher-order decompositions. Tech. Rep. SAND2006-2081, Sandia National Laboratory/April.Google Scholar
- Kolda, T. G. and Bader, B. W. 2008 Tensor decompositions and applications. SIAM Rev. (in revision). http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/SAND2007-6702.pdf. Google ScholarDigital Library
- Kolda, T. G., Bader, B. W., and Kenny, J. P. 2005. Higher-Order Web link analysis using multilinear algebra. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, Washington, DC, 242--249. Google ScholarDigital Library
- Korn, F., Labrinidis, A., Kotidis, Y., and Faloutsos, C. 2000. Quantifiable data-mining using ratio rules. The VLDB. 8, 3-4, 254--266. Google ScholarDigital Library
- Kroonenberg, P. and Leeuw, J. D. 1980. Principal component analysis of three-mode data by means of alternating least square algorithms. Psychometrika 45.Google Scholar
- Kroonenberg, P. M. 1983. Three-Mode Principal Component Analysis. Theory and Applications. DWO Press.Google Scholar
- Lathauwer, L. D., Moor, B. D., and Vandewalle, J. 2006. On the best rank-1 and rank-(r1,r2,. . .,rn) approximation of higher-order tensors. SIAM. Matrix Anal. Appl. 21. Google ScholarDigital Library
- Mahoney, M. W., Maggioni, M., and Drineas, P. 2006. Tensor-Cur decompositions for tensor-based data. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 327--336. Google ScholarDigital Library
- Papadimitriou, C. H., Tamaki, H., Raghavan, P., and Vempala, S. 1998. Latent semantic indexing: A probabilistic analysis. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM, New York, 159--168. Google ScholarDigital Library
- Papadimitriou, S., Brockwell, A., and Faloutsos, C. 2003. Adaptive, hands-off stream mining. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB). Endowment, 560--571. Google ScholarDigital Library
- Papadimitriou, S., Sun, J., and Faloutsos, C. 2005. Streaming pattern discovery in multiple time-series. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). Endowment, 697--708. Google ScholarDigital Library
- Sakurai, Y., Papadimitriou, S., and Faloutsos, C. 2005. Braid: Stream mining through group lag correlations. In Proceeding of the ACM/International Conference on Management of Data, 599--610. Google ScholarDigital Library
- Sanguansat, P., Asdornwised, W., Jitapunkul, S., and Marukatat, S. 2006. Two-Dimensional linear discriminant analysis of principle component vectors for face recognition. IEICE - Trans. Inf. Syst. E89-D, 7, 2164--2170. Google ScholarDigital Library
- Shashua, A. and Levin, A. 2001. Linear image coding for regression and classification using the tensor-rank principle. In Proceedings of the IEEE Conference on Computer vision and Patteroo Recongnistion CVPR (1). IEEE Computer Society, 42--49.Google Scholar
- Smilde, A., Bro, R., and Geladi, P. 2004. Multi-Way Analysis: Applications in the Chemical Sciences. Wiley, West Sussex, UK.Google Scholar
- Sun, J., Papadimitriou, S., and Yu, P. S. 2006a. Window-Based tensor analysis on high-dimensional and multi-aspect streams. In Proceedings of the 6th International Conference on Data Mining (ICDM). IEEE Computer Society, Washington, DC, 1076--1080. Google ScholarDigital Library
- Sun, J., Tao, D., and Faloutsos, C. 2006a. Beyond streams and graphs: Dynamic tensor analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 374--383. Google ScholarDigital Library
- Tao, D., Li, X., Wu, X., and Maybank, S. J. 2007. General tensor discriminant analysis and Gabor features for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29, 10, 1700--1715. Google ScholarDigital Library
- Tucker, L. R. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279--311.Google ScholarCross Ref
- Universität Trier. 2008. http://www.informatik.uni-trier.de/~ley/db/.Google Scholar
- Van Loan, C. F. 2000. The ubiquitous Kronecker product. J. Comput. Appl. Math. 123, 85--100. Google ScholarDigital Library
- Vasilescu, M. A. O. and Terzopoulos, D. 2002. Multilinear analysis of image ensembles: Tensorfaces. In Proceedings of the 7th European Conference on Computer Vision-Part I (ECCV). Springer, London, UK, 447--460. Google ScholarDigital Library
- Wang, H., Fan, W., Yu, P. S., and Han, J. 2003. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 226--235. Google ScholarDigital Library
- Xu, D., Yan, S., Zhang, L., Zhang, H.-J., Liu, Z., and Shum, H.-Y. 2005. Concurrent subspaces analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattem Recognition (CVPR). Google ScholarDigital Library
- Ye, J. 2005. Generalized low rank approximations of matrices. Mach. Learn. 61, 1-3, 167--191. Google ScholarDigital Library
- Yi, B.-K., Sidiropoulos, N., Johnson, T., Jagadish, H., Faloutsos, C., and Biliris, A. 2000. Online data-mining for co-evolving time sequences. In Proceedings of the 16th International Conference on Data Engineering (ICDE). IEEE Computer Society, Washington, DC, 13. Google ScholarDigital Library
- Zhao, L. and Zaki, M. J. 2005. Tricluster: An effective algorithm for mining coherent clusters in 3D microarray data. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 694--705. Google ScholarDigital Library
- Zhu, Y. and Shasha, D. 2002. Statstream: Statistical monitoring of thousands of data streams in real time. In Proceedings of the International Conference on Voy Large Databases (VLDB). 358--369. Google ScholarDigital Library
Index Terms
- Incremental tensor analysis: Theory and applications
Recommendations
Tensor compressed video sensing reconstruction by combination of fractional-order total variation and sparsifying transform
High reconstructed performance compressed video sensing (CVS) with low computational complexity and memory requirement is very challenging. In order to reconstruct the high quality video frames with low computational complexity, this paper proposes a ...
Tensor Decompositions and Applications
This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or $N$-way array. Decompositions of higher-order tensors (i.e., $N$-way arrays with $N \geq 3$) have ...
Tensor Missing Value Recovery with Tucker Thresholding Method
INCOS '13: Proceedings of the 2013 5th International Conference on Intelligent Networking and Collaborative SystemsIn this paper, a tensor missing value recovery method on the tensor Tucker decomposition is presented. The contribution of this paper is to extend matrix shrinkage operator to the tensor Tucker higher-order singular value decomposition operator to ...
Comments