ABSTRACT
Given huge collections of time-evolving events such as web-click logs, which consist of multiple attributes (e.g., URL, userID, times- tamp), how do we find patterns and trends? How do we go about capturing daily patterns and forecasting future events? We need two properties: (a) effectiveness, that is, the patterns should help us understand the data, discover groups, and enable forecasting, and (b) scalability, that is, the method should be linear with the data size. We introduce TriMine, which performs three-way mining for all three attributes, namely, URLs, users, and time. Specifically TriMine discovers hidden topics, groups of URLs, and groups of users, simultaneously. Thanks to its concise but effective summarization, it makes it possible to accomplish the most challenging and important task, namely, to forecast future events. Extensive experiments on real datasets demonstrate that TriMine discovers meaningful topics and makes long-range forecasts, which are notoriously difficult to achieve. In fact, TriMine consistently outperforms the best state-of-the-art existing methods in terms of accuracy and execution speed (up to 74x faster).
Supplemental Material
- D. Agarwal, B.-C. Chen, and P. Elango. Spatio-temporal models for estimating click-through rate. In WWW Conference, pages 21--30, Madrid, Spain, April 2009. Google ScholarDigital Library
- L. AlSumait, D. Barbará, and C. Domeniconi. On-line lda: Adaptive topic models for mining text streams with applications to topic detection and tracking. In ICDM, pages 3--12, 2008. Google ScholarDigital Library
- D. M. Blei and J. D. Lafferty. Dynamic topic models. In ICML, pages 113--120, 2006. Google ScholarDigital Library
- D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993--1022, 2003. Google ScholarDigital Library
- T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50--57, 1999. Google ScholarDigital Library
- L. Hong, D. Yin, J. Guo, and B. D. Davison. Tracking trends: incorporating term volume into temporal topic models. In KDD, pages 484--492, 2011. Google ScholarDigital Library
- T. Iwata, S. Watanabe, T. Yamada, and N. Ueda. Topic tracking model for analyzing consumer purchase behavior. In IJCAI, pages 1427--1432, 2009. Google ScholarDigital Library
- T. Iwata, T. Yamada, Y. Sakurai, and N. Ueda. Online multiscale dynamic topic models. In KDD, pages 663--672, 2010. Google ScholarDigital Library
- R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. In 18th Annual Conference on Learning Theory (COLT), pages 444--457, 2005. Google ScholarDigital Library
- T. G. Kolda, B. W. Bader, and J. P. Kenny. Higher-order web link analysis using multilinear algebra. In ICDM, pages 242--249, 2005. Google ScholarDigital Library
- F. Korn, S. Muthukrishnan, and Y. Wu. Modeling skew in data streams. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 181--192, New York, NY, USA, 2006. ACM. Google ScholarDigital Library
- L. D. Lathauwer, B. D. Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl.,21(4):1253--1278, 2000.. Google ScholarDigital Library
- L. Li, B. A. Prakash, and C. Faloutsos. Parsimonious linear fingerprinting for time series. PVLDB, 3(1):385--396, 2010. Google ScholarDigital Library
- Y. Matsubara, Y. Sakurai, and M. Yoshikawa. Scalable algorithms for distribution search. In ICDM, pages 347--356, 2009. Google ScholarDigital Library
- R. V. Nehme, E. A. Rundensteiner, and E. Bertino. Tagging stream data for rich real-time services. PVLDB, 2(1):73--84, 2009. Google ScholarDigital Library
- S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, hands-off stream mining. In Proceedings of VLDB, pages 560--571, Berlin, Germany, Sept. 2003. Google ScholarDigital Library
- S. Papadimitriou and P. S. Yu. Optimal multi-scale patterns in time series streams. In SIGMOD Conference, pages 647--658, 2006. Google ScholarDigital Library
- I. Porteous, D. Newman, A. T. Ihler, A. Asuncion, P. Smyth, and M. Welling. Fast collapsed gibbs sampling for latent dirichlet allocation. In KDD, pages 569--577, 2008. Google ScholarDigital Library
- S. Rendle, L. B. Marinho, A. Nanopoulos, and L. Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727--736, 2009. Google ScholarDigital Library
- Y. Sakurai, C. Faloutsos, and M. Yamamuro. Stream monitoring under the time warping distance. In Proceedings of ICDE, pages 1046--1055, Istanbul, Turkey, April 2007.Google ScholarCross Ref
- Y. Sakurai, S. Papadimitriou, and C. Faloutsos. Braid: Stream mining through group lag correlations. In Proceedings of ACM SIGMOD, pages 599--610, Baltimore, Maryland, June 2005. Google ScholarDigital Library
- T. Shi, M. Belkin, and B. Yu. Data spectroscopy: learning mixture models using eigenspaces of convolution operators. In ICML, pages 936--943, 2008. Google ScholarDigital Library
- G. Tomasi and R. Bro. Parafac and missing values. Chemometrics and Intelligent Laboratory Systems, 75(2):163--180, 2005.Google ScholarCross Ref
- X. Wang and A. McCallum. Topics over time: a non-markov continuous-time model of topical trends. In KDD, pages 424--433, 2006. Google ScholarDigital Library
- X. Wei, J. Sun, and X. Wang. Dynamic mixture models for multiple time-series. In IJCAI, pages 2909--2914, 2007. Google ScholarDigital Library
- A. Weigend and N. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, 1993.Google Scholar
- L. Yao, D. M. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In KDD, pages 937--946, 2009. Google ScholarDigital Library
- Z. Zhang, B. T. Dai, and A. K. H. Tung. Estimating local optimums in em algorithm over gaussian mixture model. In ICML, pages 1240--1247, 2008. Google ScholarDigital Library
Index Terms
- Fast mining and forecasting of complex time-stamped events
Recommendations
Boosted Embeddings for Time-Series Forecasting
Machine Learning, Optimization, and Data ScienceAbstractTime-series forecasting is a fundamental task emerging from diverse data-driven applications. Many advanced autoregressive methods such as ARIMA were used to develop forecasting models. Recently, deep learning based methods such as DeepAR, ...
Time series forecasting with the WARIMAX-GARCH method
It is well-known that causal forecasting methods that include appropriately chosen Exogenous Variables (EVs) very often present improved forecasting performances over univariate methods. However, in practice, EVs are usually difficult to obtain and in ...
A heuristic time-invariant model for fuzzy time series forecasting
Many forecasting models based on the concepts of fuzzy time series have been proposed in the past decades. These models have been applied to predict enrollments, temperature, crop production and stock index, etc. In this paper, we present a simple ...
Comments