skip to main content
10.1145/2339530.2339577acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Fast mining and forecasting of complex time-stamped events

Published:12 August 2012Publication History

ABSTRACT

Given huge collections of time-evolving events such as web-click logs, which consist of multiple attributes (e.g., URL, userID, times- tamp), how do we find patterns and trends? How do we go about capturing daily patterns and forecasting future events? We need two properties: (a) effectiveness, that is, the patterns should help us understand the data, discover groups, and enable forecasting, and (b) scalability, that is, the method should be linear with the data size. We introduce TriMine, which performs three-way mining for all three attributes, namely, URLs, users, and time. Specifically TriMine discovers hidden topics, groups of URLs, and groups of users, simultaneously. Thanks to its concise but effective summarization, it makes it possible to accomplish the most challenging and important task, namely, to forecast future events. Extensive experiments on real datasets demonstrate that TriMine discovers meaningful topics and makes long-range forecasts, which are notoriously difficult to achieve. In fact, TriMine consistently outperforms the best state-of-the-art existing methods in terms of accuracy and execution speed (up to 74x faster).

Skip Supplemental Material Section

Supplemental Material

307_m_talk_7.mp4

mp4

179 MB

References

  1. D. Agarwal, B.-C. Chen, and P. Elango. Spatio-temporal models for estimating click-through rate. In WWW Conference, pages 21--30, Madrid, Spain, April 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. L. AlSumait, D. Barbará, and C. Domeniconi. On-line lda: Adaptive topic models for mining text streams with applications to topic detection and tracking. In ICDM, pages 3--12, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. M. Blei and J. D. Lafferty. Dynamic topic models. In ICML, pages 113--120, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993--1022, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50--57, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Hong, D. Yin, J. Guo, and B. D. Davison. Tracking trends: incorporating term volume into temporal topic models. In KDD, pages 484--492, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. Iwata, S. Watanabe, T. Yamada, and N. Ueda. Topic tracking model for analyzing consumer purchase behavior. In IJCAI, pages 1427--1432, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. Iwata, T. Yamada, Y. Sakurai, and N. Ueda. Online multiscale dynamic topic models. In KDD, pages 663--672, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. In 18th Annual Conference on Learning Theory (COLT), pages 444--457, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. T. G. Kolda, B. W. Bader, and J. P. Kenny. Higher-order web link analysis using multilinear algebra. In ICDM, pages 242--249, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. F. Korn, S. Muthukrishnan, and Y. Wu. Modeling skew in data streams. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 181--192, New York, NY, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. D. Lathauwer, B. D. Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl.,21(4):1253--1278, 2000.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. L. Li, B. A. Prakash, and C. Faloutsos. Parsimonious linear fingerprinting for time series. PVLDB, 3(1):385--396, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Y. Matsubara, Y. Sakurai, and M. Yoshikawa. Scalable algorithms for distribution search. In ICDM, pages 347--356, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. V. Nehme, E. A. Rundensteiner, and E. Bertino. Tagging stream data for rich real-time services. PVLDB, 2(1):73--84, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Papadimitriou, A. Brockwell, and C. Faloutsos. Adaptive, hands-off stream mining. In Proceedings of VLDB, pages 560--571, Berlin, Germany, Sept. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Papadimitriou and P. S. Yu. Optimal multi-scale patterns in time series streams. In SIGMOD Conference, pages 647--658, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. I. Porteous, D. Newman, A. T. Ihler, A. Asuncion, P. Smyth, and M. Welling. Fast collapsed gibbs sampling for latent dirichlet allocation. In KDD, pages 569--577, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Rendle, L. B. Marinho, A. Nanopoulos, and L. Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727--736, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Y. Sakurai, C. Faloutsos, and M. Yamamuro. Stream monitoring under the time warping distance. In Proceedings of ICDE, pages 1046--1055, Istanbul, Turkey, April 2007.Google ScholarGoogle ScholarCross RefCross Ref
  21. Y. Sakurai, S. Papadimitriou, and C. Faloutsos. Braid: Stream mining through group lag correlations. In Proceedings of ACM SIGMOD, pages 599--610, Baltimore, Maryland, June 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T. Shi, M. Belkin, and B. Yu. Data spectroscopy: learning mixture models using eigenspaces of convolution operators. In ICML, pages 936--943, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. G. Tomasi and R. Bro. Parafac and missing values. Chemometrics and Intelligent Laboratory Systems, 75(2):163--180, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  24. X. Wang and A. McCallum. Topics over time: a non-markov continuous-time model of topical trends. In KDD, pages 424--433, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. X. Wei, J. Sun, and X. Wang. Dynamic mixture models for multiple time-series. In IJCAI, pages 2909--2914, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. Weigend and N. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, 1993.Google ScholarGoogle Scholar
  27. L. Yao, D. M. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In KDD, pages 937--946, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Z. Zhang, B. T. Dai, and A. K. H. Tung. Estimating local optimums in em algorithm over gaussian mixture model. In ICML, pages 1240--1247, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fast mining and forecasting of complex time-stamped events

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2012
      1616 pages
      ISBN:9781450314626
      DOI:10.1145/2339530

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 August 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader