DOI: 10.1145/1835804.1835940
research-article

Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora

Published: 25 July 2010

ABSTRACT

Mining cluster evolution from multiple correlated time-varying text corpora is important in exploratory text analytics. In this paper, we propose an approach called evolutionary hierarchical Dirichlet processes (EvoHDP) to discover interesting cluster evolution patterns from such text data. We formulate EvoHDP as a series of hierarchical Dirichlet processes (HDP) by adding time dependencies between adjacent epochs, and propose a cascaded Gibbs sampling scheme to infer the model. This approach can discover different evolving patterns of clusters, including emergence, disappearance, and evolution both within a corpus and across different corpora. Experiments on synthetic and real-world multiple correlated time-varying data sets illustrate the effectiveness of EvoHDP in discovering cluster evolution patterns.
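EvoHDP is built from hierarchical Dirichlet processes. To illustrate that building block only, here is a minimal Python sketch of a truncated two-level HDP prior based on Sethuraman's stick-breaking construction: a global weight vector `beta` over K shared atoms, and per-corpus weights `pi_j` drawn around it. The function names (`stick_breaking`, `dirichlet`, `truncated_hdp`), the truncation level, and the finite Dirichlet approximation are assumptions made for illustration. The sketch does not model EvoHDP's time dependencies between adjacent epochs or its cascaded Gibbs sampler.

```python
import random

# Illustrative sketch only: a truncated HDP prior, not the EvoHDP model
# or its cascaded Gibbs sampler. All function names are hypothetical.

def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process DP(concentration, H).

    Draws v_k ~ Beta(1, concentration) and sets w_k = v_k * prod_{l<k} (1 - v_l).
    The final stick keeps all leftover mass so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

def dirichlet(shapes, rng):
    """Sample a Dirichlet vector by normalising independent Gamma(shape, 1) draws."""
    # Floor tiny shapes (a small approximation) so gammavariate never sees shape <= 0.
    draws = [rng.gammavariate(max(a, 1e-10), 1.0) for a in shapes]
    total = sum(draws)
    return [d / total for d in draws]

def truncated_hdp(gamma, alpha, num_groups, truncation, seed=0):
    """Truncated two-level HDP prior over K = truncation shared atoms.

    beta ~ GEM(gamma), truncated: global weights shared by all groups.
    pi_j | beta ~ Dirichlet(alpha * beta): finite analogue of G_j ~ DP(alpha, G_0).
    Every group reuses the same atoms, so a cluster can be matched across groups.
    """
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)
    pis = [dirichlet([alpha * b for b in beta], rng) for _ in range(num_groups)]
    return beta, pis
```

For example, `truncated_hdp(gamma=1.0, alpha=5.0, num_groups=3, truncation=20)` returns one global weight vector and three per-corpus vectors over the same 20 atoms. Because `Dirichlet(alpha * beta)` has mean `beta`, a larger `alpha` keeps each corpus's weights closer to the shared global weights.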


Supplemental Material

kdd2010_zhang_ehdpm_01.mov (MOV, 107.6 MB)


          Published in

          ACM Conferences
          KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
          July 2010
          1240 pages
          ISBN:9781450300551
          DOI:10.1145/1835804

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall acceptance rate: 1,133 of 8,635 submissions (13%)
