ABSTRACT
Mining cluster evolution from multiple correlated time-varying text corpora is important in exploratory text analytics. In this paper, we propose evolutionary hierarchical Dirichlet processes (EvoHDP), an approach for discovering interesting cluster evolution patterns from such text data. We formulate EvoHDP as a series of hierarchical Dirichlet processes (HDPs) with time dependencies added between adjacent epochs, and propose a cascaded Gibbs sampling scheme to infer the model. The approach can discover different evolving patterns of clusters, including emergence, disappearance, and evolution within a corpus and across corpora. Experiments on synthetic and real-world multiple correlated time-varying data sets demonstrate the effectiveness of EvoHDP in discovering cluster evolution patterns.
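EvoHDP builds on the Dirichlet process, whose clustering behavior is commonly described via the Chinese restaurant process (CRP): each new item joins an existing cluster with probability proportional to the cluster's size, or starts a new cluster with probability proportional to a concentration parameter alpha. As illustrative background only (this is not the paper's inference code, and the function name and parameters are our own), a minimal CRP simulation might look like:

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate table (cluster) assignments under a CRP with concentration alpha.

    Customer i joins an existing table with probability proportional to its
    current size, or opens a new table with probability proportional to alpha.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers seated at table k
    assignments = []  # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # Weights: existing table sizes, plus alpha for opening a new table.
        weights = tables + [alpha]
        r = rng.uniform(0, sum(weights))
        cum, k = 0.0, 0
        for k, w in enumerate(weights):
            cum += w
            if r <= cum:
                break
        if k == len(tables):
            tables.append(1)  # open a new table, i.e. a new cluster emerges
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables

assignments, tables = chinese_restaurant_process(100, alpha=1.0)
print(len(tables), sum(tables))  # number of clusters found, total customers
```

The number of tables grows roughly logarithmically with the number of customers, which is why DP-based models such as EvoHDP need not fix the cluster count in advance; the HDP extends this by sharing the set of clusters across multiple groups (here, corpora and epochs).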