ABSTRACT
Modeling the evolution of topics over time is of great value in the automatic summarization and analysis of large document collections. In this work, we propose a new probabilistic graphical model to address this problem. The new model, which we call the Multiscale Topic Tomography Model (MTTM), employs non-homogeneous Poisson processes to model the generation of word counts. The evolution of topics is modeled through a multiscale analysis using Haar wavelets. One new feature of the model is that it captures the evolution of topics at several time-scale resolutions, allowing the user to zoom in and out across time scales. Our experiments on Science data using the new model uncover some interesting patterns in topics. The new model is also comparable to LDA in predicting unseen data, as demonstrated by our perplexity experiments.
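The multiscale idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a single word's counts over a power-of-two number of time epochs and uses an unnormalized Haar-style pyramid: each coarser scale sums adjacent pairs of the finer scale. Because a sum of independent Poisson counts is again Poisson, every scale remains a Poisson count, and the fraction of each parent's count that falls in its left child describes how mass shifts in time. The function names `haar_pyramid`, `split_ratios`, and `reconstruct` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def haar_pyramid(counts):
    """Return per-scale pairwise sums, coarsest scale first.

    `counts` holds one word's counts over 2^S time epochs (an assumption
    of this sketch, not a requirement stated in the abstract).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    if n == 0 or n & (n - 1):
        raise ValueError("number of epochs must be a power of two")
    levels = [counts]
    while levels[-1].size > 1:
        cur = levels[-1]
        # Merge adjacent epochs: parent = left child + right child.
        levels.append(cur[0::2] + cur[1::2])
    return levels[::-1]

def split_ratios(levels):
    """Fraction of each parent's count that falls in its left child."""
    ratios = []
    for parent, child in zip(levels[:-1], levels[1:]):
        left = child[0::2]
        with np.errstate(divide="ignore", invalid="ignore"):
            # An empty parent carries no information; use 0.5 by convention.
            ratios.append(np.where(parent > 0, left / parent, 0.5))
    return ratios

def reconstruct(total, ratios):
    """Zoom in: rebuild the finest-scale counts from the total and ratios."""
    cur = np.array([total], dtype=float)
    for r in ratios:
        nxt = np.empty(cur.size * 2)
        nxt[0::2] = cur * r          # left children
        nxt[1::2] = cur * (1.0 - r)  # right children
        cur = nxt
    return cur

# Hypothetical counts of one word over eight epochs.
word_counts = [4, 0, 2, 2, 1, 3, 0, 4]
levels = haar_pyramid(word_counts)
ratios = split_ratios(levels)
restored = reconstruct(levels[0][0], ratios)
```

Reading `levels` from first to last corresponds to zooming in from the whole time span to individual epochs, while `ratios` records where the mass moves at each refinement.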