ABSTRACT
In this tutorial, we teach the intuition and the assumptions behind topic models. Topic models explain the co-occurrences of words in documents by extracting sets of semantically related words, called topics. These topics are semantically coherent and can be interpreted by humans. Starting with the most popular topic model, Latent Dirichlet Allocation (LDA), we explain the fundamental concepts of probabilistic topic modeling. We organise our tutorial as follows: After a general introduction, we will enable participants to develop an intuition for the underlying concepts of probabilistic topic models. Building on this intuition, we cover the technical foundations of topic models, including graphical models and Gibbs sampling. We conclude the tutorial with an overview on the most relevant adaptions and extensions of LDA.
- D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993--1022, 2003. Google ScholarDigital Library
- J. Chang, S. Gerrish, C. Wang, J. L. Boyd-Graber, and D. M. Blei. Reading tea leaves: How humans interpret topic models. In NIPS, pages 288--296, 2009.Google ScholarDigital Library
- L. Dietz, S. Bickel, and T. Scheffer. Unsupervised prediction of citation influences. In ICML, pages 233--240. ACM, 2007. Google ScholarDigital Library
Index Terms
- Topic model tutorial: A basic introduction on latent dirichlet allocation and extensions for web scientists
Recommendations
Research on Multi-document Summarization Based on LDA Topic Model
IHMSC '14: Proceedings of the 2014 Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics - Volume 02Compared with VSM (Vector Space Model) and graph-ranking models, LDA (Latent Dirichlet Allocation) Model can discover latent topics in the corpus and latent topics are beneficial to use sentence-ranking mechanisms to form a good summary. In the paper, ...
Topic sentiment mixture: modeling facets and opinions in weblogs
WWW '07: Proceedings of the 16th international conference on World Wide WebIn this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent ...
Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA
AbstractProbabilistic topic models have recently attracted much attention because of their successful applications in many text mining tasks such as retrieval, summarization, categorization, and clustering. Although many existing studies have reported ...
Comments