ABSTRACT
We present a unified model of what have traditionally been viewed as two separate tasks: data association and intensity tracking of multiple topics over time. Data association assigns a topic (a class) to each data point, while intensity tracking models the bursts and changes in the intensity of each topic over time. Our approach combines an extension of factorial hidden Markov models for topic intensity tracking with exponential order statistics for implicit data association. Experiments on text and email datasets show that the interplay of classification and topic intensity tracking improves the accuracy of both. Even a small amount of noise in the topic assignments can mislead traditional algorithms, whereas our approach recovers the correct topic intensities even with 30% topic noise.
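The role of exponential order statistics in data association can be illustrated with a small sketch. It assumes, purely for illustration, that each topic emits documents as an independent Poisson process with its own rate; the abstract does not spell out the model, and this is not the authors' implementation. Under that assumption the waiting time to the next document is the minimum of independent exponential variables, which is itself exponential with the summed rate, and the probability that topic i produced the document is its rate divided by the sum of all rates. The function names and the optional `content_likelihoods` argument are hypothetical.

```python
import math
import random


def next_arrival_posterior(rates, content_likelihoods=None):
    """Posterior over which topic produced the next document.

    rates: per-topic intensities (assumed exponential inter-arrival rates).
    content_likelihoods: optional P(document text | topic) values; if
    omitted, only the intensities inform the assignment.
    """
    if content_likelihoods is None:
        content_likelihoods = [1.0] * len(rates)
    # P(topic i arrives first) is proportional to its rate; multiply by
    # the content likelihood and renormalise.
    weights = [r * l for r, l in zip(rates, content_likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


def superposed_gap_density(delta, rates):
    """Density of the gap to the next document from any topic.

    The minimum of independent exponentials is exponential with the
    summed rate, so no explicit topic assignment is needed here.
    """
    total_rate = sum(rates)
    return total_rate * math.exp(-total_rate * delta)


def simulate_winner_frequencies(rates, trials=20000, seed=0):
    """Empirically check the 'first arrival' probabilities by simulation."""
    rng = random.Random(seed)
    wins = [0] * len(rates)
    for _ in range(trials):
        draws = [rng.expovariate(r) for r in rates]
        wins[draws.index(min(draws))] += 1
    return [w / trials for w in wins]


if __name__ == "__main__":
    rates = [1.0, 3.0]
    print(next_arrival_posterior(rates))
    print(simulate_winner_frequencies(rates))
```

In this simplified setting the length of the gap says nothing about which topic fired, because the minimum and its index are independent for exponential variables. The topic assignment is therefore driven by the current intensities and, when supplied, by the document content.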