Data association for topic intensity tracking

Authors:
Andreas Krause, Carnegie Mellon University, Pittsburgh, PA
Jure Leskovec, Carnegie Mellon University, Pittsburgh, PA
Carlos Guestrin, Carnegie Mellon University, Pittsburgh, PA

ICML '06: Proceedings of the 23rd international conference on Machine learning, June 2006, Pages 497–504. https://doi.org/10.1145/1143844.1143907

Published: 25 June 2006

ABSTRACT

We present a unified model of what was traditionally viewed as two separate tasks: data association and intensity tracking of multiple topics over time. In the data association part, the task is to assign a topic (a class) to each data point, and the intensity tracking part models the bursts and changes in intensities of topics over time. Our approach to this problem combines an extension of Factorial Hidden Markov models for topic intensity tracking with exponential order statistics for implicit data association. Experiments on text and email datasets show that the interplay of classification and topic intensity tracking improves the accuracy of both classification and intensity tracking. Even a little noise in topic assignments can mislead the traditional algorithms. However, our approach detects correct topic intensities even with 30% topic noise.
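
The data association idea can be illustrated with a standard property of exponential random variables. The following is a minimal sketch, not the authors' implementation: it assumes each topic emits documents with independent, exponentially distributed waiting times, so the next arrival in the merged stream is the minimum (first order statistic) of those waits. That minimum is itself exponential with rate equal to the sum of the topic rates, and topic k arrives first with probability rate_k divided by that sum. The function names, the example rates, and the `topic_posterior` helper that weights this prior by a per-topic content likelihood are hypothetical.

```python
import random


def merged_stream_stats(rates, n_samples=100_000, seed=0):
    """Simulate the first arrival among independent exponential topic clocks.

    rates: hypothetical per-topic intensities (events per unit time).
    Returns (mean waiting time, empirical share of first arrivals per topic).
    """
    rng = random.Random(seed)
    wins = [0] * len(rates)
    total_wait = 0.0
    for _ in range(n_samples):
        # Each topic draws its own exponential waiting time.
        waits = [rng.expovariate(r) for r in rates]
        # The observed event is the earliest one; its topic is not observed
        # directly in the merged stream, which is why association is implicit.
        first = min(range(len(rates)), key=waits.__getitem__)
        wins[first] += 1
        total_wait += waits[first]
    return total_wait / n_samples, [w / n_samples for w in wins]


def topic_posterior(rates, likelihoods):
    """Weight the intensity-based prior rate_k / sum(rates) by a per-topic
    content likelihood (assumed to come from some text classifier)."""
    total_rate = sum(rates)
    unnorm = [(r / total_rate) * lik for r, lik in zip(rates, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Assumed example intensities for three topics.
rates = [0.5, 1.5, 3.0]
mean_wait, shares = merged_stream_stats(rates)
posterior = topic_posterior(rates, [0.6, 0.3, 0.1])
```

Under these assumptions the simulated mean wait should be close to 1 / sum(rates) and the shares close to rates / sum(rates). The posterior shows how a high-intensity topic can dominate the assignment even when the content likelihood favors another topic, which is one way to picture the interplay between classification and intensity tracking that the abstract describes.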

Published in
ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844
Program Chairs: William Cohen, Andrew Moore
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
ICML '06 Paper Acceptance Rate: 140 of 548 submissions, 26%. Overall Acceptance Rate: 140 of 548 submissions, 26%.