Article

A mixture model for contextual text mining

Authors:
Qiaozhu Mei

University of Illinois at Urbana-Champaign, Urbana, IL

University of Illinois at Urbana-Champaign, Urbana, IL
View Profile

,
ChengXiang Zhai

University of Illinois at Urbana-Champaign, Urbana, IL

University of Illinois at Urbana-Champaign, Urbana, IL
View Profile

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2006Pages 649–655https://doi.org/10.1145/1150402.1150482

Published:20 August 2006Publication History

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 649–655

ABSTRACT

Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and comparing/analyzing the variations of themes over different contexts. Since the topics covered in a document are usually related to the context of the document, analyzing topical themes within context can potentially reveal many interesting theme patterns. In this paper, we generalize some of these models proposed in the previous work and we propose a new general probabilistic model for contextual text mining that can cover several existing models as special cases. Specifically, we extend the probabilistic latent semantic analysis (PLSA) model by introducing context variables to model the context of a document. The proposed mixture model, called contextual probabilistic latent semantic analysis (CPLSA) model, can be applied to many interesting mining tasks, such as temporal text mining, spatiotemporal text mining, author-topic analysis, and cross-collection comparative analysis. Empirical experiments show that the proposed mixture model can discover themes and their contextual variations effectively.

References

J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study: Final report. In Proceedings of DARPA Broadcast News Transcription and Understanding Workshop, 1998.Google Scholar
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993--1022, 2003. Google ScholarCross Ref
S. Boykin and A. Merlino. Machine learning of event segmentation for news on demand. Commun. ACM, 43(2):35--41, 2000. Google ScholarDigital Library
C. C. Chen, M. C. Chen, and M.-S. Chen. Liped: Hmm-based life profiles for adaptive event detection. In Proceeding of KDD '05, pages 556--561, 2005. Google ScholarDigital Library
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of Royal Statist. Soc. B, 39:1--38, 1977.Google ScholarCross Ref
T. L. Griffiths and M. Steyvers. Fiding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl.1):5228--5235, 2004.Google ScholarCross Ref
T. Hofmann. Probabilistic latent semantic analysis. In Proceedings of UAI'99.Google Scholar
T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of ACM SIGIR'99. Google ScholarDigital Library
J. Kleinberg. Bursty and hierarchical structure in streams. In Proceedings of KDD '02, pages 91--101. Google ScholarDigital Library
A. Kontostathis, L. Galitsky, W. M. Pottenger, S. Roy, and D. J. Phelps. A survey of emerging trend detection in textual data mining. Survey of Text Mining, pages 185--224, 2003.Google Scholar
Z. Li, B. Wang, M. Li, and W.-Y. Ma. A probabilistic model for retrospective news event detection. In Proceedings of SIGIR'05, pages 106--113, 2005. Google ScholarDigital Library
J. Ma and S. Perkins. Online novelty detection on temporal sequences. In Proceedings of KDD'03, pages 613--618, 2003. Google ScholarDigital Library
Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW '06, pages 533--542, 2006. Google ScholarDigital Library
Q. Mei and C. Zhai. Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In Proceeding of KDD'05, pages 198--207, 2005. Google ScholarDigital Library
R. Nallapati, A. Feng, F. Peng, and J. Allan. Event threading within news topics. In Proceedings of CIKM'04, pages 446--453, 2004. Google ScholarDigital Library
J. Perkio, W. Buntine, and S. Perttu. Exploring independent trends in a topic-based search engine. In Proceedings of WI '04, pages 664--668, 2004. Google ScholarDigital Library
M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proceedings of KDD'04, pages 306--315, 2004. Google ScholarDigital Library
C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In Proceedings of KDD'04, pages 743--748, 2004. Google ScholarDigital Library

Index Terms

A mixture model for contextual text mining
1. Information systems
  1. Information retrieval

Recommendations

A cross-collection mixture model for comparative text mining
KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of comparative text mining is to discover any latent common themes across all ...
Read More
Topic sentiment mixture: modeling facets and opinions in weblogs
WWW '07: Proceedings of the 16th international conference on World Wide Web

In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent ...
Read More
Contextual text mining
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN:1595933395
DOI:10.1145/1150402
Conference Chair:
Tina Eliassi-Rad
LLNL
,
General Chair:
Lyle Ungar
University of Pennsylvania
,
Program Chairs:
Mark Craven
University of Wisconsin
,
Dimitrios Gunopulos
University of California, Riverside
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 August 2006
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
EM algorithm
clustering
context
contextual text mining
mixture model
theme pattern
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate1,133of8,635submissions,13%
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 103
  Total Citations
  View Citations
- 2,094
  Total Downloads
- Downloads (Last 12 months)17
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A mixture model for contextual text mining

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

A cross-collection mixture model for comparative text mining

Topic sentiment mixture: modeling facets and opinions in weblogs

Contextual text mining

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

A mixture model for contextual text mining

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

A cross-collection mixture model for comparative text mining

Topic sentiment mixture: modeling facets and opinions in weblogs

Contextual text mining

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media