DOI: 10.1145/1150402.1150482
Article

A mixture model for contextual text mining

Published: 20 August 2006

ABSTRACT

Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and comparing/analyzing the variations of themes over different contexts. Since the topics covered in a document are usually related to the context of the document, analyzing topical themes within context can potentially reveal many interesting theme patterns. In this paper, we generalize some of the models proposed in previous work and propose a new general probabilistic model for contextual text mining that covers several existing models as special cases. Specifically, we extend the probabilistic latent semantic analysis (PLSA) model by introducing context variables to model the context of a document. The proposed mixture model, called the contextual probabilistic latent semantic analysis (CPLSA) model, can be applied to many interesting mining tasks, such as temporal text mining, spatiotemporal text mining, author-topic analysis, and cross-collection comparative analysis. Empirical experiments show that the proposed mixture model can discover themes and their contextual variations effectively.
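
The abstract states that CPLSA extends PLSA with context variables but does not give the extended model. As background only, the following is a minimal sketch of plain PLSA fitted by EM, where each document d mixes topics z via p(z|d) and each topic emits words via p(w|z). It is not the authors' CPLSA implementation; the function name `plsa`, its parameters, and the toy count matrix are assumptions made for illustration.

```python
import math
import random


def plsa(counts, num_topics, iters=50, seed=0):
    """Fit plain PLSA by EM (illustrative sketch, not CPLSA).

    counts[d][w] is the number of times word w occurs in document d.
    Returns p(z|d), p(w|z), and the log-likelihood at the start of each iteration.
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # guard: fall back to a uniform distribution
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialization of p(z|d) and p(w|z).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(num_topics)])
             for _ in range(num_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(num_words)])
             for _ in range(num_topics)]

    log_liks = []
    for _ in range(iters):
        new_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        new_w_z = [[0.0] * num_words for _ in range(num_topics)]
        ll = 0.0
        for d in range(num_docs):
            for w in range(num_words):
                c = counts[d][w]
                if c == 0:
                    continue
                # E-step: p(z|d,w) is proportional to p(z|d) * p(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(num_topics)]
                total = sum(joint)
                ll += c * math.log(total)
                for z in range(num_topics):
                    resp = c * joint[z] / total
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        log_liks.append(ll)
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z, log_liks


if __name__ == "__main__":
    # Hypothetical toy corpus: 4 documents over a 4-word vocabulary.
    toy = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
    doc_topics, topic_words, lls = plsa(toy, num_topics=2, iters=30)
    for row in doc_topics:
        print([round(p, 3) for p in row])
```

In this baseline the mixing weights depend only on the document. A contextual extension of the kind the abstract describes would additionally condition the model on variables such as time or location; how CPLSA does so is specified in the paper itself, not in this sketch.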

Published in

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN: 1595933395
DOI: 10.1145/1150402

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
