DOI: 10.1145/1281192.1281249

Multiscale topic tomography

Published: 12 August 2007

ABSTRACT

Modeling the evolution of topics over time is of great value for the automatic summarization and analysis of large document collections. In this work, we propose a new probabilistic graphical model to address this problem. The new model, which we call the Multiscale Topic Tomography Model (MTTM), employs non-homogeneous Poisson processes to model the generation of word counts. The evolution of topics is modeled through a multiscale analysis using Haar wavelets. One of the new features of the model is that it captures the evolution of topics at various time scales of resolution, allowing the user to zoom in and out across time scales. Our experiments with the new model on Science data uncover some interesting patterns in topics. As our perplexity experiments demonstrate, the new model is also comparable to latent Dirichlet allocation (LDA) in predicting unseen data.
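To make the ideas of Poisson word counts and Haar-based multiscale analysis more concrete, here is a minimal Python sketch of the generic multiscale Poisson representation. It assumes a single word's counts over a power-of-two number of time bins. Counts are summed over dyadic intervals (unnormalized Haar scaling coefficients), and each parent count is split binomially between its two halves. This is not the MTTM model or its inference procedure. All function names, the toy counts, and the toy rates are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence


def dyadic_pyramid(values: Sequence[float]) -> List[List[float]]:
    """Aggregate a series over dyadic intervals (hypothetical helper).

    Level 0 is the finest resolution; each coarser level sums adjacent pairs,
    i.e. unnormalized Haar scaling coefficients. The last level holds the
    grand total. The length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1] for i in range(len(prev) // 2)])
    return levels


def split_fractions(levels: List[List[float]]) -> List[List[Optional[float]]]:
    """Share of each parent's count that falls in its left (earlier) half.

    out[j][i] belongs to parent i at level j + 1; None marks a zero parent.
    """
    out = []
    for fine, coarse in zip(levels, levels[1:]):
        out.append([fine[2 * i] / c if c > 0 else None for i, c in enumerate(coarse)])
    return out


def reconstruct(total: int, fractions: List[List[Optional[float]]]) -> List[int]:
    """Invert the decomposition: grand total plus split fractions -> finest counts."""
    level = [total]
    for fr in reversed(fractions):  # from the coarsest split down to the finest
        nxt = []
        for parent, rho in zip(level, fr):
            left = round(parent * rho) if rho is not None else 0
            nxt.extend([left, parent - left])
        level = nxt
    return level


def poisson_logpmf(k: int, lam: float) -> float:
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def binom_logpmf(k: int, n: int, p: float) -> float:
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def multiscale_loglik(counts: Sequence[int], rates: Sequence[float]) -> float:
    """Poisson log-likelihood of the counts, computed scale by scale.

    Independent Poisson counts factor into a Poisson on the grand total times
    one binomial per parent interval, with success probability
    rho = rate_left / (rate_left + rate_right). Rates must be positive.
    """
    c_lv, r_lv = dyadic_pyramid(counts), dyadic_pyramid(rates)
    ll = poisson_logpmf(c_lv[-1][0], r_lv[-1][0])
    for j in range(len(c_lv) - 1):
        for i, parent in enumerate(c_lv[j + 1]):
            rho = r_lv[j][2 * i] / r_lv[j + 1][i]
            ll += binom_logpmf(c_lv[j][2 * i], parent, rho)
    return ll


if __name__ == "__main__":
    counts = [3, 1, 0, 4, 2, 2, 5, 1]                  # toy counts for one word, 8 time bins
    rates = [2.0, 1.5, 1.0, 3.0, 2.5, 2.0, 4.0, 1.0]   # toy Poisson intensities
    levels = dyadic_pyramid(counts)
    print(levels[1], levels[2], levels[3])             # [4, 4, 4, 6] [8, 10] [18]
    fr = split_fractions(levels)
    print(reconstruct(levels[-1][0], fr) == counts)    # True
    direct = sum(poisson_logpmf(k, r) for k, r in zip(counts, rates))
    print(math.isclose(multiscale_loglik(counts, rates), direct))  # True
```

Reading `levels[j]` gives counts in bins 2^j times wider than the original ones, which is one way to picture "zooming out" in time. The split fractions carry exactly the detail needed to zoom back in, and the likelihood check shows that the scale-by-scale (binomial) view loses nothing relative to modeling each bin as an independent Poisson count.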

Published in
KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2007, 1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192

        Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates
KDD '07 paper acceptance rate: 111 of 573 submissions (19%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
