Research article · DOI: 10.1145/1390156.1390200

Memory bounded inference in topic models

Published: 05 July 2008

ABSTRACT

What type of algorithms and statistical techniques support learning from very large datasets over long stretches of time? We address this question through a memory bounded version of a variational EM algorithm that approximates inference in a topic model. The algorithm alternates between two phases, "model building" and "model compression," so that a given memory constraint is always satisfied. The model building phase expands the internal representation (the number of topics) through Bayesian model selection as more data arrives. Compression is achieved by merging data items into clumps and caching only their sufficient statistics. Empirically, the resulting algorithm is able to handle datasets that are orders of magnitude larger than the standard batch version can.
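The compression idea described above can be sketched in a few lines. The following is a minimal illustration, not the authors' variational EM implementation: the names (`merge`, `compress`, `distance`), the L1 distance between normalized count vectors, and a memory budget measured as a number of cached clumps are all assumptions made for the sketch. Each clump is a pair `(item_count, pooled_word_counts)`, and compression greedily merges the closest pair until the cache fits the budget.

```python
# Illustrative sketch of "compression by clumping": pool sufficient
# statistics of similar items so only the pooled statistics are cached.
# All names and distance choices here are hypothetical, not from the paper.

def merge(a, b):
    """Merge two clumps by pooling their sufficient statistics
    (item count and per-word counts)."""
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])])

def distance(a, b):
    """L1 distance between the clumps' normalized word-count vectors
    (an illustrative similarity measure)."""
    na, nb = sum(a[1]) or 1, sum(b[1]) or 1
    return sum(abs(x / na - y / nb) for x, y in zip(a[1], b[1]))

def compress(clumps, budget):
    """Greedily merge the closest pair of clumps until the cache
    holds at most `budget` clumps."""
    while len(clumps) > budget:
        best = None
        for i in range(len(clumps)):
            for j in range(i + 1, len(clumps)):
                d = distance(clumps[i], clumps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = merge(clumps[i], clumps[j])
        clumps = [c for k, c in enumerate(clumps) if k not in (i, j)]
        clumps.append(merged)
    return clumps

# Usage: three single-document clumps over a 2-word vocabulary,
# compressed down to a budget of 2 cached clumps. The two clumps with
# similar word distributions are pooled; total counts are preserved.
cache = compress([(1, [3, 0]), (1, [0, 3]), (1, [2, 1])], budget=2)
```

The key property the sketch preserves is that merging loses per-item detail but not the totals: the pooled sufficient statistics are exactly the sum of the statistics of the merged items, which is what lets EM updates still be computed from the compressed cache.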


Published in

ICML '08: Proceedings of the 25th International Conference on Machine Learning
July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156

Copyright © 2008 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
