ABSTRACT
What types of algorithms and statistical techniques support learning from very large datasets over long stretches of time? We address this question with a memory-bounded version of a variational EM algorithm that approximates inference in a topic model. The algorithm alternates between two phases, "model building" and "model compression", so that a given memory constraint is always satisfied. The model-building phase expands the internal representation (the number of topics) through Bayesian model selection as more data arrive. Compression is achieved by merging data items into clumps and caching only their sufficient statistics. Empirically, the resulting algorithm handles datasets that are orders of magnitude larger than those the standard batch version can process.
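To make the two-phase scheme concrete, the following is a minimal, self-contained sketch in Python. It substitutes a spherical Gaussian mixture for the paper's topic model, plain EM for its variational updates, and greedy nearest-pair merging for its compression step; the model-building phase (growing the number of components via Bayesian model selection) is omitted, so K stays fixed. All names here (Clump, compress, em_step, max_clumps) are illustrative, not from the paper.

```python
import numpy as np

class Clump:
    """Sufficient statistics (count, sum, sum of squares) for a set of merged points."""
    def __init__(self, x):
        self.n = 1.0          # number of points absorbed into this clump
        self.s = x.copy()     # sum of those points
        self.ss = x * x       # elementwise sum of squares (diagonal model)

    def merge(self, other):
        self.n += other.n
        self.s += other.s
        self.ss += other.ss

    def mean(self):
        return self.s / self.n

def compress(clumps, max_clumps):
    """Compression phase: greedily merge the two nearest clumps until the budget holds."""
    while len(clumps) > max_clumps:
        means = np.stack([c.mean() for c in clumps])
        d = np.linalg.norm(means[:, None] - means[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        clumps[i].merge(clumps[j])
        del clumps[j]
    return clumps

def em_step(clumps, means, var):
    """One EM step on the cached sufficient statistics (fixed spherical variance)."""
    X = np.stack([c.mean() for c in clumps])               # clump means, shape (C, D)
    n = np.array([c.n for c in clumps])                    # clump sizes, shape (C,)
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    logp = -0.5 * d2 / var
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)                      # responsibilities, (C, K)
    w = r * n[:, None]                                     # each clump counts n times
    return (w.T @ X) / (w.sum(axis=0)[:, None] + 1e-12)    # updated component means

def stream(batches, K=3, max_clumps=50, iters=10):
    """Alternate compression and fixed-K EM over an incoming stream of batches.
    The paper additionally grows K via Bayesian model selection; omitted here."""
    clumps, means, var = [], None, 0.5
    for batch in batches:
        clumps.extend(Clump(x) for x in batch)
        clumps = compress(clumps, max_clumps)    # enforce the memory budget
        if means is None:                        # crude initialization from clumps
            means = np.stack([c.mean() for c in clumps[:K]])
        for _ in range(iters):
            means = em_step(clumps, means, var)
    return means, clumps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(loc=c, scale=0.3, size=(200, 2))
                           for c in ([0.0, 0.0], [3.0, 3.0], [-3.0, 2.0])])
    batches = np.array_split(rng.permutation(data), 10)
    means, clumps = stream(batches)
    print("recovered component means:\n", np.round(means, 2))
    print("cached clumps:", len(clumps))   # never exceeds max_clumps
```

The key design point the sketch shares with the paper is that compression loses only within-clump detail: because the M-step consumes sums and counts, a clump of n points contributes to the parameter updates exactly as n identical points at its mean would, so memory stays bounded at max_clumps statistics regardless of how many points stream past.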