
2017 | OriginalPaper | Chapter

Stochastic Bounds for Inference in Topic Models

Authors : Xuan Bui, Tu Vu, Khoat Than

Published in: Advances in Information and Communication Technology

Publisher: Springer International Publishing


Abstract

Topic models are popular for modeling discrete data (e.g., texts, images, videos, links) and provide an efficient way to discover hidden structures and semantics in massive data. Posterior inference for individual texts is particularly important in streaming environments, but it is often intractable in the worst case. Some existing methods for posterior inference are approximate but offer no guarantee on either quality or convergence rate. The Online Maximum a Posteriori Estimation algorithm (OPE) [13] has more attractive properties than existing inference approaches, including theoretical guarantees on quality and a fast convergence rate. In this paper, we introduce three new algorithms that improve OPE (called OPE1, OPE2, and OPE3) by using stochastic bounds during inference. Our algorithms not only maintain the key advantages of OPE but also often outperform OPE and existing algorithms. We have also employed them to develop new, effective methods for learning topic models from massive or streaming text collections.
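The abstract describes OPE only at a high level. The block below is a rough, hypothetical sketch of an OPE-style stochastic Frank-Wolfe loop for MAP inference of a document's topic proportions theta in LDA, as we understand the approach of [13]. It assumes the per-document objective f(theta) = sum_j d_j log(sum_k theta_k beta_kj) + (alpha - 1) sum_k log theta_k over the probability simplex, where at each iteration one of the two terms (likelihood or prior) is drawn at random and accumulated into a stochastic approximation F_t of f. The function name `ope_map_inference`, its parameters, and the step size 1/(t+1) are our own assumptions. This is not the authors' implementation, and it does not include the stochastic bounds that distinguish OPE1, OPE2, and OPE3.

```python
import numpy as np


def ope_map_inference(d, beta, alpha, n_iter=100, seed=0):
    """Hypothetical OPE-style MAP inference of topic proportions for one document.

    Assumed objective (LDA with a symmetric Dirichlet prior):
        f(theta) = sum_j d_j * log(sum_k theta_k * beta_kj) + (alpha - 1) * sum_k log(theta_k)
    d:     (V,) word counts of the document
    beta:  (K, V) topic-word distributions, assumed strictly positive
    alpha: Dirichlet hyperparameter
    """
    rng = np.random.default_rng(seed)
    K = beta.shape[0]
    theta = np.full(K, 1.0 / K)  # start at the centre of the simplex
    counts = np.zeros(2)         # draws of the likelihood term (0) and the prior term (1)
    for t in range(1, n_iter + 1):
        # Pick one of the two components of f uniformly at random.
        counts[rng.integers(2)] += 1
        grad_lik = beta @ (d / (theta @ beta))  # gradient of the likelihood term
        grad_prior = (alpha - 1.0) / theta      # gradient of the prior term
        # Gradient of F_t = (2/t) * (n_lik * likelihood_term + n_prior * prior_term).
        grad = (2.0 / t) * (counts[0] * grad_lik + counts[1] * grad_prior)
        # Frank-Wolfe step: the simplex vertex that maximises the linearised F_t.
        e = np.zeros(K)
        e[np.argmax(grad)] = 1.0
        # Step size 1/(t+1) (an assumption) keeps theta strictly inside the simplex.
        theta = theta + (e - theta) / (t + 1)
    return theta


# Usage: two topics over a four-word vocabulary; the document only uses words 0 and 1.
beta = np.array([[0.45, 0.45, 0.05, 0.05],
                 [0.05, 0.05, 0.45, 0.45]])
d = np.array([5.0, 5.0, 0.0, 0.0])
theta = ope_map_inference(d, beta, alpha=0.5)
print(theta)  # most of the mass lands on topic 0
```

Each iteration only needs one gradient evaluation and a linear maximisation over the simplex (picking the largest gradient coordinate), which is what makes this family of methods cheap enough for streaming settings. Because the initial uniform point always keeps a positive weight, theta never hits the simplex boundary, so the log-prior gradient stays finite even when alpha < 1.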


Footnotes
1
A saddle point is not always a (local) maximum. Furthermore, the inference problem may have an exponentially large number of saddle points.
 
2
The detailed proof of this theorem will be presented in another paper.
 
Literature
1.
Asuncion, A., Welling, M., Smyth, P., Teh, Y.W.: On smoothing and inference for topic models. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 27–34 (2009)
2.
Liu, B., Liu, L., Tsykin, A., Goodall, G.J., Green, J.E., Zhu, M., Kim, C.H., Li, J.: Identifying functional miRNA-mRNA regulatory modules with correspondence latent Dirichlet allocation. Bioinformatics 26(24), 3105 (2010)
3.
Sontag, D., Roy, D.M.: Complexity of inference in latent Dirichlet allocation. In: Advances in Neural Information Processing Systems (NIPS) (2011)
4.
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(3), 993–1022 (2003)
6.
Mimno, D.: Computational historiography: data mining in a century of classics journals. J. Comput. Cult. Heritage 5(1), 3 (2012)
7.
Mimno, D., Hoffman, M.D., Blei, D.M.: Sparse stochastic inference for latent Dirichlet allocation. In: Proceedings of the 29th Annual International Conference on Machine Learning (2012)
8.
Schwartz, H.A., Eichstaedt, J.C., Dziurzynski, L., Kern, M.L., Blanco, E., Kosinski, M., Stillwell, D., Seligman, M.E.P., Ungar, L.H.: Toward personality insights from language exploration in social media. In: AAAI Spring Symposium Series (2013)
9.
Pritchard, J.K., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155(2), 945–959 (2000)
10.
Mairal, J.: Stochastic majorization-minimization algorithms for large-scale optimization. In: Advances in Neural Information Processing Systems, pp. 2283–2291 (2013)
11.
Lau, J.H., Newman, D., Baldwin, T.: Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In: Proceedings of the Association for Computational Linguistics, pp. 530–539 (2014)
12.
Grimmer, J.: A Bayesian hierarchical topic model for political texts: measuring expressed agendas in senate press releases. Polit. Anal. 18(1), 1–35 (2010)
14.
Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14(1), 1303–1347 (2013)
15.
Aletras, N., Stevenson, M.: Evaluating topic coherence using distributional semantics. In: Proceedings of the 10th International Conference on Computational Semantics, pp. 13–22 (2013)
16.
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
17.
Gerrish, S., Blei, D.: How they vote: issue-adjusted models of legislative behavior. Adv. Neural Inf. Process. Syst. 25, 2762–2770 (2012)
18.
Griffiths, T.L., Steyvers, M.: Finding scientific topics. In: Proceedings of the National Academy of Sciences of the United States of America (2004)
19.
Teh, Y.W., Newman, D., Welling, M.: A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, vol. 19, pp. 1353–1360 (2007)
Metadata
Title
Stochastic Bounds for Inference in Topic Models
Authors
Xuan Bui
Tu Vu
Khoat Than
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-49073-1_62
