
Published: 24-11-2022

Stein Variational Gradient Descent with Multiple Kernels

Authors: Qingzhong Ai, Shiyu Liu, Lirong He, Zenglin Xu

Published in: Cognitive Computation | Issue 2/2023


Abstract

Bayesian inference is an important research area in cognitive computation because of its ability to reason under uncertainty in machine learning. As a representative algorithm, Stein variational gradient descent (SVGD) and its variants have shown promising success in approximate inference for complex distributions. In practice, we observe that the kernel used in SVGD-based methods has a decisive effect on empirical performance. The radial basis function (RBF) kernel with the median heuristic is a common choice in previous approaches, but it has proven to be sub-optimal. Inspired by the paradigm of Multiple Kernel Learning (MKL), our remedy for this flaw is to approximate the optimal kernel with a combination of multiple kernels rather than a single one, which may limit performance and flexibility. Specifically, we first extend Kernelized Stein Discrepancy (KSD) to its multiple-kernel view, called Multiple Kernelized Stein Discrepancy (MKSD), and then leverage MKSD to construct a general algorithm, Multiple Kernel SVGD (MK-SVGD). Moreover, MK-SVGD automatically assigns a weight to each kernel without introducing additional parameters, so our method not only removes the dependence on a single optimal kernel but also maintains computational efficiency. Experiments on various tasks and models demonstrate that the proposed method consistently matches or outperforms competing methods.
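The particle update the abstract alludes to can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it runs the standard SVGD update with a fixed, uniformly weighted mixture of RBF kernels, whereas MK-SVGD learns the kernel weights automatically. The bandwidth grid, the uniform weights, and the Gaussian toy target are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel_and_grad(X, h):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / h) on all particle pairs,
    plus its gradient with respect to the first argument."""
    diff = X[:, None, :] - X[None, :, :]          # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff**2, axis=-1) / h)     # K[j, i] = k(x_j, x_i)
    gradK = (-2.0 / h) * diff * K[:, :, None]     # d k(x_j, x_i) / d x_j
    return K, gradK

def mk_svgd_step(X, score, bandwidths, weights, eps=0.1):
    """One SVGD step with a weighted mixture of RBF kernels.
    score(X) returns the rows of grad log p evaluated at each particle."""
    n = X.shape[0]
    S = score(X)                                  # (n, d) score matrix
    phi = np.zeros_like(X)
    for h, w in zip(bandwidths, weights):
        K, gradK = rbf_kernel_and_grad(X, h)
        # phi_i = (1/n) sum_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
        phi += w * (K.T @ S + gradK.sum(axis=0)) / n
    return X + eps * phi

# Toy run: transport particles toward a standard normal (score = -x).
rng = np.random.default_rng(0)
X = rng.normal(3.0, 0.5, size=(50, 1))
for _ in range(300):
    X = mk_svgd_step(X, lambda Z: -Z, bandwidths=[0.5, 1.0, 2.0],
                     weights=[1/3, 1/3, 1/3], eps=0.2)
```

After the loop, the particle cloud drifts from its initial mean of 3 toward the target mean of 0, while the repulsive gradient term keeps the particles spread out rather than collapsing to the mode.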


Metadata
Publisher
Springer US
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-022-10069-5
