2021 | Original Paper | Book Chapter

PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging

Authors: Anthony Sicilia, Xingchen Zhao, Anastasia Sosnovskikh, Seong Jae Hwang

Published in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

Publisher: Springer International Publishing


Abstract

Application of deep neural networks to medical imaging tasks has in some sense become commonplace. Still, a “thorn in the side” of the deep learning movement is the argument that deep networks are prone to overfitting and are thus unable to generalize well when datasets are small (as is common in medical imaging tasks). One way to bolster confidence is to provide mathematical guarantees, or bounds, on network performance after training which explicitly quantify the possibility of overfitting. In this work, we explore recent advances using the PAC-Bayesian framework to provide bounds on generalization error for large (stochastic) networks. While previous efforts focus on classification in larger natural image datasets (e.g., MNIST and CIFAR-10), we apply these techniques to both classification and segmentation in a smaller medical imaging dataset: the ISIC 2018 challenge set. We observe that the resultant bounds are competitive compared to a simpler baseline, while also being more explainable and alleviating the need for holdout sets.
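For readers less familiar with the framework, the flavor of guarantee studied here can be illustrated by one common form of the PAC-Bayes bound, due to McAllester [37]. The notation below (prior \(P\), posterior \(Q\), sample size \(m\), confidence \(\delta\)) is standard in this literature but is our illustrative restatement, not the paper's exact theorem: with probability at least \(1-\delta\) over the draw of the \(m\) training samples, simultaneously for all posteriors \(Q\),

\[
  \mathbb{E}_{h \sim Q}\left[ R(h) \right]
  \;\le\;
  \mathbb{E}_{h \sim Q}\left[ \hat{R}(h) \right]
  + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln\!\left( 2\sqrt{m}/\delta \right) }{ 2m } },
\]

where \(R\) is the true risk, \(\hat{R}\) is the empirical risk, and \(\mathrm{KL}(Q \,\|\, P)\) is the Kullback–Leibler divergence between the posterior (e.g., a distribution over trained network weights) and a prior fixed before observing the data.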


Appendices
Accessible only with authorization
Footnotes
1
Guarantees in this paper are probabilistic. Similar to confidence intervals, one should interpret with care: the guarantees hold with high probability prior to observing data.
 
2
We (very roughly) estimate this using Thm. 6.11 of Shalev-Shwartz & Ben-David [46]. Bartlett et al. [7] provide tight bounds on the VC dimension of ReLU networks. Based on these, the sample size must be orders of magnitude larger than the parameter count to obtain a small generalization gap; a rough illustration follows below. See the Appendix for additional details and a plot.
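To make the scale of the issue concrete, here is a rough back-of-the-envelope sketch (ours, not the paper's exact computation from the Appendix). It assumes the VC dimension of a ReLU network scales like \(W L \log W\) for \(W\) weights and \(L\) layers, in the spirit of Bartlett et al. [7], and plugs this into the agnostic-PAC rate of roughly \(\sqrt{(d + \log(1/\delta))/m}\) with constants dropped:

import math

def vc_gap_estimate(num_weights, num_layers, m, delta=0.05):
    """Crude VC-type generalization-gap estimate (constants dropped)."""
    # VC dimension of a ReLU net scales like W * L * log(W)
    # (Bartlett et al. [7]).
    d = num_weights * num_layers * math.log(num_weights)
    # Agnostic-PAC style rate: gap ~ sqrt((d + log(1/delta)) / m).
    return math.sqrt((d + math.log(1 / delta)) / m)

# Example: a ResNet-18-sized model (~11.7M weights, 18 layers) trained on
# ~10K labeled images, a sample size typical of medical imaging tasks.
print(vc_gap_estimate(11_700_000, 18, 10_000))  # far above 1, i.e., vacuous

Since error rates live in [0, 1], any estimate above 1 is vacuous, which is exactly the motivation for the PAC-Bayesian route taken in the paper.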
 
3
PAC-Bayes is attributed to McAllester [37]; also, Shawe-Taylor & Williamson [47].
 
4
Early formulations of this hypothesis are due to Hochreiter & Schmidhuber [25].
 
5
Sometimes, in classification, this may be called the Gibbs classifier. It is not to be confused with the “deterministic” majority-vote classifier; the distinction is sketched below. An insightful discussion of the relationship between the risks in these two distinct cases is provided by Germain et al. [19].
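As a notation sketch (ours, not the paper's): for a posterior \(Q\) over hypotheses, the Gibbs (stochastic) risk draws a fresh \(h \sim Q\) for each prediction, while the majority-vote (deterministic) classifier aggregates over \(Q\):

\[
  R(G_Q) = \mathbb{E}_{h \sim Q}\, \Pr_{(x,y)}\left[ h(x) \ne y \right],
  \qquad
  B_Q(x) = \operatorname*{arg\,max}_{y}\, \Pr_{h \sim Q}\left[ h(x) = y \right].
\]

The classical relation \(R(B_Q) \le 2\, R(G_Q)\) links the two; Germain et al. [19] discuss when this factor of two is loose and provide sharper bounds.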
 
6
For example, see Ambroladze et al. [1], Parrado-Hernández et al. [42], Pérez-Ortiz et al. [43], and Dziugaite et al. [15, 17].
 
7
See Freund [18] or Langford & Blum [30].
 
8
We provide additional details on this procedure in the Appendix.
 
9
These datasets have 60K and 50K labeled examples, respectively.
 
10
See Pérez-Ortiz et al. [43] for more detailed discussion.
 
11
We refer here to both the running statistics and any learned weights.
 
12
See Dziugaite et al. [15] who coin the term “prefix”.
 
13
Notice that another approach might be to fix the posterior mean at the result of, say, the run with \(\sigma_{\mathrm{p}}=0.01\), and then modulate the variance from this fixed location. We are not guaranteed that this run will be near the center of a minimum, so this procedure may underestimate the minimum's size. Our approach instead allows the center of the posterior to change (slightly) as the variance grows; see the notation sketch below.
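In symbols (ours, restating this footnote under the assumption of isotropic Gaussian posteriors): writing the posterior at variance level \(\sigma\) as \(\mathcal{N}(w, \sigma^2 I)\), the alternative procedure would use

\[
  Q_\sigma = \mathcal{N}(w_{0.01},\, \sigma^2 I)
  \quad \text{(center fixed at the } \sigma_{\mathrm{p}}=0.01 \text{ run)},
\]

whereas the procedure adopted here re-learns the center at each variance level,

\[
  Q_\sigma = \mathcal{N}(w_\sigma,\, \sigma^2 I)
  \quad \text{(center re-optimized per } \sigma\text{)},
\]

so the posterior mean may drift slightly toward the center of the surrounding minimum as \(\sigma\) grows.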
 
References
1. Ambroladze, A., Parrado-Hernández, E., Shawe-Taylor, J.: Tighter PAC-Bayes bounds (2007)
3. Baldassi, C., et al.: Unreasonable effectiveness of learning neural networks: from accessible states and robust ensembles to basic algorithmic schemes. PNAS 113, E7655–E7662 (2016)
4. Baldassi, C., Ingrosso, A., Lucibello, C., Saglietti, L., Zecchina, R.: Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses. Phys. Rev. Lett. 115, 128101 (2015)
5. Bartlett, P.L.: For valid generalization, the size of the weights is more important than the size of the network (1997)
6. Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. Inform. Theory 44, 525–536 (1998)
7. Bartlett, P.L., Harvey, N., Liaw, C., Mehrabian, A.: Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. JMLR 20, 2285–2301 (2019)
8. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36, 929–965 (1989)
9. Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural network. In: ICML (2015)
13. Codella, N., et al.: Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv:1902.03368v2 (2019)
16. Dziugaite, G.K., Roy, D.M.: Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv:1703.11008v2 (2017)
17. Dziugaite, G.K., Roy, D.M.: Data-dependent PAC-Bayes priors via differential privacy. In: NeurIPS (2018)
18. Freund, Y.: Self bounding learning algorithms. In: COLT (1998)
19. Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Roy, J.F.: Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm. JMLR 16, 787–860 (2015)
20. Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: ICML (2009)
22. Hardt, M., Recht, B., Singer, Y.: Train faster, generalize better: stability of stochastic gradient descent. In: ICML (2016)
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
24. Hinton, G.E., Van Camp, D.: Keeping the neural networks simple by minimizing the description length of the weights. In: COLT (1993)
25. Hochreiter, S., Schmidhuber, J.: Flat minima. Neural Comput. 9, 1–42 (1997)
26. Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., Bengio, S.: Fantastic generalization measures and where to find them. arXiv:1912.02178v1 (2019)
27. Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large-batch training for deep learning: generalization gap and sharp minima. arXiv:1609.04836v2 (2016)
28. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
29. Kuzborskij, I., Lampert, C.: Data-dependent stability of stochastic gradient descent. In: ICML (2018)
30. Langford, J., Blum, A.: Microchoice bounds and self bounding learning algorithms. Mach. Learn. 51, 165–179 (2003)
31. Langford, J., Caruana, R.: (Not) bounding the true error. In: NeurIPS (2002)
32. Langford, J., Schapire, R.: Tutorial on practical prediction theory for classification. JMLR 6, 273–306 (2005)
33. Langford, J., Seeger, M.: Bounds for averaging classifiers (2001)
37. McAllester, D.A.: Some PAC-Bayesian theorems. Mach. Learn. 37, 355–363 (1999)
38. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 3DV (2016)
39. Mou, W., Wang, L., Zhai, X., Zheng, K.: Generalization bounds of SGLD for non-convex learning: two theoretical viewpoints. In: COLT (2018)
41. Neyshabur, B., Tomioka, R., Srebro, N.: In search of the real inductive bias: on the role of implicit regularization in deep learning. arXiv:1412.6614v4 (2014)
42. Parrado-Hernández, E., Ambroladze, A., Shawe-Taylor, J., Sun, S.: PAC-Bayes bounds with data dependent priors. JMLR 13, 3507–3531 (2012)
43. Pérez-Ortiz, M., Rivasplata, O., Shawe-Taylor, J., Szepesvári, C.: Tighter risk certificates for neural networks. arXiv:2007.12911v2 (2020)
46. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)
47. Shawe-Taylor, J., Williamson, R.C.: A PAC analysis of a Bayesian estimator. In: COLT (1997)
48. Valiant, L.G.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984)
49. Vapnik, V.N., Chervonenkis, A.Y.: On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i ee Primeneniya 16 (1971)
Metadata
Title
PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging
Authors
Anthony Sicilia
Xingchen Zhao
Anastasia Sosnovskikh
Seong Jae Hwang
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-87199-4_53