2021 | Original Paper | Book Chapter

PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging

Authors: Anthony Sicilia, Xingchen Zhao, Anastasia Sosnovskikh, Seong Jae Hwang

Published in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

Publisher: Springer International Publishing


Abstract

Application of deep neural networks to medical imaging tasks has in some sense become commonplace. Still, a “thorn in the side” of the deep learning movement is the argument that deep networks are prone to overfitting and are thus unable to generalize well when datasets are small (as is common in medical imaging tasks). One way to bolster confidence is to provide mathematical guarantees, or bounds, on network performance after training which explicitly quantify the possibility of overfitting. In this work, we explore recent advances using the PAC-Bayesian framework to provide bounds on generalization error for large (stochastic) networks. While previous efforts focus on classification in larger natural image datasets (e.g., MNIST and CIFAR-10), we apply these techniques to both classification and segmentation in a smaller medical imaging dataset: the ISIC 2018 challenge set. We observe that the resultant bounds are competitive compared to a simpler baseline, while also being more explainable and alleviating the need for holdout sets.
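For readers less familiar with the framework, the flavor of guarantee studied here can be illustrated by one common form of the PAC-Bayes bound, due to McAllester [37]. The notation below (prior \(P\), posterior \(Q\), sample size \(m\), confidence \(\delta\)) is standard in this literature but is our illustrative restatement, not the paper's exact theorem: with probability at least \(1-\delta\) over the draw of the \(m\) training samples, simultaneously for all posteriors \(Q\),

\[
  \mathbb{E}_{h \sim Q}\left[ R(h) \right]
  \;\le\;
  \mathbb{E}_{h \sim Q}\left[ \hat{R}(h) \right]
  + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln\!\left( 2\sqrt{m}/\delta \right) }{ 2m } },
\]

where \(R\) is the true risk, \(\hat{R}\) is the empirical risk, and \(\mathrm{KL}(Q \,\|\, P)\) is the Kullback–Leibler divergence between the posterior (e.g., a distribution over trained network weights) and a prior fixed before observing the data.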


Appendices
Accessible only with authorization
Footnotes
1
Guarantees in this paper are probabilistic. Similar to confidence intervals, one should interpret with care: the guarantees hold with high probability prior to observing data.
 
2
We (very roughly) estimate this using Thm. 6.11 of Shalev-Shwartz & Ben-David [46]. Bartlett et al. [7] provide tight bounds on the VC dimension of ReLU networks. Based on these, the sample size must be orders of magnitude larger than the parameter count to obtain a small generalization gap; a rough illustration follows below. See the Appendix for additional details and a plot.
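To make the scale of the issue concrete, here is a rough back-of-the-envelope sketch (ours, not the paper's exact computation from the Appendix). It assumes the VC dimension of a ReLU network scales like \(W L \log W\) for \(W\) weights and \(L\) layers, in the spirit of Bartlett et al. [7], and plugs this into the agnostic-PAC rate of roughly \(\sqrt{(d + \log(1/\delta))/m}\) with constants dropped:

import math

def vc_gap_estimate(num_weights, num_layers, m, delta=0.05):
    """Crude VC-type generalization-gap estimate (constants dropped)."""
    # VC dimension of a ReLU net scales like W * L * log(W)
    # (Bartlett et al. [7]).
    d = num_weights * num_layers * math.log(num_weights)
    # Agnostic-PAC style rate: gap ~ sqrt((d + log(1/delta)) / m).
    return math.sqrt((d + math.log(1 / delta)) / m)

# Example: a ResNet-18-sized model (~11.7M weights, 18 layers) trained on
# ~10K labeled images, a sample size typical of medical imaging tasks.
print(vc_gap_estimate(11_700_000, 18, 10_000))  # far above 1, i.e., vacuous

Since error rates live in [0, 1], any estimate above 1 is vacuous, which is exactly the motivation for the PAC-Bayesian route taken in the paper.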
 
3
PAC-Bayes is attributed to McAllester [37]; also, Shawe-Taylor & Williamson [47].
 
4
Early formulations of this hypothesis are due to Hochreiter & Schmidhuber [25].
 
5
Sometimes, in classification, this may be called the Gibbs classifier. It is not to be confused with the “deterministic” majority-vote classifier; the distinction is sketched below. An insightful discussion of the relationship between the risks in these two distinct cases is provided by Germain et al. [19].
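As a notation sketch (ours, not the paper's): for a posterior \(Q\) over hypotheses, the Gibbs (stochastic) risk draws a fresh \(h \sim Q\) for each prediction, while the majority-vote (deterministic) classifier aggregates over \(Q\):

\[
  R(G_Q) = \mathbb{E}_{h \sim Q}\, \Pr_{(x,y)}\left[ h(x) \ne y \right],
  \qquad
  B_Q(x) = \operatorname*{arg\,max}_{y}\, \Pr_{h \sim Q}\left[ h(x) = y \right].
\]

The classical relation \(R(B_Q) \le 2\, R(G_Q)\) links the two; Germain et al. [19] discuss when this factor of two is loose and provide sharper bounds.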
 
6
For example, see Ambroladze et al. [1], Parrado-Hernández et al. [42], Pérez-Ortiz et al. [43], and Dziugaite et al. [15, 17].
 
7
See Freund [18] or Langford & Blum [30].
 
8
We provide additional details on this procedure in the Appendix.
 
9
These datasets have 60K and 50K labeled examples, respectively.
 
10
See Pérez-Ortiz et al. [43] for more detailed discussion.
 
11
We refer here to both the running statistics and any learned weights.
 
12
See Dziugaite et al. [15] who coin the term “prefix”.
 
13
Notice that another approach might be to fix the posterior mean at the result of, say, the run with \(\sigma_{\mathrm{p}}=0.01\), and then modulate the variance from this fixed location. We are not guaranteed that this run will be near the center of a minimum, so this procedure may underestimate the minimum's size. Our approach instead allows the center of the posterior to change (slightly) as the variance grows; see the notation sketch below.
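In symbols (ours, restating this footnote under the assumption of isotropic Gaussian posteriors): writing the posterior at variance level \(\sigma\) as \(\mathcal{N}(w, \sigma^2 I)\), the alternative procedure would use

\[
  Q_\sigma = \mathcal{N}(w_{0.01},\, \sigma^2 I)
  \quad \text{(center fixed at the } \sigma_{\mathrm{p}}=0.01 \text{ run)},
\]

whereas the procedure adopted here re-learns the center at each variance level,

\[
  Q_\sigma = \mathcal{N}(w_\sigma,\, \sigma^2 I)
  \quad \text{(center re-optimized per } \sigma\text{)},
\]

so the posterior mean may drift slightly toward the center of the surrounding minimum as \(\sigma\) grows.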
 
References
1. Ambroladze, A., Parrado-Hernández, E., Shawe-Taylor, J.: Tighter PAC-Bayes bounds (2007)
3. Baldassi, C., et al.: Unreasonable effectiveness of learning neural networks: from accessible states and robust ensembles to basic algorithmic schemes. PNAS 113, E7655–E7662 (2016)
4. Baldassi, C., Ingrosso, A., Lucibello, C., Saglietti, L., Zecchina, R.: Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses. Phys. Rev. Lett. 115, 128101 (2015)
5. Bartlett, P.L.: For valid generalization, the size of the weights is more important than the size of the network (1997)
6. Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. Inform. Theory 44, 525–536 (1998)
7. Bartlett, P.L., Harvey, N., Liaw, C., Mehrabian, A.: Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. JMLR 20, 2285–2301 (2019)
8. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36, 929–965 (1989)
9. Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural network. In: ICML (2015)
13. Codella, N., et al.: Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv:1902.03368v2 (2019)
16. Dziugaite, G.K., Roy, D.M.: Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv:1703.11008v2 (2017)
17. Dziugaite, G.K., Roy, D.M.: Data-dependent PAC-Bayes priors via differential privacy. In: NeurIPS (2018)
18. Freund, Y.: Self bounding learning algorithms. In: COLT (1998)
19. Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Roy, J.F.: Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm. JMLR 16, 787–860 (2015)
20. Germain, P., Lacasse, A., Laviolette, F., Marchand, M.: PAC-Bayesian learning of linear classifiers. In: ICML (2009)
22. Hardt, M., Recht, B., Singer, Y.: Train faster, generalize better: stability of stochastic gradient descent. In: ICML (2016)
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
24. Hinton, G.E., Van Camp, D.: Keeping the neural networks simple by minimizing the description length of the weights. In: COLT (1993)
25. Hochreiter, S., Schmidhuber, J.: Flat minima. Neural Comput. 9, 1–42 (1997)
26. Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., Bengio, S.: Fantastic generalization measures and where to find them. arXiv:1912.02178v1 (2019)
27. Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large-batch training for deep learning: generalization gap and sharp minima. arXiv:1609.04836v2 (2016)
28. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
29. Kuzborskij, I., Lampert, C.: Data-dependent stability of stochastic gradient descent. In: ICML (2018)
30. Langford, J., Blum, A.: Microchoice bounds and self bounding learning algorithms. Mach. Learn. 51, 165–179 (2003)
31. Langford, J., Caruana, R.: (Not) bounding the true error. In: NeurIPS (2002)
32. Langford, J., Schapire, R.: Tutorial on practical prediction theory for classification. JMLR 6, 273–306 (2005)
33. Langford, J., Seeger, M.: Bounds for averaging classifiers (2001)
37. McAllester, D.A.: Some PAC-Bayesian theorems. Mach. Learn. 37, 355–363 (1999)
38. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 3DV (2016)
39. Mou, W., Wang, L., Zhai, X., Zheng, K.: Generalization bounds of SGLD for non-convex learning: two theoretical viewpoints. In: COLT (2018)
41. Neyshabur, B., Tomioka, R., Srebro, N.: In search of the real inductive bias: on the role of implicit regularization in deep learning. arXiv:1412.6614v4 (2014)
42. Parrado-Hernández, E., Ambroladze, A., Shawe-Taylor, J., Sun, S.: PAC-Bayes bounds with data dependent priors. JMLR 13, 3507–3531 (2012)
43. Pérez-Ortiz, M., Rivasplata, O., Shawe-Taylor, J., Szepesvári, C.: Tighter risk certificates for neural networks. arXiv:2007.12911v2 (2020)
46. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)
47. Shawe-Taylor, J., Williamson, R.C.: A PAC analysis of a Bayesian estimator. In: COLT (1997)
48. Valiant, L.G.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984)
49. Vapnik, V.N., Chervonenkis, A.Y.: On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i ee Primeneniya 16 (1971)
Metadata
Title
PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging
Authors
Anthony Sicilia
Xingchen Zhao
Anastasia Sosnovskikh
Seong Jae Hwang
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-87199-4_53