
2019 | Original Paper | Book Chapter

Training Behavior of Deep Neural Network in Frequency Domain

Authors: Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

Why deep neural networks (DNNs) that are capable of overfitting often generalize well in practice remains a mystery [24]. To identify a potential mechanism, we study the implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components of the target and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). In our experiments, the F-Principle is observed across DNNs with various structures, activation functions, and training algorithms. We also illustrate how the F-Principle helps explain the effect of early stopping as well as the generalization of DNNs. The F-Principle thus potentially points to a general principle underlying DNN optimization and generalization.
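To make this concrete, the F-Principle can be reproduced on a toy 1D problem: fit a target that mixes a low-frequency and a high-frequency sine, and during training compare the Fourier spectrum of the network output with that of the target. The sketch below is a minimal illustration of this measurement, not the authors' code; the target function, network width, learning rate, and step counts are assumptions chosen for demonstration.

import numpy as np

rng = np.random.default_rng(0)

# Target: one low-frequency and one high-frequency sine (assumed example).
x = np.linspace(-1, 1, 200, endpoint=False)
y = np.sin(np.pi * x) + 0.5 * np.sin(5 * np.pi * x)

# Small one-hidden-layer tanh network trained by full-batch gradient descent.
n_hidden = 200
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = rng.normal(0.0, 0.1, n_hidden)              # bias std 0.1, cf. footnote 2
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x[:, None] @ W1 + b1)            # hidden activations
    return (h @ W2 + b2).ravel(), h

Y = np.fft.rfft(y)                               # spectrum of the target
lr = 0.05
for step in range(20001):
    pred, h = forward(x)
    err = pred - y                               # gradient of 0.5*MSE w.r.t. pred
    # Backpropagation through the two-layer network.
    gW2 = h.T @ err[:, None] / len(x)
    gb2 = np.array([err.mean()])
    dh = (err[:, None] @ W2.T) * (1.0 - h**2)
    gW1 = x[None, :] @ dh / len(x)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    if step % 4000 == 0:
        # Bins 1 and 5 hold one and five cycles over the length-2 window,
        # i.e. the two sine components of the target.
        P = np.fft.rfft(pred)
        e_low = abs(P[1] - Y[1]) / abs(Y[1])
        e_high = abs(P[5] - Y[5]) / abs(Y[5])
        print(f"step {step:6d}  low-freq rel. err {e_low:.3f}  "
              f"high-freq rel. err {e_high:.3f}")

Under the F-Principle, the printed low-frequency relative error should drop well before the high-frequency one does; stopping training at that point (early stopping) keeps the low-frequency fit while leaving high-frequency noise unlearned.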


Footnotes
1. Almost at the same time, another study [15] reports a similar result; however, they add noise to MNIST, which contaminates the labels.
 
2. The bias terms are always initialized with a standard deviation of 0.1.
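For concreteness, this bias initialization could be written as follows in PyTorch (a hypothetical sketch; the layer shape is an arbitrary assumption, not taken from the paper):

import torch.nn as nn

# Hypothetical layer; the 784-to-256 shape is an arbitrary assumption.
layer = nn.Linear(784, 256)
# Draw the biases from N(0, 0.1^2), matching the standard deviation in this footnote.
nn.init.normal_(layer.bias, mean=0.0, std=0.1)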
 
References
3. Barnett, A., Greengard, L., Pataki, A., Spivak, M.: Rapid solution of the cryo-EM reconstruction problem by frequency marching. SIAM J. Imaging Sci. 10(3), 1170–1195 (2017)
4. Cai, W., Li, X., Liu, L.: PhaseDNN - a parallel phase shift deep neural network for adaptive wideband learning. arXiv preprint arXiv:1905.01389 (2019)
5. Hackbusch, W.: Multi-grid Methods and Applications, vol. 4. Springer Science & Business Media (2013)
6. Hardt, M., Recht, B., Singer, Y.: Train faster, generalize better: stability of stochastic gradient descent. arXiv preprint arXiv:1509.01240 (2015)
9. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
10. Lin, J., Camoriano, R., Rosasco, L.: Generalization properties and implicit regularization for multiple passes SGM. In: International Conference on Machine Learning, pp. 2340–2348 (2016)
11. Martin, C.H., Mahoney, M.W.: Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior. arXiv preprint arXiv:1710.09553 (2017)
12. Mishali, M., Eldar, Y.C.: Blind multiband signal reconstruction: compressed sensing for analog signals. IEEE Trans. Signal Process. 57(3), 993–1009 (2009)
13. Percival, D.B., Walden, A.T.: Spectral Analysis for Physical Applications. Cambridge University Press, Cambridge (1993)
16. Saxe, A.M., Bansal, Y., Dapello, J., Advani, M.: On the information bottleneck theory of deep learning. In: International Conference on Learning Representations (2018)
18. Shwartz-Ziv, R., Tishby, N.: Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810 (2017)
19. Wu, L., Zhu, Z., Weinan, E.: Towards understanding generalization of deep learning: perspective of loss landscapes. arXiv preprint arXiv:1706.10239 (2017)
20. Xu, Z.Q.J.: Frequency principle in deep learning with general loss functions and its potential application. arXiv preprint arXiv:1811.10146 (2018)
21. Xu, Z.Q.J., Zhang, Y., Luo, T., Xiao, Y., Ma, Z.: Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523 (2019)
23. Yen, J.: On nonuniform sampling of bandwidth-limited signals. IRE Trans. Circuit Theory 3(4), 251–257 (1956)
24. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)
25. Zhang, Y., Xu, Z.Q.J., Luo, T., Ma, Z.: Explicitizing an implicit bias of the frequency principle in two-layer neural networks. arXiv preprint arXiv:1905.10264 (2019)
26. Zhang, Y., Xu, Z.Q.J., Luo, T., Ma, Z.: A type of generalization error induced by initialization in deep neural networks. arXiv preprint arXiv:1905.07777 (2019)
27. Zhen, H.L., Lin, X., Tang, A.Z., Li, Z., Zhang, Q., Kwong, S.: Nonlinear collaborative scheme for deep neural networks. arXiv preprint arXiv:1811.01316 (2018)
28. Zheng, G., Sang, J., Xu, C.: Understanding deep learning generalization by maximum entropy. arXiv preprint arXiv:1711.07758 (2017)
Metadata
Title
Training Behavior of Deep Neural Network in Frequency Domain
Authors
Zhi-Qin John Xu
Yaoyu Zhang
Yanyang Xiao
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-36708-4_22
