
2019 | Original Paper | Book Chapter

Training Behavior of Deep Neural Network in Frequency Domain

Authors: Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao

Published in: Neural Information Processing

Publisher: Springer International Publishing


Abstract

Why deep neural networks (DNNs) that are capable of overfitting often generalize well in practice remains a mystery [24]. To identify a potential mechanism, we study the implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components of the target and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). In our experiments, the F-Principle is observed across DNNs with various structures, activation functions, and training algorithms. We also illustrate how the F-Principle helps explain the effect of early stopping as well as the generalization of DNNs. The F-Principle thus potentially points to a general principle underlying DNN optimization and generalization.
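To make this concrete, the F-Principle can be reproduced on a toy 1D problem: fit a target that mixes a low-frequency and a high-frequency sine, and during training compare the Fourier spectrum of the network output with that of the target. The sketch below is a minimal illustration of this measurement, not the authors' code; the target function, network width, learning rate, and step counts are assumptions chosen for demonstration.

import numpy as np

rng = np.random.default_rng(0)

# Target: one low-frequency and one high-frequency sine (assumed example).
x = np.linspace(-1, 1, 200, endpoint=False)
y = np.sin(np.pi * x) + 0.5 * np.sin(5 * np.pi * x)

# Small one-hidden-layer tanh network trained by full-batch gradient descent.
n_hidden = 200
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = rng.normal(0.0, 0.1, n_hidden)              # bias std 0.1, cf. footnote 2
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x[:, None] @ W1 + b1)            # hidden activations
    return (h @ W2 + b2).ravel(), h

Y = np.fft.rfft(y)                               # spectrum of the target
lr = 0.05
for step in range(20001):
    pred, h = forward(x)
    err = pred - y                               # gradient of 0.5*MSE w.r.t. pred
    # Backpropagation through the two-layer network.
    gW2 = h.T @ err[:, None] / len(x)
    gb2 = np.array([err.mean()])
    dh = (err[:, None] @ W2.T) * (1.0 - h**2)
    gW1 = x[None, :] @ dh / len(x)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    if step % 4000 == 0:
        # Bins 1 and 5 hold one and five cycles over the length-2 window,
        # i.e. the two sine components of the target.
        P = np.fft.rfft(pred)
        e_low = abs(P[1] - Y[1]) / abs(Y[1])
        e_high = abs(P[5] - Y[5]) / abs(Y[5])
        print(f"step {step:6d}  low-freq rel. err {e_low:.3f}  "
              f"high-freq rel. err {e_high:.3f}")

Under the F-Principle, the printed low-frequency relative error should drop well before the high-frequency one does; stopping training at that point (early stopping) keeps the low-frequency fit while leaving high-frequency noise unlearned.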


Footnotes
1. Almost at the same time, another study [15] reports a similar result; however, they add noise to MNIST, which contaminates the labels.
 
2. The bias terms are always initialized with a standard deviation of 0.1.
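For concreteness, this bias initialization could be written as follows in PyTorch (a hypothetical sketch; the layer shape is an arbitrary assumption, not taken from the paper):

import torch.nn as nn

# Hypothetical layer; the 784-to-256 shape is an arbitrary assumption.
layer = nn.Linear(784, 256)
# Draw the biases from N(0, 0.1^2), matching the standard deviation in this footnote.
nn.init.normal_(layer.bias, mean=0.0, std=0.1)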
 
References
3. Barnett, A., Greengard, L., Pataki, A., Spivak, M.: Rapid solution of the cryo-EM reconstruction problem by frequency marching. SIAM J. Imaging Sci. 10(3), 1170–1195 (2017)
4. Cai, W., Li, X., Liu, L.: PhaseDNN - a parallel phase shift deep neural network for adaptive wideband learning. arXiv preprint arXiv:1905.01389 (2019)
5. Hackbusch, W.: Multi-grid Methods and Applications, vol. 4. Springer Science & Business Media (2013)
6. Hardt, M., Recht, B., Singer, Y.: Train faster, generalize better: stability of stochastic gradient descent. arXiv preprint arXiv:1509.01240 (2015)
9. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
10. Lin, J., Camoriano, R., Rosasco, L.: Generalization properties and implicit regularization for multiple passes SGM. In: International Conference on Machine Learning, pp. 2340–2348 (2016)
11. Martin, C.H., Mahoney, M.W.: Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior. arXiv preprint arXiv:1710.09553 (2017)
12. Mishali, M., Eldar, Y.C.: Blind multiband signal reconstruction: compressed sensing for analog signals. IEEE Trans. Signal Process. 57(3), 993–1009 (2009)
13. Percival, D.B., Walden, A.T.: Spectral Analysis for Physical Applications. Cambridge University Press, Cambridge (1993)
16. Saxe, A.M., Bansal, Y., Dapello, J., Advani, M.: On the information bottleneck theory of deep learning. In: International Conference on Learning Representations (2018)
18. Shwartz-Ziv, R., Tishby, N.: Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810 (2017)
19. Wu, L., Zhu, Z., Weinan, E.: Towards understanding generalization of deep learning: perspective of loss landscapes. arXiv preprint arXiv:1706.10239 (2017)
20. Xu, Z.Q.J.: Frequency principle in deep learning with general loss functions and its potential application. arXiv preprint arXiv:1811.10146 (2018)
21. Xu, Z.Q.J., Zhang, Y., Luo, T., Xiao, Y., Ma, Z.: Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523 (2019)
23. Yen, J.: On nonuniform sampling of bandwidth-limited signals. IRE Trans. Circuit Theory 3(4), 251–257 (1956)
24. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)
25. Zhang, Y., Xu, Z.Q.J., Luo, T., Ma, Z.: Explicitizing an implicit bias of the frequency principle in two-layer neural networks. arXiv preprint arXiv:1905.10264 (2019)
26. Zhang, Y., Xu, Z.Q.J., Luo, T., Ma, Z.: A type of generalization error induced by initialization in deep neural networks. arXiv preprint arXiv:1905.07777 (2019)
27. Zhen, H.L., Lin, X., Tang, A.Z., Li, Z., Zhang, Q., Kwong, S.: Nonlinear collaborative scheme for deep neural networks. arXiv preprint arXiv:1811.01316 (2018)
28. Zheng, G., Sang, J., Xu, C.: Understanding deep learning generalization by maximum entropy. arXiv preprint arXiv:1711.07758 (2017)
Metadata
Title
Training Behavior of Deep Neural Network in Frequency Domain
Authors
Zhi-Qin John Xu
Yaoyu Zhang
Yanyang Xiao
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-36708-4_22
