
2021 | OriginalPaper | Chapter

Knowledge Distillation with Distribution Mismatch

Authors: Dang Nguyen, Sunil Gupta, Trong Nguyen, Santu Rana, Phuoc Nguyen, Truyen Tran, Ky Le, Shannon Ryan, Svetha Venkatesh

Published in: Machine Learning and Knowledge Discovery in Databases. Research Track

Publisher: Springer International Publishing


Abstract

Knowledge distillation (KD) is one of the most effective methods for compressing a large deep neural network (called the teacher) into a smaller network (called the student). Current state-of-the-art KD methods assume that the teacher and the student are trained on data from identical distributions; this assumption is what keeps the student’s accuracy close to the teacher’s. However, it is a strong assumption that fails in many real-world applications, where the teacher’s training data and the student’s training data follow different distributions, and existing KD methods often break down in this setting. To overcome this problem, we propose a novel KD method that remains effective when such a distribution mismatch occurs. We first learn a distribution over the student’s training data from which we can sample images that the teacher classifies well, thereby discovering the region of the data space where the teacher has good knowledge to transfer to the student. We then propose a new loss function for training the student network, which achieves better accuracy than the standard KD loss. Extensive experiments demonstrate that our method works well for KD tasks both with and without distribution mismatch. To the best of our knowledge, ours is the first method to address the challenge of distribution mismatch in knowledge distillation.
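
The abstract describes the approach only at a high level, so the sketch below is illustrative rather than the authors’ implementation. The first function is the widely used Hinton-style loss that the abstract calls the "standard KD loss"; the second is a hypothetical filter showing one way to keep only sampled images that the teacher classifies confidently, in the spirit of discovering the data space where the teacher has good knowledge. The sampling distribution itself, and the names temperature, alpha, and threshold, are assumptions not taken from the paper.

```python
# Minimal PyTorch sketch, not the authors' code. standard_kd_loss is the
# common Hinton-style baseline; teacher_confident_subset is a hypothetical
# stand-in for the paper's (unspecified here) selection mechanism.
import torch
import torch.nn.functional as F


def standard_kd_loss(student_logits, teacher_logits, labels,
                     temperature=4.0, alpha=0.9):
    """Soft-target KL term plus hard-label cross-entropy (assumed hyperparameters)."""
    # Soften both output distributions with the same temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scale the KL term by T^2 so its gradients stay comparable to the CE term.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


@torch.no_grad()
def teacher_confident_subset(teacher, images, threshold=0.9):
    """Keep only images the teacher classifies with high confidence (illustrative)."""
    probs = F.softmax(teacher(images), dim=1)
    confidence, pseudo_labels = probs.max(dim=1)
    keep = confidence >= threshold
    return images[keep], pseudo_labels[keep]
```

In a training loop, one would sample candidate images from the distribution learned on the student’s data, filter them with a rule like teacher_confident_subset, and train the student on the surviving images with a KD-style loss.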


Footnotes
1. This is possible because we use benchmark datasets, and the training and test splits are fixed.
 
Metadata
Title
Knowledge Distillation with Distribution Mismatch
Authors
Dang Nguyen
Sunil Gupta
Trong Nguyen
Santu Rana
Phuoc Nguyen
Truyen Tran
Ky Le
Shannon Ryan
Svetha Venkatesh
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86520-7_16
