
22-03-2021

Knowledge Distillation: A Survey

Authors: Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao

Published in: International Journal of Computer Vision | Issue 6/2021


Abstract

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of their high computational complexity but also because of their large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model, and it has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher–student architectures, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed, and comments on future research are discussed and put forward.
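
As the abstract notes, knowledge distillation trains a small student model under the supervision of a large teacher. The following is a minimal sketch of the classical soft-target (response-based) objective, assuming PyTorch; the temperature `T` and weight `alpha` are illustrative hyperparameters, not values prescribed by the survey.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            T: float = 4.0,
            alpha: float = 0.9) -> torch.Tensor:
    """Vanilla soft-target distillation loss (illustrative sketch)."""
    # Soft-target term: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In practice the teacher's logits would be computed with gradients disabled (e.g., inside `torch.no_grad()`), so that only the student's parameters are updated.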


Footnotes
1. A hint is the output of a teacher’s hidden layer that supervises the student’s learning.
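
In hint-based (feature-level) distillation, the student’s intermediate feature map is regressed onto the teacher’s hint, typically through a small adapter because the two layers differ in width. The sketch below illustrates this idea under the assumption of convolutional feature maps in PyTorch; the 1×1 convolutional regressor and the channel arguments are hypothetical choices, not details specified by the survey.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """Match a student hidden layer to a teacher hint layer (illustrative sketch)."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # A 1x1 convolution aligns the student's channel count with the teacher's.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, teacher_hint: torch.Tensor) -> torch.Tensor:
        # L2 penalty between the adapted student feature map and the teacher hint.
        return F.mse_loss(self.regressor(student_feat), teacher_hint)
```

For example, a 64-channel student layer could be matched against a 256-channel teacher hint via `HintLoss(64, 256)`, with the resulting penalty added to the student’s task loss.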
 
go back to reference Perez, A., Sanguineti, V., Morerio, P. & Murino, V. (2020). Audio-visual model distillation using acoustic images. In WACV. Perez, A., Sanguineti, V., Morerio, P. & Murino, V. (2020). Audio-visual model distillation using acoustic images. In WACV.
go back to reference Phuong, M., & Lampert, C. H. (2019a). Towards understanding knowledge distillation. In ICML. Phuong, M., & Lampert, C. H. (2019a). Towards understanding knowledge distillation. In ICML.
go back to reference Phuong, M., & Lampert, C. H. (2019b). Distillation-based training for multi-exit architectures. In ICCV. Phuong, M., & Lampert, C. H. (2019b). Distillation-based training for multi-exit architectures. In ICCV.
go back to reference Pilzer, A., Lathuiliere, S., Sebe, N. & Ricci, E. (2019). Refine and distill: Exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation. In CVPR. Pilzer, A., Lathuiliere, S., Sebe, N. & Ricci, E. (2019). Refine and distill: Exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation. In CVPR.
go back to reference Polino, A., Pascanu, R. & Alistarh, D. (2018). Model compression via distillation and quantization. In ICLR. Polino, A., Pascanu, R. & Alistarh, D. (2018). Model compression via distillation and quantization. In ICLR.
go back to reference Price, R., Iso, K., & Shinoda, K. (2016). Wise teachers train better DNN acoustic models. EURASIP Journal on Audio, Speech, and Music Processing, 2016(1), 10.CrossRef Price, R., Iso, K., & Shinoda, K. (2016). Wise teachers train better DNN acoustic models. EURASIP Journal on Audio, Speech, and Music Processing, 2016(1), 10.CrossRef
go back to reference Radosavovic, I., Dollar, P., Girshick, R., Gkioxari, G., & He, K. (2018). Data distillation: Towards omni-supervised learning. In CVPR. Radosavovic, I., Dollar, P., Girshick, R., Gkioxari, G., & He, K. (2018). Data distillation: Towards omni-supervised learning. In CVPR.
go back to reference Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollar P. (2020). Designing network design spaces. In CVPR. Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollar P. (2020). Designing network design spaces. In CVPR.
go back to reference Roheda, S., Riggan, B. S., Krim, H. & Dai, L. (2018). Cross-modality distillation: A case for conditional generative adversarial networks. In ICASSP. Roheda, S., Riggan, B. S., Krim, H. & Dai, L. (2018). Cross-modality distillation: A case for conditional generative adversarial networks. In ICASSP.
go back to reference Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2015). Fitnets: Hints for thin deep nets. In ICLR. Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2015). Fitnets: Hints for thin deep nets. In ICLR.
go back to reference Ross, A. S. & Doshi-Velez, F. (2018). Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI. Ross, A. S. & Doshi-Velez, F. (2018). Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI.
go back to reference Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR.
go back to reference Sanh, V., Debut, L., Chaumond, J. & Wolf, T. (2019). Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. Sanh, V., Debut, L., Chaumond, J. & Wolf, T. (2019). Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:​1910.​01108.
go back to reference Saputra, M. R. U., de Gusmao, P. P., Almalioglu, Y., Markham, A. & Trigoni, N. (2019). Distilling knowledge from a deep pose regressor network. In ICCV. Saputra, M. R. U., de Gusmao, P. P., Almalioglu, Y., Markham, A. & Trigoni, N. (2019). Distilling knowledge from a deep pose regressor network. In ICCV.
go back to reference Sau, B. B. & Balasubramanian, V. N. (2016). Deep model compression: Distilling knowledge from noisy teachers. arXiv preprint arXiv:1610.09650. Sau, B. B. & Balasubramanian, V. N. (2016). Deep model compression: Distilling knowledge from noisy teachers. arXiv preprint arXiv:​1610.​09650.
go back to reference Shen, C., Wang, X., Song, J., Sun, L., & Song, M. (2019a). Amalgamating knowledge towards comprehensive classification. In AAAI. Shen, C., Wang, X., Song, J., Sun, L., & Song, M. (2019a). Amalgamating knowledge towards comprehensive classification. In AAAI.
go back to reference Shen, C., Wang, X., Yin, Y., Song, J., Luo, S., & Song, M. (2021). Progressive network grafting for few-shot knowledge distillation. In AAAI. Shen, C., Wang, X., Yin, Y., Song, J., Luo, S., & Song, M. (2021). Progressive network grafting for few-shot knowledge distillation. In AAAI.
go back to reference Shen, C., Xue, M., Wang, X., Song, J., Sun, L., & Song, M. (2019b). Customizing student networks from heterogeneous teachers via adaptive knowledge amalgamation. In ICCV. Shen, C., Xue, M., Wang, X., Song, J., Sun, L., & Song, M. (2019b). Customizing student networks from heterogeneous teachers via adaptive knowledge amalgamation. In ICCV.
go back to reference Shen, J., Vesdapunt, N., Boddeti, V. N. & Kitani, K. M. (2016). In teacher we trust: Learning compressed models for pedestrian detection. arXiv preprint arXiv:1612.00478. Shen, J., Vesdapunt, N., Boddeti, V. N. & Kitani, K. M. (2016). In teacher we trust: Learning compressed models for pedestrian detection. arXiv preprint arXiv:​1612.​00478.
go back to reference Shen, P., Lu, X., Li, S. & Kawai, H. (2018). Feature representation of short utterances based on knowledge distillation for spoken language identification. In Interspeech. Shen, P., Lu, X., Li, S. & Kawai, H. (2018). Feature representation of short utterances based on knowledge distillation for spoken language identification. In Interspeech.
go back to reference Shen, P., Lu, X., Li, S., & Kawai, H. (2020). Knowledge distillation-based representation learning for short-utterance spoken language identification. IEEE/ACM Transactions on Audio Speech and Language, 28, 2674–2683.CrossRef Shen, P., Lu, X., Li, S., & Kawai, H. (2020). Knowledge distillation-based representation learning for short-utterance spoken language identification. IEEE/ACM Transactions on Audio Speech and Language, 28, 2674–2683.CrossRef
go back to reference Shen, P., Lu, X., Li, S. & Kawai, H. (2019c). Interactive learning of teacher-student model for short utterance spoken language identification. In ICASSP. Shen, P., Lu, X., Li, S. & Kawai, H. (2019c). Interactive learning of teacher-student model for short utterance spoken language identification. In ICASSP.
go back to reference Shen, Z., He, Z. & Xue, X. (2019d). Meal: Multi-model ensemble via adversarial learning. In AAAI. Shen, Z., He, Z. & Xue, X. (2019d). Meal: Multi-model ensemble via adversarial learning. In AAAI.
go back to reference Shi, B., Sun, M., Kao, C. C., Rozgic, V., Matsoukas, S. & Wang, C. (2019a). Compression of acoustic event detection models with quantized distillation. In Interspeech. Shi, B., Sun, M., Kao, C. C., Rozgic, V., Matsoukas, S. & Wang, C. (2019a). Compression of acoustic event detection models with quantized distillation. In Interspeech.
go back to reference Shi, B., Sun, M., Kao, CC., Rozgic, V., Matsoukas, S. & Wang, C. (2019b). Semi-supervised acoustic event detection based on tri-training. In ICASSP. Shi, B., Sun, M., Kao, CC., Rozgic, V., Matsoukas, S. & Wang, C. (2019b). Semi-supervised acoustic event detection based on tri-training. In ICASSP.
go back to reference Shi, Y., Hwang, M. Y., Lei, X., & Sheng, H. (2019c). Knowledge distillation for recurrent neural network language modeling with trust regularization. In ICASSP. Shi, Y., Hwang, M. Y., Lei, X., & Sheng, H. (2019c). Knowledge distillation for recurrent neural network language modeling with trust regularization. In ICASSP.
go back to reference Shin, S., Boo, Y. & Sung, W. (2019). Empirical analysis of knowledge distillation technique for optimization of quantized deep neural networks. arXiv preprint arXiv:1909.01688. Shin, S., Boo, Y. & Sung, W. (2019). Empirical analysis of knowledge distillation technique for optimization of quantized deep neural networks. arXiv preprint arXiv:​1909.​01688.
go back to reference Shmelkov, K., Schmid, C., & Alahari, K. (2017). Incremental learning of object detectors without catastrophic forgetting. In ICCV. Shmelkov, K., Schmid, C., & Alahari, K. (2017). Incremental learning of object detectors without catastrophic forgetting. In ICCV.
go back to reference Shu, C., Li, P., Xie, Y., Qu, Y., Dai, L., & Ma, L.(2019). Knowledge squeezed adversarial network compression. arXiv preprint arXiv:1904.05100. Shu, C., Li, P., Xie, Y., Qu, Y., Dai, L., & Ma, L.(2019). Knowledge squeezed adversarial network compression. arXiv preprint arXiv:​1904.​05100.
go back to reference Siam, M., Jiang, C., Lu, S., Petrich, L., Gamal, M., Elhoseiny, M., et al. (2019). Video object segmentation using teacher-student adaptation in a human robot interaction (HRI) setting. In ICRA. Siam, M., Jiang, C., Lu, S., Petrich, L., Gamal, M., Elhoseiny, M., et al. (2019). Video object segmentation using teacher-student adaptation in a human robot interaction (HRI) setting. In ICRA.
go back to reference Sindhwani, V., Sainath, T. & Kumar, S. (2015). Structured transforms for small-footprint deep learning. In NeurIPS. Sindhwani, V., Sainath, T. & Kumar, S. (2015). Structured transforms for small-footprint deep learning. In NeurIPS.
go back to reference Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
go back to reference Song, X., Feng, F., Han, X., Yang, X., Liu, W. & Nie, L. (2018). Neural compatibility modeling with attentive knowledge distillation. In SIGIR. Song, X., Feng, F., Han, X., Yang, X., Liu, W. & Nie, L. (2018). Neural compatibility modeling with attentive knowledge distillation. In SIGIR.
go back to reference Srinivas, S. & Fleuret, F. (2018). Knowledge transfer with jacobian matching. In ICML. Srinivas, S. & Fleuret, F. (2018). Knowledge transfer with jacobian matching. In ICML.
go back to reference Su, J. C. & Maji, S. (2017). Adapting models to signal degradation using distillation. In BMVC. Su, J. C. & Maji, S. (2017). Adapting models to signal degradation using distillation. In BMVC.
go back to reference Sun, L., Gou, J., Yu, B., Du, L., & Tao, D. (2021) Collaborative teacher–student learning via multiple knowledge transfer. arXiv preprint arXiv:2101.08471. Sun, L., Gou, J., Yu, B., Du, L., & Tao, D. (2021) Collaborative teacher–student learning via multiple knowledge transfer. arXiv preprint arXiv:​2101.​08471.
go back to reference Sun, S., Cheng, Y., Gan, Z. & Liu, J. (2019). Patient knowledge distillation for bert model compression. In NEMNLP-IJCNLP. Sun, S., Cheng, Y., Gan, Z. & Liu, J. (2019). Patient knowledge distillation for bert model compression. In NEMNLP-IJCNLP.
go back to reference Sun, P., Feng, W., Han, R., Yan, S., & Wen, Y. (2019). Optimizing network performance for distributed dnn training on gpu clusters: Imagenet/alexnet training in 1.5 minutes. arXiv preprint arXiv:1902.06855. Sun, P., Feng, W., Han, R., Yan, S., & Wen, Y. (2019). Optimizing network performance for distributed dnn training on gpu clusters: Imagenet/alexnet training in 1.5 minutes. arXiv preprint arXiv:​1902.​06855.
go back to reference Takashima, R., Li, S. & Kawai, H. (2018). An investigation of a knowledge distillation method for CTC acoustic models. In ICASSP. Takashima, R., Li, S. & Kawai, H. (2018). An investigation of a knowledge distillation method for CTC acoustic models. In ICASSP.
go back to reference Tan, H., Liu, X., Liu, M., Yin, B., & Li, X. (2021). KT-GAN: Knowledge-transfer generative adversarial network for text-to-image synthesis. IEEE TIP, 30, 1275–1290. Tan, H., Liu, X., Liu, M., Yin, B., & Li, X. (2021). KT-GAN: Knowledge-transfer generative adversarial network for text-to-image synthesis. IEEE TIP, 30, 1275–1290.
go back to reference Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In CVPR. Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). Mnasnet: Platform-aware neural architecture search for mobile. In CVPR.
go back to reference Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML. Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML.
go back to reference Tan, X., Ren, Y., He, D., Qin, T., Zhao, Z. & Liu, T. Y. (2019). Multilingual neural machine translation with knowledge distillation. In ICLR. Tan, X., Ren, Y., He, D., Qin, T., Zhao, Z. & Liu, T. Y. (2019). Multilingual neural machine translation with knowledge distillation. In ICLR.
go back to reference Tang, J., Shivanna, R., Zhao, Z., Lin, D., Singh, A., Chi, E. H., & Jain, S. (2020). Understanding and improving knowledge distillation. arXiv preprint arXiv:2002.03532. Tang, J., Shivanna, R., Zhao, Z., Lin, D., Singh, A., Chi, E. H., & Jain, S. (2020). Understanding and improving knowledge distillation. arXiv preprint arXiv:​2002.​03532.
go back to reference Tang, J., & Wang, K. (2018). Ranking distillation: Learning compact ranking models with high performance for recommender system. In SIGKDD. Tang, J., & Wang, K. (2018). Ranking distillation: Learning compact ranking models with high performance for recommender system. In SIGKDD.
go back to reference Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O. & Lin, J. (2019). Distilling task-specific knowledge from bert into simple neural networks. arXiv preprint arXiv:1903.12136. Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O. & Lin, J. (2019). Distilling task-specific knowledge from bert into simple neural networks. arXiv preprint arXiv:​1903.​12136.
go back to reference Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NeurIPS. Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NeurIPS.
go back to reference Thoker, F. M. & Gall, J. (2019). Cross-modal knowledge distillation for action recognition. In ICIP. Thoker, F. M. & Gall, J. (2019). Cross-modal knowledge distillation for action recognition. In ICIP.
go back to reference Tian, Y., Krishnan, D. & Isola, P. (2020). Contrastive representation distillation. In ICLR. Tian, Y., Krishnan, D. & Isola, P. (2020). Contrastive representation distillation. In ICLR.
go back to reference Tu, Z., He, F., & Tao, D. (2020). Understanding generalization in recurrent neural networks. In International conference on learning representations. ICLR. Tu, Z., He, F., & Tao, D. (2020). Understanding generalization in recurrent neural networks. In International conference on learning representations. ICLR.
go back to reference Tung, F., & Mori, G. (2019). Similarity-preserving knowledge distillation. In ICCV. Tung, F., & Mori, G. (2019). Similarity-preserving knowledge distillation. In ICCV.
go back to reference Turc, I., Chang, M. W., Lee, K. & Toutanova, K. (2019). Well-read students learn better: The impact of student initialization on knowledge distillation. arXiv preprint arXiv:1908.08962. Turc, I., Chang, M. W., Lee, K. & Toutanova, K. (2019). Well-read students learn better: The impact of student initialization on knowledge distillation. arXiv preprint arXiv:​1908.​08962.
go back to reference Urban, G., Geras, K. J., Kahou, S. E., Aslan, O., Wang, S., Caruana, R., (2017). Do deep convolutional nets really need to be deep and convolutional? In ICLR. Urban, G., Geras, K. J., Kahou, S. E., Aslan, O., Wang, S., Caruana, R., (2017). Do deep convolutional nets really need to be deep and convolutional? In ICLR.
go back to reference Vapnik, V., & Izmailov, R. (2015). Learning using privileged information: Similarity control and knowledge transfer. Journal of Machine Learning Research, 16(1), 2023–2049.MathSciNetMATH Vapnik, V., & Izmailov, R. (2015). Learning using privileged information: Similarity control and knowledge transfer. Journal of Machine Learning Research, 16(1), 2023–2049.MathSciNetMATH
go back to reference Vongkulbhisal, J., Vinayavekhin, P. & Visentini-Scarzanella, M. (2019). Unifying heterogeneous classifiers with distillation. In CVPR. Vongkulbhisal, J., Vinayavekhin, P. & Visentini-Scarzanella, M. (2019). Unifying heterogeneous classifiers with distillation. In CVPR.
go back to reference Walawalkar, D., Shen, Z., & Savvides, M. (2020). Online ensemble model compression using knowledge distillation. In ECCV. Walawalkar, D., Shen, Z., & Savvides, M. (2020). Online ensemble model compression using knowledge distillation. In ECCV.
go back to reference Wang, C., Lan, X. & Zhang, Y. (2017). Model distillation with knowledge transfer from face classification to alignment and verification. arXiv preprint arXiv:1709.02929. Wang, C., Lan, X. & Zhang, Y. (2017). Model distillation with knowledge transfer from face classification to alignment and verification. arXiv preprint arXiv:​1709.​02929.
go back to reference Wang, L., & Yoon, K. J. (2020). Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. arXiv preprint arXiv:2004.05937. Wang, L., & Yoon, K. J. (2020). Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. arXiv preprint arXiv:​2004.​05937.
go back to reference Wang, H., Zhao, H., Li, X. & Tan, X. (2018a). Progressive blockwise knowledge distillation for neural network acceleration. In IJCAI. Wang, H., Zhao, H., Li, X. & Tan, X. (2018a). Progressive blockwise knowledge distillation for neural network acceleration. In IJCAI.
go back to reference Wang, J., Bao, W., Sun, L., Zhu, X., Cao, B., & Philip, S. Y. (2019a). Private model compression via knowledge distillation. In AAAI. Wang, J., Bao, W., Sun, L., Zhu, X., Cao, B., & Philip, S. Y. (2019a). Private model compression via knowledge distillation. In AAAI.
go back to reference Wang, J., Gou, L., Zhang, W., Yang, H., & Shen, H. W. (2019b). Deepvid: Deep visual interpretation and diagnosis for image classifiers via knowledge distillation. TVCG, 25(6), 2168–2180. Wang, J., Gou, L., Zhang, W., Yang, H., & Shen, H. W. (2019b). Deepvid: Deep visual interpretation and diagnosis for image classifiers via knowledge distillation. TVCG, 25(6), 2168–2180.
go back to reference Wang, M., Liu, R., Abe, N., Uchida, H., Matsunami, T., & Yamada, S. (2018b). Discover the effective strategy for face recognition model compression by improved knowledge distillation. In ICIP. Wang, M., Liu, R., Abe, N., Uchida, H., Matsunami, T., & Yamada, S. (2018b). Discover the effective strategy for face recognition model compression by improved knowledge distillation. In ICIP.
go back to reference Wang, M., Liu, R., Hajime, N., Narishige, A., Uchida, H. & Matsunami, T.(2019c). Improved knowledge distillation for training fast low resolution face recognition model. In ICCVW. Wang, M., Liu, R., Hajime, N., Narishige, A., Uchida, H. & Matsunami, T.(2019c). Improved knowledge distillation for training fast low resolution face recognition model. In ICCVW.
go back to reference Wang, T., Yuan, L., Zhang, X. & Feng, J. (2019d). Distilling object detectors with fine-grained feature imitation. In CVPR. Wang, T., Yuan, L., Zhang, X. & Feng, J. (2019d). Distilling object detectors with fine-grained feature imitation. In CVPR.
go back to reference Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020a). Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. In NeurIPS. Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020a). Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. In NeurIPS.
go back to reference Wang, W., Zhang, J., Zhang, H., Hwang, M. Y., Zong, C. & Li, Z. (2018d). A teacher-student framework for maintainable dialog manager. In EMNLP. Wang, W., Zhang, J., Zhang, H., Hwang, M. Y., Zong, C. & Li, Z. (2018d). A teacher-student framework for maintainable dialog manager. In EMNLP.
go back to reference Wang, X., Fu, T., Liao, S., Wang, S., Lei, Z., & Mei, T. (2020b). Exclusivity-consistency regularized knowledge distillation for face recognition. In ECCV. Wang, X., Fu, T., Liao, S., Wang, S., Lei, Z., & Mei, T. (2020b). Exclusivity-consistency regularized knowledge distillation for face recognition. In ECCV.
go back to reference Wang, X., Hu, J. F., Lai, J. H., Zhang, J. & Zheng, W. S. (2019e). Progressive teacher-student learning for early action prediction. In CVPR. Wang, X., Hu, J. F., Lai, J. H., Zhang, J. & Zheng, W. S. (2019e). Progressive teacher-student learning for early action prediction. In CVPR.
go back to reference Wang, X., Zhang, R., Sun, Y. & Qi, J. (2018e) Kdgan: Knowledge distillation with generative adversarial networks. In NeurIPS. Wang, X., Zhang, R., Sun, Y. & Qi, J. (2018e) Kdgan: Knowledge distillation with generative adversarial networks. In NeurIPS.
go back to reference Wang, Y., Xu, C., Xu, C., & Tao, D. (2019f). Packing convolutional neural networks in the frequency domain. IEEE TPAMI, 41(10), 2495–2510.CrossRef Wang, Y., Xu, C., Xu, C., & Tao, D. (2019f). Packing convolutional neural networks in the frequency domain. IEEE TPAMI, 41(10), 2495–2510.CrossRef
go back to reference Wang, Y., Xu, C., Xu, C. & Tao, D. (2018f). Adversarial learning of portable student networks. In AAAI. Wang, Y., Xu, C., Xu, C. & Tao, D. (2018f). Adversarial learning of portable student networks. In AAAI.
go back to reference Wang, Z. R., & Du, J. (2021). Joint architecture and knowledge distillation in CNN for Chinese text recognition. Pattern Recognition, 111, 107722.CrossRef Wang, Z. R., & Du, J. (2021). Joint architecture and knowledge distillation in CNN for Chinese text recognition. Pattern Recognition, 111, 107722.CrossRef
go back to reference Watanabe, S., Hori, T., Le Roux, J. & Hershey, J. R. (2017). Student-teacher network learning with enhanced features. In ICASSP. Watanabe, S., Hori, T., Le Roux, J. & Hershey, J. R. (2017). Student-teacher network learning with enhanced features. In ICASSP.
go back to reference Wei, H. R., Huang, S., Wang, R., Dai, X. & Chen, J. (2019). Online distilling from checkpoints for neural machine translation. In NAACL-HLT. Wei, H. R., Huang, S., Wang, R., Dai, X. & Chen, J. (2019). Online distilling from checkpoints for neural machine translation. In NAACL-HLT.
go back to reference Wei, Y., Pan, X., Qin, H., Ouyang, W. & Yan, J. (2018). Quantization mimic: Towards very tiny CNN for object detection. In ECCV. Wei, Y., Pan, X., Qin, H., Ouyang, W. & Yan, J. (2018). Quantization mimic: Towards very tiny CNN for object detection. In ECCV.
go back to reference Wong, J. H. & Gales, M. (2016). Sequence student-teacher training of deep neural networks. In Interspeech. Wong, J. H. & Gales, M. (2016). Sequence student-teacher training of deep neural networks. In Interspeech.
go back to reference Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., et al. (2019). Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In CVPR. Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., et al. (2019). Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In CVPR.
go back to reference Wu, A., Zheng, W. S., Guo, X. & Lai, J. H. (2019a). Distilled person re-identification: Towards a more scalable system. In CVPR. Wu, A., Zheng, W. S., Guo, X. & Lai, J. H. (2019a). Distilled person re-identification: Towards a more scalable system. In CVPR.
go back to reference Wu, G., & Gong, S. (2021). Peer collaborative learning for online knowledge distillation. In AAAI. Wu, G., & Gong, S. (2021). Peer collaborative learning for online knowledge distillation. In AAAI.
go back to reference Wu, J., Leng, C., Wang, Y., Hu, Q. & Cheng, J. (2016). Quantized convolutional neural networks for mobile devices. In CVPR. Wu, J., Leng, C., Wang, Y., Hu, Q. & Cheng, J. (2016). Quantized convolutional neural networks for mobile devices. In CVPR.
go back to reference Wu, M. C., Chiu, C. T. & Wu, K. H. (2019b). Multi-teacher knowledge distillation for compressed video action recognition on deep neural networks. In ICASSP. Wu, M. C., Chiu, C. T. & Wu, K. H. (2019b). Multi-teacher knowledge distillation for compressed video action recognition on deep neural networks. In ICASSP.
go back to reference Wu, X., He, R., Hu, Y., & Sun, Z. (2020). Learning an evolutionary embedding via massive knowledge distillation. International Journal of Computer Vision, 1–18. Wu, X., He, R., Hu, Y., & Sun, Z. (2020). Learning an evolutionary embedding via massive knowledge distillation. International Journal of Computer Vision, 1–18.
go back to reference Xia, S., Wang, G., Chen, Z., & Duan, Y. (2018). Complete random forest based class noise filtering learning for improving the generalizability of classifiers. IEEE TKDE, 31(11), 2063–2078. Xia, S., Wang, G., Chen, Z., & Duan, Y. (2018). Complete random forest based class noise filtering learning for improving the generalizability of classifiers. IEEE TKDE, 31(11), 2063–2078.
go back to reference Xie, J., Lin, S., Zhang, Y. & Luo, L. (2019). Training convolutional neural networks with cheap convolutions and online distillation. arXiv preprint arXiv:1909.13063. Xie, J., Lin, S., Zhang, Y. & Luo, L. (2019). Training convolutional neural networks with cheap convolutions and online distillation. arXiv preprint arXiv:​1909.​13063.
go back to reference Xie, Q., Hovy, E., Luong, M. T., & Le, Q. V. (2020). Self-training with Noisy Student improves ImageNet classification. In CVPR. Xie, Q., Hovy, E., Luong, M. T., & Le, Q. V. (2020). Self-training with Noisy Student improves ImageNet classification. In CVPR.
go back to reference Xu, G., Liu, Z., Li, X., & Loy, C. C. (2020a). Knowledge distillation meets self-supervision. In ECCV. Xu, G., Liu, Z., Li, X., & Loy, C. C. (2020a). Knowledge distillation meets self-supervision. In ECCV.
go back to reference Xu, K., Rui, L., Li, Y., & Gu, L. (2020b). Feature normalized knowledge distillation for image classification. In ECCV. Xu, K., Rui, L., Li, Y., & Gu, L. (2020b). Feature normalized knowledge distillation for image classification. In ECCV.
go back to reference Xu, Z., Wu, K., Che, Z., Tang, J., & Ye, J. (2020c). Knowledge transfer in multi-task deep reinforcement learning for continuous control. In NeurIPS. Xu, Z., Wu, K., Che, Z., Tang, J., & Ye, J. (2020c). Knowledge transfer in multi-task deep reinforcement learning for continuous control. In NeurIPS.
go back to reference Xu, Z., Hsu, Y. C. & Huang, J. (2018a). Training shallow and thin networks for acceleration via knowledge distillation with conditional adversarial networks. In ICLR workshop. Xu, Z., Hsu, Y. C. & Huang, J. (2018a). Training shallow and thin networks for acceleration via knowledge distillation with conditional adversarial networks. In ICLR workshop.
go back to reference Xu, Z., Hsu, Y. C. & Huang, J. (2018b). Training student networks for acceleration with conditional adversarial networks. In BMVC. Xu, Z., Hsu, Y. C. & Huang, J. (2018b). Training student networks for acceleration with conditional adversarial networks. In BMVC.
go back to reference Xu, T. B., & Liu, C. L. (2019). Data-distortion guided self-distillation for deep neural networks. In AAAI. Xu, T. B., & Liu, C. L. (2019). Data-distortion guided self-distillation for deep neural networks. In AAAI.
go back to reference Yan, M., Zhao, M., Xu, Z., Zhang, Q., Wang, G. & Su, Z. (2019). Vargfacenet: An efficient variable group convolutional neural network for lightweight face recognition. In ICCVW. Yan, M., Zhao, M., Xu, Z., Zhang, Q., Wang, G. & Su, Z. (2019). Vargfacenet: An efficient variable group convolutional neural network for lightweight face recognition. In ICCVW.
go back to reference Yang, C., Xie, L., Qiao, S. & Yuille, A. (2019a). Knowledge distillation in generations: More tolerant teachers educate better students. In AAAI. Yang, C., Xie, L., Qiao, S. & Yuille, A. (2019a). Knowledge distillation in generations: More tolerant teachers educate better students. In AAAI.
go back to reference Yang, C., Xie, L., Su, C. & Yuille, A. L. (2019b). Snapshot distillation: Teacher-student optimization in one generation. In CVPR. Yang, C., Xie, L., Su, C. & Yuille, A. L. (2019b). Snapshot distillation: Teacher-student optimization in one generation. In CVPR.
go back to reference Yang, J., Martinez, B., Bulat, A., & Tzimiropoulos, G. (2020a). Knowledge distillation via adaptive instance normalization. In ECCV. Yang, J., Martinez, B., Bulat, A., & Tzimiropoulos, G. (2020a). Knowledge distillation via adaptive instance normalization. In ECCV.
go back to reference Yang, Y., Qiu, J., Song, M., Tao, D. & Wang, X. (2020b). Distilling knowledge from graph convolutional networks. In CVPR. Yang, Y., Qiu, J., Song, M., Tao, D. & Wang, X. (2020b). Distilling knowledge from graph convolutional networks. In CVPR.
go back to reference Yang, Z., Shou, L., Gong, M., Lin, W. & Jiang, D. (2020c). Model compression with two-stage multi-teacher knowledge distillation for web question answering system. In WSDM. Yang, Z., Shou, L., Gong, M., Lin, W. & Jiang, D. (2020c). Model compression with two-stage multi-teacher knowledge distillation for web question answering system. In WSDM.
go back to reference Yao, A., & Sun, D. (2020). Knowledge transfer via dense cross-layer mutual-distillation. In ECCV. Yao, A., & Sun, D. (2020). Knowledge transfer via dense cross-layer mutual-distillation. In ECCV.
go back to reference Yao, H., Zhang, C., Wei, Y., Jiang, M., Wang, S., Huang, J., Chawla, N. V., & Li, Z. (2020). Graph few-shot learning via knowledge transfer. In AAAI. Yao, H., Zhang, C., Wei, Y., Jiang, M., Wang, S., Huang, J., Chawla, N. V., & Li, Z. (2020). Graph few-shot learning via knowledge transfer. In AAAI.
go back to reference Ye, J., Ji, Y., Wang, X., Gao, X., & Song, M. (2020). Data-free knowledge amalgamation via group-stack dual-GAN. In CVPR. Ye, J., Ji, Y., Wang, X., Gao, X., & Song, M. (2020). Data-free knowledge amalgamation via group-stack dual-GAN. In CVPR.
go back to reference Ye, J., Ji, Y., Wang, X., Ou, K., Tao, D. & Song, M. (2019). Student becoming the master: Knowledge amalgamation for joint scene parsing, depth estimation, and more. In CVPR. Ye, J., Ji, Y., Wang, X., Ou, K., Tao, D. & Song, M. (2019). Student becoming the master: Knowledge amalgamation for joint scene parsing, depth estimation, and more. In CVPR.
go back to reference Yim, J., Joo, D., Bae, J. & Kim, J. (2017). A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In CVPR. Yim, J., Joo, D., Bae, J. & Kim, J. (2017). A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In CVPR.
go back to reference Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., Jha, Niraj K., & Kautz, J. (2020). Dreaming to distill: Data-free knowledge transfer via DeepInversion. In CVPR. Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., Jha, Niraj K., & Kautz, J. (2020). Dreaming to distill: Data-free knowledge transfer via DeepInversion. In CVPR.
go back to reference Yoo, J., Cho, M., Kim, T., & Kang, U. (2019). Knowledge extraction with no observable data. In NeurIPS. Yoo, J., Cho, M., Kim, T., & Kang, U. (2019). Knowledge extraction with no observable data. In NeurIPS.
go back to reference You, S., Xu, C., Xu, C., & Tao, D. (2017). Learning from multiple teacher networks. In SIGKDD. You, S., Xu, C., Xu, C., & Tao, D. (2017). Learning from multiple teacher networks. In SIGKDD.
go back to reference You, S., Xu, C., Xu, C. & Tao, D. (2018). Learning with single-teacher multi-student. In AAAI. You, S., Xu, C., Xu, C. & Tao, D. (2018). Learning with single-teacher multi-student. In AAAI.
go back to reference You, Y., Li, J., Reddi, S., Hseu, J., Kumar, S., Bhojanapalli, S., et al. (2019). Large batch optimization for deep learning: Training bert in 76 minutes. In ICLR. You, Y., Li, J., Reddi, S., Hseu, J., Kumar, S., Bhojanapalli, S., et al. (2019). Large batch optimization for deep learning: Training bert in 76 minutes. In ICLR.
go back to reference Yu, L., Yazici, V. O., Liu, X., Weijer, J., Cheng, Y. & Ramisa, A. (2019). Learning metrics from teachers: Compact networks for image embedding. In CVPR. Yu, L., Yazici, V. O., Liu, X., Weijer, J., Cheng, Y. & Ramisa, A. (2019). Learning metrics from teachers: Compact networks for image embedding. In CVPR.
go back to reference Yu, X., Liu, T., Wang, X., & Tao, D. (2017). On compressing deep models by low rank and sparse decomposition. In CVPR. Yu, X., Liu, T., Wang, X., & Tao, D. (2017). On compressing deep models by low rank and sparse decomposition. In CVPR.
go back to reference Yuan, F., Shou, L., Pei, J., Lin, W., Gong, M., Fu, Y., & Jiang, D. (2021). Reinforced multi-teacher selection for knowledge distillation. In AAAI. Yuan, F., Shou, L., Pei, J., Lin, W., Gong, M., Fu, Y., & Jiang, D. (2021). Reinforced multi-teacher selection for knowledge distillation. In AAAI.
go back to reference Yuan, L., Tay, F. E., Li, G., Wang, T. & Feng, J. (2020). Revisit knowledge distillation: a teacher-free framework. In CVPR. Yuan, L., Tay, F. E., Li, G., Wang, T. & Feng, J. (2020). Revisit knowledge distillation: a teacher-free framework. In CVPR.
go back to reference Yuan, M., & Peng, Y. (2020). CKD: Cross-task knowledge distillation for text-to-image synthesis. IEEE TMM, 22(8), 1955–1968. Yuan, M., & Peng, Y. (2020). CKD: Cross-task knowledge distillation for text-to-image synthesis. IEEE TMM, 22(8), 1955–1968.
go back to reference Yue, K., Deng, J., & Zhou, F. (2020). Matching guided distillation. In ECCV. Yue, K., Deng, J., & Zhou, F. (2020). Matching guided distillation. In ECCV.
go back to reference Yun, S., Park, J., Lee, K. & Shin, J. (2020). Regularizing class-wise predictions via self-knowledge distillation. In CVPR. Yun, S., Park, J., Lee, K. & Shin, J. (2020). Regularizing class-wise predictions via self-knowledge distillation. In CVPR.
go back to reference Zagoruyko, S. & Komodakis, N. (2017). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In ICLR. Zagoruyko, S. & Komodakis, N. (2017). Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In ICLR.
go back to reference Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M. & Mori, G. (2019). Lifelong gan: Continual learning for conditional image generation. In ICCV. Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M. & Mori, G. (2019). Lifelong gan: Continual learning for conditional image generation. In ICCV.
go back to reference Zhai, S., Cheng, Y., Zhang, Z. M. & Lu, W. (2016). Doubly convolutional neural networks. In NeurIPS. Zhai, S., Cheng, Y., Zhang, Z. M. & Lu, W. (2016). Doubly convolutional neural networks. In NeurIPS.
go back to reference Zhao, C., & Hospedales, T. (2020). Robust domain randomised reinforcement learning through peer-to-peer distillation. In NeurIPS. Zhao, C., & Hospedales, T. (2020). Robust domain randomised reinforcement learning through peer-to-peer distillation. In NeurIPS.
go back to reference Zhao, L., Peng, X., Chen, Y., Kapadia, M., & Metaxas, D. N. (2020b). Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge. In CVPR. Zhao, L., Peng, X., Chen, Y., Kapadia, M., & Metaxas, D. N. (2020b). Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge. In CVPR.
go back to reference Zhao, M., Li, T., Abu Alsheikh, M., Tian, Y., Zhao, H., Torralba, A. & Katabi, D. (2018). Through-wall human pose estimation using radio signals. In CVPR. Zhao, M., Li, T., Abu Alsheikh, M., Tian, Y., Zhao, H., Torralba, A. & Katabi, D. (2018). Through-wall human pose estimation using radio signals. In CVPR.
go back to reference Zhang, C. & Peng, Y. (2018). Better and faster: knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. In IJCAI. Zhang, C. & Peng, Y. (2018). Better and faster: knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. In IJCAI.
go back to reference Zhang, F., Zhu, X. & Ye, M. (2019a). Fast human pose estimation. In CVPR. Zhang, F., Zhu, X. & Ye, M. (2019a). Fast human pose estimation. In CVPR.
go back to reference Zhang, H., Hu, Z., Qin, W., Xu, M., & Wang, M. (2021a). Adversarial co-distillation learning for image recognition. Pattern Recognition, 111, 107659.CrossRef Zhang, H., Hu, Z., Qin, W., Xu, M., & Wang, M. (2021a). Adversarial co-distillation learning for image recognition. Pattern Recognition, 111, 107659.CrossRef
go back to reference Zhang, L., Shi, Y., Shi, Z., Ma, K., & Bao, C. (2020a). Task-oriented feature distillation. In NeurIPS. Zhang, L., Shi, Y., Shi, Z., Ma, K., & Bao, C. (2020a). Task-oriented feature distillation. In NeurIPS.
go back to reference Zhang, L., Song, J., Gao, A., Chen, J., Bao, C. & Ma, K. (2019b). Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In ICCV. Zhang, L., Song, J., Gao, A., Chen, J., Bao, C. & Ma, K. (2019b). Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In ICCV.
go back to reference Zhang, M., Song, G., Zhou, H., & Liu, Y. (2020b). Discriminability distillation in group representation learning. In ECCV. Zhang, M., Song, G., Zhou, H., & Liu, Y. (2020b). Discriminability distillation in group representation learning. In ECCV.
go back to reference Zhang, S., Feng, Y., & Li, L. (2021b). Future-guided incremental transformer for simultaneous translation. In AAAI. Zhang, S., Feng, Y., & Li, L. (2021b). Future-guided incremental transformer for simultaneous translation. In AAAI.
go back to reference Zhang, S., Guo, S., Wang, L., Huang, W., & Scott, M. R. (2020c). Knowledge integration networks for action recognition. In AAAI. Zhang, S., Guo, S., Wang, L., Huang, W., & Scott, M. R. (2020c). Knowledge integration networks for action recognition. In AAAI.
go back to reference Zhang, W., Miao, X., Shao, Y., Jiang, J., Chen, L., Ruas, O., & Cui, B. (2020d). Reliable data distillation on graph convolutional network. In ACM SIGMOD. Zhang, W., Miao, X., Shao, Y., Jiang, J., Chen, L., Ruas, O., & Cui, B. (2020d). Reliable data distillation on graph convolutional network. In ACM SIGMOD.
go back to reference Zhang, X., Wang, X., Bian, J. W., Shen, C., & You, M. (2021c). Diverse knowledge distillation for end-to-end person search. In AAAI. Zhang, X., Wang, X., Bian, J. W., Shen, C., & You, M. (2021c). Diverse knowledge distillation for end-to-end person search. In AAAI.
go back to reference Zhang, X., Zhou, X., Lin, M. & Sun, J. (2018a). Shufflenet: An extremely efficient convolutional neural network for mobile devices. In CVPR. Zhang, X., Zhou, X., Lin, M. & Sun, J. (2018a). Shufflenet: An extremely efficient convolutional neural network for mobile devices. In CVPR.
go back to reference Zhang, Y., Lan, Z., Dai, Y., Zeng, F., Bai, Y., Chang, J., & Wei, Y. (2020e). Prime-aware adaptive distillation. In ECCV. Zhang, Y., Lan, Z., Dai, Y., Zeng, F., Bai, Y., Chang, J., & Wei, Y. (2020e). Prime-aware adaptive distillation. In ECCV.
go back to reference Zhang, Y., Xiang, T., Hospedales, T. M. & Lu, H. (2018b). Deep mutual learning. In CVPR. Zhang, Y., Xiang, T., Hospedales, T. M. & Lu, H. (2018b). Deep mutual learning. In CVPR.
go back to reference Zhang, Z., & Sabuncu, M. R. (2020). Self-distillation as instance-specific label smoothing. In NeurIPS. Zhang, Z., & Sabuncu, M. R. (2020). Self-distillation as instance-specific label smoothing. In NeurIPS.
go back to reference Zhang, Z., Shi, Y., Yuan, C., Li, B., Wang, P., Hu, W., & Zha, Z. J. (2020f). Object relational graph with teacher-recommended learning for video captioning. In CVPR. Zhang, Z., Shi, Y., Yuan, C., Li, B., Wang, P., Hu, W., & Zha, Z. J. (2020f). Object relational graph with teacher-recommended learning for video captioning. In CVPR.
go back to reference Zhou C, Neubig G, Gu J (2019a) Understanding knowledge distillation in non-autoregressive machine translation. In ICLR. Zhou C, Neubig G, Gu J (2019a) Understanding knowledge distillation in non-autoregressive machine translation. In ICLR.
go back to reference Zhou, G., Fan, Y., Cui, R., Bian, W., Zhu, X. & Gai, K. (2018). Rocket launching: A universal and efficient framework for training well-performing light net. In AAAI. Zhou, G., Fan, Y., Cui, R., Bian, W., Zhu, X. & Gai, K. (2018). Rocket launching: A universal and efficient framework for training well-performing light net. In AAAI.
go back to reference Zhou, J., Zeng, S. & Zhang, B. (2019b) Two-stage image classification supervised by a single teacher single student model. In BMVC. Zhou, J., Zeng, S. & Zhang, B. (2019b) Two-stage image classification supervised by a single teacher single student model. In BMVC.
go back to reference Zhou, P., Mai, L., Zhang, J., Xu, N., Wu, Z. & Davis, L. S. (2020). M2KD: Multi-model and multi-level knowledge distillation for incremental learning. In BMVC. Zhou, P., Mai, L., Zhang, J., Xu, N., Wu, Z. & Davis, L. S. (2020). M2KD: Multi-model and multi-level knowledge distillation for incremental learning. In BMVC.
go back to reference Zhu, M., Han, K., Zhang, C., Lin, J., & Wang, Y. (2019). Low-resolution visual recognition via deep feature distillation. In ICASSP. Zhu, M., Han, K., Zhang, C., Lin, J., & Wang, Y. (2019). Low-resolution visual recognition via deep feature distillation. In ICASSP.
go back to reference Zhu, X., & Gong, S. (2018). Knowledge distillation by on-the-fly native ensemble. In NeurIPS. Zhu, X., & Gong, S. (2018). Knowledge distillation by on-the-fly native ensemble. In NeurIPS.
Metadata
Title: Knowledge Distillation: A Survey
Authors: Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao
Publication date: 22-03-2021
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 6/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-021-01453-z
