Published in: Optical Memory and Neural Networks 4/2023

01.12.2023

Enhancement of Knowledge Distillation via Non-Linear Feature Alignment

Authors: Jiangxiao Zhang, Feng Gao, Lina Huo, Hongliang Wang, Ying Dang

Abstract

Deploying AI models on resource-constrained devices is a challenging task: models must have a small parameter count while maintaining high performance, and striking this balance between model size and accuracy is essential for efficient and effective deployment in such environments. Knowledge distillation (KD) is an important model compression technique in which a small student model learns from a larger teacher model, leveraging the teacher's high-performance features to improve the student and ultimately match or even surpass the teacher's performance. This paper presents a pipeline-based knowledge distillation method that improves model performance through non-linear feature alignment (FA) applied after the feature extraction stage. We conducted experiments on both single-teacher and multi-teacher distillation, and extensive experimentation demonstrates that our method improves the accuracy of knowledge distillation on top of existing KD loss functions and further boosts the performance of small models.
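
To make the idea concrete, the following is a minimal PyTorch sketch of how a non-linear feature-alignment term could be combined with a standard KD loss after the feature extraction stage. The module name NonLinearFeatureAlign, the two-layer MLP projector, the MSE matching term, and the weights T, alpha and beta are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLinearFeatureAlign(nn.Module):
    # Hypothetical FA module: a small non-linear MLP that projects pooled
    # student features into the teacher's feature space before comparison.
    def __init__(self, student_dim: int, teacher_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(student_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, teacher_dim),
        )

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)

def kd_with_feature_alignment(student_logits, teacher_logits,
                              student_feat, teacher_feat,
                              fa_module, labels,
                              T=4.0, alpha=0.5, beta=1.0):
    # Standard cross-entropy on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Hinton-style KD term on temperature-softened logits.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # Feature-alignment term: non-linearly projected student features are
    # matched to the teacher's features (MSE used here as a placeholder).
    fa = F.mse_loss(fa_module(student_feat), teacher_feat)
    return ce + alpha * kd + beta * fa

In a typical single-teacher setup, student_feat and teacher_feat would be the pooled backbone features of the student and teacher after the feature extraction stage, and fa_module would be trained jointly with the student.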


Metadata
Title
Enhancement of Knowledge Distillation via Non-Linear Feature Alignment
Authors
Jiangxiao Zhang
Feng Gao
Lina Huo
Hongliang Wang
Ying Dang
Publication date
01.12.2023
Publisher
Pleiades Publishing
Published in
Optical Memory and Neural Networks / Issue 4/2023
Print ISSN: 1060-992X
Electronic ISSN: 1934-7898
DOI
https://doi.org/10.3103/S1060992X23040136
