Published in: Neural Processing Letters 3/2023

14-11-2022

Positive-Unlabeled Learning for Knowledge Distillation

Authors: Ning Jiang, Jialiang Tang, Wenxin Yu


Abstract

Convolutional neural networks (CNNs) have greatly advanced artificial intelligence. In general, high-performance CNNs are over-parameterized and require massive computation to process and predict data, which prevents them from being deployed on existing resource-limited intelligent devices. In this paper, we propose an efficient model compression framework based on knowledge distillation that trains a compact student network under the guidance of a large teacher network. Our key idea is to introduce a positive-unlabeled (PU) classifier that encourages the compressed student network to learn the features of the prominent teacher network as fully as possible. During training, the PU classifier learns to discriminate the teacher network's features as high-quality and the student network's features as low-quality. Simultaneously, the student network learns knowledge from the teacher network through soft targets and attention features. Extensive experiments on four benchmark image classification datasets show that our method outperforms prior work by a large margin at the same cost in parameters and computation. When VGGNet19 is selected as the teacher network on the CIFAR datasets, the student network VGGNet13 achieves 94.47% and 75.73% accuracy on CIFAR-10 and CIFAR-100, improvements of 1.02% and 2.44%, respectively.
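The two ingredients the abstract names can be sketched in a few lines: temperature-scaled soft-target distillation (Hinton et al.), and a non-negative PU risk (Kiryo et al.) in which teacher features play the role of positives and student features the role of unlabeled samples. This is a minimal NumPy sketch under those assumptions, not the authors' exact formulation; the function names and the choice of sigmoid loss are illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T yields softer distributions.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Soft-target distillation: KL(teacher_soft || student_soft), scaled
    # by T^2 so gradients keep a comparable magnitude across temperatures.
    p = softmax(teacher_logits, T)      # teacher soft targets
    q = softmax(student_logits, T)      # student predictions
    return float((T ** 2) * np.sum(p * (np.log(p) - np.log(q))) / p.shape[0])

def nn_pu_risk(pos_scores, unl_scores, prior=0.5):
    # Non-negative PU risk with a sigmoid loss. Here the "positive" class
    # is teacher features (label +1) and the "unlabeled" set is student
    # features; `prior` is the assumed positive-class prior.
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss_pos = 1.0 - sigmoid(np.asarray(pos_scores))   # positives as +1
    loss_pos_neg = sigmoid(np.asarray(pos_scores))     # positives as -1
    loss_unl = sigmoid(np.asarray(unl_scores))         # unlabeled as -1
    neg_risk = loss_unl.mean() - prior * loss_pos_neg.mean()
    # Clipping the estimated negative risk at zero keeps training stable.
    return prior * loss_pos.mean() + max(0.0, neg_risk)
```

In a full training loop, the student would minimize a weighted sum of the task cross-entropy, `kd_loss`, and an adversarial term that pushes its features toward scores the PU classifier rates as high-quality.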


Metadata
Title
Positive-Unlabeled Learning for Knowledge Distillation
Authors
Ning Jiang
Jialiang Tang
Wenxin Yu
Publication date
14-11-2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 3/2023
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-11038-7
