Published in: Pattern Analysis and Applications 4/2023

26.09.2023 | Short Paper

ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Authors: Farhat Abbas, Mussarat Yasmin, Muhammad Fayyaz, Usman Asim


Abstract

Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become important in applications such as content-based image retrieval, visual surveillance, smart cities, and demographic data collection. Over the last decade, convolutional neural networks (CNNs) have emerged as powerful and reliable choices for vision tasks such as object classification, recognition, and detection. However, CNNs have limited local receptive fields, which prevent them from learning global context. In contrast, a vision transformer (ViT) is an attractive alternative because its self-attention mechanism attends to every patch of an input image. In this work, a ViT model built on two generic and effective modules, locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT can learn from scratch even on small-size (SS) datasets and overcome its lack of locality inductive bias. Extensive experiments show that the proposed ViT model produces better overall and mean accuracies, confirming that it outperforms state-of-the-art (SOTA) PGC methods.
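The LSA module mentioned above replaces the fixed 1/√d scaling of standard self-attention with a learnable temperature and masks each token's attention to itself, which sharpens attention over *other* patches. The following is a minimal single-head NumPy sketch of that idea, not the authors' implementation; the function name and shapes are illustrative assumptions:

```python
import numpy as np

def locality_self_attention(q, k, v, temperature):
    """Single-head attention with learnable temperature and diagonal masking.

    q, k, v: (num_tokens, dim) arrays; temperature: a scalar that would be
    a learned parameter in a real model (here passed in directly).
    """
    scores = q @ k.T / temperature            # learnable scale instead of fixed sqrt(dim)
    np.fill_diagonal(scores, -np.inf)         # mask self-attention (diagonal masking)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over the remaining tokens
    return attn @ v

# Toy usage with 4 tokens of dimension 8.
rng = np.random.default_rng(0)
n, d = 4, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = locality_self_attention(q, k, v, temperature=0.5)
print(out.shape)  # (4, 8)
```

A smaller temperature than √d makes the softmax distribution sharper, which is the intended effect on small datasets where attention otherwise tends to be too uniform.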
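The SPT module enriches each patch token with information from neighboring pixels by concatenating spatially shifted copies of the input with the original image before patch splitting, widening the effective receptive field of every token. A minimal NumPy sketch under stated assumptions (half-patch diagonal shifts, `np.roll` standing in for the padded shift, hypothetical function name):

```python
import numpy as np

def shifted_patch_tokenization(img, patch):
    """Concatenate 4 diagonally shifted copies of img with the original
    along channels, then split into non-overlapping flattened patch tokens.

    img: (H, W, C) with H and W divisible by patch.
    Returns: (num_patches, patch * patch * 5 * C) token matrix.
    """
    H, W, C = img.shape
    s = patch // 2  # half-patch shift in each diagonal direction
    stack = [img] + [np.roll(img, (dy, dx), axis=(0, 1))
                     for dy, dx in [(-s, -s), (-s, s), (s, -s), (s, s)]]
    x = np.concatenate(stack, axis=-1)        # (H, W, 5C)
    ph, pw = H // patch, W // patch
    tokens = x.reshape(ph, patch, pw, patch, 5 * C).transpose(0, 2, 1, 3, 4)
    return tokens.reshape(ph * pw, patch * patch * 5 * C)

# A 64x64 RGB image with 16x16 patches yields 16 tokens of dim 16*16*15.
tokens = shifted_patch_tokenization(np.zeros((64, 64, 3)), patch=16)
print(tokens.shape)  # (16, 3840)
```

In a full model these tokens would then pass through a linear embedding layer; the point of the sketch is only that each token now sees pixels beyond its own patch boundary.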


85.
Zurück zum Zitat Zhang Q (2022) A novel ResNet101 model based on dense dilated convolution for image classification. SN Appl Sci 4:1–13 Zhang Q (2022) A novel ResNet101 model based on dense dilated convolution for image classification. SN Appl Sci 4:1–13
86.
Zurück zum Zitat Sanghvi HA, Patel RH, Agarwal A, Gupta S, Sawhney V, Pandya AS (2023) A deep learning approach for classification of COVID and pneumonia using DenseNet-201. Int J Imaging Syst Technol 33:18–38 Sanghvi HA, Patel RH, Agarwal A, Gupta S, Sawhney V, Pandya AS (2023) A deep learning approach for classification of COVID and pneumonia using DenseNet-201. Int J Imaging Syst Technol 33:18–38
87.
Zurück zum Zitat Zhao C, Wang X, Wong WK, Zheng W, Yang J, Miao D (2017) Multiple metric learning based on bar-shape descriptor for person re-identification. Pattern Recog Zhao C, Wang X, Wong WK, Zheng W, Yang J, Miao D (2017) Multiple metric learning based on bar-shape descriptor for person re-identification. Pattern Recog
88.
Zurück zum Zitat Geelen CD, Wijnhoven RG, Dubbelman G (2015) Gender classification in low-resolution surveillance video: in-depth comparison of random forests and SVMs. In: Video Surveillance and Transportation Imaging Applications 9407: 170-183. SPIE. Geelen CD, Wijnhoven RG, Dubbelman G (2015) Gender classification in low-resolution surveillance video: in-depth comparison of random forests and SVMs. In: Video Surveillance and Transportation Imaging Applications 9407: 170-183. SPIE.
89.
Zurück zum Zitat Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:​1409.​1556.
90.
Zurück zum Zitat Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition 1-9 Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition 1-9
91.
Zurück zum Zitat He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition 770-778. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition 770-778.
Metadata
Title: ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset
Authors: Farhat Abbas, Mussarat Yasmin, Muhammad Fayyaz, Usman Asim
Publication date: 26.09.2023
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 4/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-023-01196-2
