
International Journal of Machine Learning and Cybernetics


7 August 2023 | Original Article

Adaptive occlusion hybrid second-order attention network for head pose estimation

Authors: Qi Fu, Kai Xie, Chang Wen, Jianbiao He, Wei Zhang, Hongling Tian, Sheng Yang

Published in: International Journal of Machine Learning and Cybernetics | Issue 2/2024


Abstract

Head pose estimation (HPE) is a challenging and important research topic with applications in areas such as driver monitoring, attention recognition, and human-computer interaction. However, HPE faces two challenging problems. First, occlusion is very common in real application scenarios and substantially reduces the accuracy of HPE. Second, most existing work represents the head pose with Euler angles, which may cause problems in neural network optimization. To address these problems, we propose an adaptive occlusion hybrid second-order attention network. First, an occlusion-aware module detects facial landmarks and generates heat maps that reflect the presence or absence of occlusion in specific facial parts, thereby enhancing features in the non-occluded parts of the face and suppressing features in the occluded regions. Second, we design a novel second-order information attention module in which spatial and channel information interact through second-order statistics, so that the model learns the feature correlations between different facial parts while attending to important channels and suppressing redundant ones, which further reduces the effect of occlusion and extracts more powerful features. Third, to avoid the ambiguity of common head pose representations, we introduce an exponential map to represent the head pose and design a prediction framework capable of capturing the geometry of the pose space. Experiments show that the proposed model is competitive with methods that use depth information on the BIWI dataset and achieves clear advantages on the challenging AFLW2000 dataset, with more robust performance than other models under large poses and occlusion.
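The exponential-map representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard axis-angle form, in which a 3-vector encodes the rotation axis by its direction and the rotation angle (in radians) by its length, and it uses Rodrigues' formula to obtain the rotation matrix. The function names `exp_map`, `log_map`, and `geodesic_angle` are hypothetical and are not taken from the paper; this is not the authors' prediction framework, only the underlying geometry.

```python
import math


def exp_map(omega):
    """Map an axis-angle vector (wx, wy, wz) to a 3x3 rotation matrix.

    Rodrigues' formula: R = I + a*K + b*K^2, where K is the skew-symmetric
    matrix of omega, a = sin(t)/t and b = (1 - cos(t))/t^2, t = |omega|.
    """
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # limits of the coefficients as theta -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    K = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def log_map(R):
    """Inverse of exp_map, valid for rotation angles in [0, pi)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    # The antisymmetric part of R equals 2 * sin(theta)/theta * omega.
    v = (R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1])
    scale = 0.5 if theta < 1e-8 else theta / (2.0 * math.sin(theta))
    return tuple(scale * c for c in v)


def geodesic_angle(R1, R2):
    """Angle (radians) of the relative rotation R1^T R2 between two poses."""
    trace = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


# Example: a quarter turn about the z axis and its round trip.
omega = (0.0, 0.0, math.pi / 2)
R = exp_map(omega)
recovered = log_map(R)
```

In this form a pose is three unconstrained numbers, and the geodesic angle gives a distance between poses that does not depend on how the rotation is split into yaw, pitch, and roll. Whether the paper's loss uses this exact distance is not stated in the abstract.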



Title: Adaptive occlusion hybrid second-order attention network for head pose estimation
Authors: Qi Fu, Kai Xie, Chang Wen, Jianbiao He, Wei Zhang, Hongling Tian, Sheng Yang
Publication date: 7 August 2023
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 2/2024
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-023-01933-3
