
07.01.2022

Driver attention prediction based on convolution and transformers

Authors: Chao Gou, Yuchen Zhou, Dan Li

Published in: The Journal of Supercomputing | Issue 6/2022


Abstract

In recent years, studying how drivers allocate their attention while driving has become critical to achieving human-like cognitive ability for autonomous vehicles, and it is an active topic in the community of human–machine augmented intelligence for self-driving. However, existing state-of-the-art methods for driver attention prediction are mainly built upon convolutional neural networks (CNNs), whose local receptive fields limit their ability to capture long-range dependencies. In this work, we propose a novel attention prediction method based on CNNs and Transformers, termed ACT-Net. In particular, a CNN and a Transformer are combined into a block, and these blocks are stacked to form a deep model. Through this design, the network captures both local and long-range dependencies, both of which are crucial for driver attention prediction. Extensive comparison experiments against other state-of-the-art techniques, conducted on the widely used BDD-A dataset and on privately collected data based on BDD-X, validate the effectiveness of the proposed ACT-Net.
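The paper's exact ACT-Net architecture is not reproduced on this page, so the following is only a minimal PyTorch sketch of the idea stated in the abstract: a stackable block that pairs a convolution (local receptive field) with Transformer self-attention (long-range dependencies), followed by a one-channel head that predicts a driver attention map. All class names, layer sizes, and hyperparameters below are hypothetical illustrations, not values from the paper.

```python
import torch
import torch.nn as nn

class ConvTransformerBlock(nn.Module):
    """Hypothetical block pairing a 3x3 convolution (local features)
    with multi-head self-attention over spatial tokens (global context)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)                       # (B, C, H, W): local features
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per pixel
        q = self.norm(tokens)
        attn_out, _ = self.attn(q, q, q)       # long-range self-attention
        tokens = tokens + attn_out             # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class ACTNetSketch(nn.Module):
    """Stacked conv+Transformer blocks with a saliency-map head."""

    def __init__(self, in_channels: int = 3, channels: int = 64, depth: int = 4):
        super().__init__()
        # Strided stem downsamples 4x so the attention token count stays small.
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=7, stride=4, padding=3)
        self.blocks = nn.Sequential(*[ConvTransformerBlock(channels) for _ in range(depth)])
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        x = self.blocks(self.stem(frame))
        return torch.sigmoid(self.head(x))     # per-pixel attention probability

# Example: predict a (quarter-resolution) attention map for one 128x128 frame.
model = ACTNetSketch()
saliency = model(torch.randn(1, 3, 128, 128))  # -> shape (1, 1, 32, 32)
```

A real driver-attention model of this kind would typically be trained against recorded gaze maps (e.g., with a KL-divergence or correlation loss); the sketch above only shows how convolution and self-attention can be composed in one stackable block.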


Metadata
Title
Driver attention prediction based on convolution and transformers
Authors
Chao Gou
Yuchen Zhou
Dan Li
Publication date
07.01.2022
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 6/2022
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-021-04151-2
