Published in: Neural Computing and Applications 18/2020

21.05.2019 | Extreme Learning Machine and Deep Learning Networks

Hierarchical attentive Siamese network for real-time visual tracking

By: Kang Yang, Huihui Song, Kaihua Zhang, Qingshan Liu


Abstract

Visual tracking is a fundamental and highly useful component in various computer vision tasks. Recently, Siamese networks trained off-line in an end-to-end manner have demonstrated great success in visual tracking, with high performance in terms of both speed and accuracy. However, Siamese trackers usually employ visual features from only the last convolutional layer to represent the targets, ignoring the fact that features from different layers offer different representational capabilities, which may degrade tracking performance under severe deformation and occlusion. In this paper, we present a novel hierarchical attentive Siamese (HASiam) network for high-performance visual tracking, which exploits several kinds of attention mechanisms to effectively fuse a series of attentional features from different layers. More specifically, we combine a deeper network with a shallow one to take full advantage of the features from different layers, and apply spatial and channel-wise attention on different layers to better capture visual attention over multi-level semantic abstractions, which helps enhance the discriminative capacity of the model. Furthermore, the top-layer feature maps have low resolution, which may hurt localization accuracy if each feature is treated independently. To address this issue, a non-local attention module is also adopted on the top layer to force the network to pay more attention to the structural dependencies among features at all locations during off-line training. The proposed HASiam is trained off-line in an end-to-end manner and requires no online updating of the network parameters during tracking. Extensive evaluations demonstrate that our HASiam achieves favorable results, with AUC scores of 64.6% and 62.8% on OTB2013 and OTB100 and an EAO score of 0.227 on the VOT2017 real-time experiment, respectively, while running at 60 fps.
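The matching operation at the core of such Siamese trackers can be illustrated with a minimal NumPy sketch: the embedded template is slid over the embedded search region and each offset is scored by cross-correlation, with the peak locating the target. The shapes and features below are toy assumptions, not the paper's network:

```python
import numpy as np

def siamese_response(template, search):
    """Score every offset of the search region by cross-correlating it
    with the template; the response peak locates the target."""
    c, th, tw = template.shape
    _, sh, sw = search.shape
    resp = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(template * search[:, i:i + th, j:j + tw])
    return resp

rng = np.random.default_rng(0)
template = rng.standard_normal((4, 3, 3))  # embedded exemplar features
search = rng.standard_normal((4, 8, 8))    # embedded search-region features
search[:, 2:5, 4:7] = template             # plant the target at offset (2, 4)

resp = siamese_response(template, search)
peak = np.unravel_index(resp.argmax(), resp.shape)
print(peak)  # the peak should land at the planted offset (2, 4)
```

In the full tracker this correlation is computed as a convolution on learned deep features rather than raw arrays, but the localization principle is the same.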
With its high accuracy and real-time speed, our tracker can be applied to numerous vision applications such as visual surveillance, robotics and augmented reality.
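The spatial and channel-wise attention described in the abstract can be sketched as follows. This is not the authors' implementation: the random MLP weights and the pooled-statistics spatial gate stand in for parameters that would be learned end-to-end, in the spirit of CBAM-style attention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, reduction=4, seed=0):
    """Re-weight each channel using a gated descriptor pooled over all
    spatial positions (random weights stand in for a learned MLP)."""
    c, _, _ = feat.shape
    avg = feat.mean(axis=(1, 2))  # (c,) global average pooling
    mx = feat.max(axis=(1, 2))    # (c,) global max pooling
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    mlp = lambda x: w2 @ np.maximum(w1 @ x, 0.0)  # shared 2-layer MLP
    gate = sigmoid(mlp(avg) + mlp(mx))            # (c,) channel gate
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Re-weight each location using channel-pooled statistics
    (a learned conv would normally produce this score map)."""
    avg = feat.mean(axis=0, keepdims=True)  # (1, h, w)
    mx = feat.max(axis=0, keepdims=True)    # (1, h, w)
    gate = sigmoid(avg + mx)                # (1, h, w) spatial gate
    return feat * gate

feat = np.random.default_rng(1).standard_normal((8, 6, 6))
out = spatial_attention(channel_attention(feat))
print(out.shape)  # (8, 6, 6): attention preserves the feature-map shape
```

Applying such gates on several layers, as HASiam does, lets shallow and deep features each emphasize their most informative channels and locations before fusion.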

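The non-local attention module used on the top layer can likewise be sketched in NumPy. Here the learned 1x1 embeddings of a full non-local block are replaced by identity maps for brevity; each position attends to every other position via a softmax over pairwise similarities, and a residual connection preserves the original features:

```python
import numpy as np

def non_local_block(feat):
    """Toy non-local attention: every spatial position aggregates
    features from all positions, weighted by pairwise similarity."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                 # (n, c), n = h*w positions
    sim = x @ x.T                                # (n, n) pairwise similarity
    sim -= sim.max(axis=1, keepdims=True)        # stabilize the softmax
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1
    y = attn @ x                                 # aggregate over all positions
    return feat + y.T.reshape(c, h, w)           # residual connection

feat = np.random.default_rng(0).standard_normal((4, 5, 5))
out = non_local_block(feat)
print(out.shape)  # (4, 5, 5)
```

Because every output position depends on all input positions, this block captures the long-range structural dependencies that independent per-location features on a low-resolution top layer would miss.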

Metadata
Title: Hierarchical attentive Siamese network for real-time visual tracking
Authors: Kang Yang, Huihui Song, Kaihua Zhang, Qingshan Liu
Publication date: 21.05.2019
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 18/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-019-04238-1
