ABSTRACT
In generic visual tracking, traditional appearance-based trackers suffer from distracting factors such as poor lighting and major target deformation, as well as from insufficient training data. In this work, we propose to exploit category-specific semantics to boost visual object tracking, and we develop a new tracking model that augments an appearance-based tracker with a top-down reasoning component. Continuous feedback from this component guides the tracker to reliably identify candidate regions with consistent semantics across frames and to localize the target object instance more robustly and accurately. Specifically, a generic object recognition model and a semantic activation map method are deployed to provide effective top-down reasoning about object locations for the tracker. In addition, we develop a voting-based scheme with which the reasoning component infers the object semantics. As a result, even without sufficient training data, the tracker can still obtain reliable top-down cues about the objects. Combined with appearance cues, these allow the tracker to localize objects accurately even in the presence of various major distracting factors. Extensive evaluations on two large-scale benchmark datasets, OTB2013 and OTB2015, demonstrate that top-down reasoning substantially enhances the robustness of the tracker and yields state-of-the-art performance.
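The interplay described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the semantic activation map is computed as a class-weighted sum of the final convolutional feature maps, that the object category is chosen by a simple majority vote over per-frame predictions, and that the top-down map is fused with the appearance response by a weighted sum. All function names and the `alpha` weight are hypothetical.

```python
import numpy as np
from collections import Counter


def class_activation_map(features, fc_weights, class_idx):
    """Weighted sum of feature maps for one class, scaled to [0, 1].

    features:   (C, H, W) final convolutional feature maps (assumed layout)
    fc_weights: (num_classes, C) classifier weights (assumed layout)
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def vote_category(per_frame_predictions):
    """Majority vote over class indices predicted on individual frames."""
    return Counter(per_frame_predictions).most_common(1)[0][0]


def fuse(appearance_response, cam, alpha=0.5):
    """Blend the bottom-up response with the top-down map (assumed rule)."""
    return (1.0 - alpha) * appearance_response + alpha * cam


def localize(score_map):
    """Return the (row, col) position of the highest fused score."""
    return np.unravel_index(np.argmax(score_map), score_map.shape)
```

In this sketch the voted category selects which activation map is computed, so a single misclassified frame does not redirect the tracker; larger values of `alpha` give the top-down cue more influence when the appearance response is unreliable.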