Robust Visual Object Tracking with Top-down Reasoning

Authors:
Mengdan Zhang, University of Chinese Academy of Sciences, Beijing, China
Jiashi Feng, National University of Singapore, Singapore
Weiming Hu, University of Chinese Academy of Sciences, Beijing, China

MM '17: Proceedings of the 25th ACM international conference on Multimedia, October 2017, Pages 226–234. https://doi.org/10.1145/3123266.3123449

Published: 19 October 2017

ABSTRACT

In generic visual tracking, traditional appearance-based trackers suffer from distracting factors such as poor lighting and major target deformation, as well as from insufficient training data. In this work, we propose to exploit category-specific semantics to boost visual object tracking, and we develop a new visual tracking model that augments an appearance-based tracker with a top-down reasoning component. Continuous feedback from this reasoning component guides the tracker to reliably identify candidate regions with consistent semantics across frames and to localize the target object instance more robustly and accurately. Specifically, a generic object recognition model and a semantic activation map method are deployed to provide effective top-down reasoning about object locations for the tracker. In addition, we develop a voting-based scheme for the reasoning component to infer the object semantics. As a result, even without sufficient training data, the tracker can still obtain reliable top-down cues about the objects. Together with the appearance cues, these allow the tracker to localize objects accurately even in the presence of various major distracting factors. Extensive evaluations on two large-scale benchmark datasets, OTB2013 and OTB2015, demonstrate that top-down reasoning substantially enhances the robustness of the tracker and yields state-of-the-art performance.
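
The abstract names three ingredients: a semantic activation map, a voting scheme for inferring the object category, and a combination of semantic and appearance cues. The sketch below illustrates one plausible reading of how such pieces could fit together. It is not the authors' implementation: the abstract gives no formulas, so the class-activation-style weighted channel sum, the majority vote, the linear blending weight `alpha`, and every function name here are assumptions made for illustration only.

```python
from collections import Counter


def activation_map(features, class_weights):
    """Weighted sum of feature channels (class-activation-map style).

    features: list of C channels, each an H x W list of lists.
    class_weights: list of C floats for a single category (assumed given).
    """
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for channel, weight in zip(features, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += weight * channel[y][x]
    return cam


def normalize(score_map):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in score_map for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0] * len(row) for row in score_map]
    return [[(v - lo) / span for v in row] for row in score_map]


def vote_category(per_frame_labels):
    """Majority vote over per-frame category predictions (hypothetical scheme)."""
    return Counter(per_frame_labels).most_common(1)[0][0]


def fuse_and_localize(appearance_map, semantic_map, alpha=0.5):
    """Blend two normalized maps linearly and return (row, col) of the peak.

    alpha is an assumed blending weight: 0 uses appearance only, 1 semantics only.
    """
    a, s = normalize(appearance_map), normalize(semantic_map)
    best, best_pos = float("-inf"), (0, 0)
    for y, (row_a, row_s) in enumerate(zip(a, s)):
        for x, (va, vs) in enumerate(zip(row_a, row_s)):
            score = (1 - alpha) * va + alpha * vs
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos


# Toy data: a 3x3 grid with two feature channels and made-up class weights.
features = [
    [[0.0, 0.0, 0.0], [0.0, 0.2, 0.5], [0.0, 0.5, 1.0]],  # responds to the target
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # responds to a distractor
]
weights = {"dog": [1.0, -0.5], "car": [0.0, 1.0]}

# The appearance response is fooled by a distractor at (0, 0).
appearance = [[1.0, 0.1, 0.0], [0.1, 0.2, 0.3], [0.0, 0.3, 0.9]]

category = vote_category(["dog", "dog", "cat", "dog"])
semantic = activation_map(features, weights[category])

print(category)
print(fuse_and_localize(appearance, semantic, alpha=0.0))
print(fuse_and_localize(appearance, semantic, alpha=0.5))
```

In this toy case the appearance-only peak falls on the distractor, while the blended map peaks on the cell that the voted category's activation map also supports. A real tracker would obtain the feature channels and class weights from a trained recognition network rather than from hand-written lists.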

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Interest point and salient region detections
        Tracking
  2. Machine learning
    1. Machine learning approaches
      1. Instance-based learning
      2. Neural networks

Published in
MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN:9781450349062
DOI:10.1145/3123266
General Chairs:
Qiong Liu, FXPAL, USA
Rainer Lienhart, Universität Augsburg, Germany
Haohong Wang, TCL America, USA
Program Chairs:
Sheng-Wei "Kuan-Ta" Chen, Academia Sinica, Taiwan
Susanne Boll, University of Oldenburg, Germany
Phoebe Chen, La Trobe University, Australia
Gerald Friedland, Lawrence Livermore National Lab, USA
Jia Li, Google, USA
Shuicheng Yan, Qihoo 360, China
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
computer vision
deep learning
visual tracking
Qualifiers
- research-article
Acceptance Rates
MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%