DOI: 10.1145/3123266.3123449
research-article

Robust Visual Object Tracking with Top-down Reasoning

Published: 19 October 2017

ABSTRACT

In generic visual tracking, traditional appearance-based trackers suffer from distracting factors such as poor lighting and major target deformation, as well as from insufficient training data. In this work, we propose exploiting category-specific semantics to boost visual object tracking, and develop a new visual tracking model that augments the appearance-based tracker with a top-down reasoning component. Continuous feedback from this reasoning component guides the tracker to reliably identify candidate regions with consistent semantics across frames and to localize the target object instance more robustly and accurately. Specifically, a generic object recognition model and a semantic activation map method are deployed to provide effective top-down reasoning about object locations for the tracker. In addition, we develop a voting-based scheme for the reasoning component to infer the object semantics. Therefore, even without sufficient training data, the tracker can still obtain reliable top-down cues about the objects. Together with the appearance cues, the tracker can localize objects accurately even in the presence of various major distracting factors. Extensive evaluations on two large-scale benchmark datasets, OTB2013 and OTB2015, demonstrate that the top-down reasoning substantially enhances the robustness of the tracker and provides state-of-the-art performance.
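
To make the two top-down ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the activation map is a class-weighted sum of convolutional feature maps (one common way to build such maps) and that the voting scheme is a simple majority vote over per-frame category predictions; the abstract confirms neither detail. The function names `class_activation_map`, `vote_category`, and `peak_location` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # one H x W map stored as nested lists


def class_activation_map(feature_maps: Sequence[Grid],
                         class_weights: Sequence[float]) -> Grid:
    """Weighted sum of K feature maps using one category's K weights.

    Assumption: the map is built as sum_k w_k * f_k(y, x); the paper's
    exact formulation is not given in the abstract.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def vote_category(per_frame_labels: Sequence[str]) -> str:
    """Majority vote over per-frame predictions (assumed voting rule)."""
    return Counter(per_frame_labels).most_common(1)[0][0]


def peak_location(cam: Grid) -> Tuple[int, int]:
    """Return (row, col) of the strongest activation in the map."""
    cells = ((y, x) for y in range(len(cam)) for x in range(len(cam[0])))
    return max(cells, key=lambda p: cam[p[0]][p[1]])


# Toy data: two 2 x 2 feature maps and hypothetical per-category weights.
feature_maps = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [2.0, 0.0]],
]
weights = {"dog": [1.0, 2.0], "cat": [0.2, 0.1]}

category = vote_category(["dog", "cat", "dog"])
cam = class_activation_map(feature_maps, weights[category])
peak = peak_location(cam)
```

In a tracker, the peak (or a thresholded region around it) could serve as a top-down hint about where the target lies, to be combined with the appearance-based response; how the two are fused in the paper is not specified in the abstract.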

            Published in

            MM '17: Proceedings of the 25th ACM international conference on Multimedia
            October 2017
            2028 pages
            ISBN:9781450349062
            DOI:10.1145/3123266

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
            Overall Acceptance Rate: 995 of 4,171 submissions, 24%
