Published in: International Journal of Computer Vision 1/2021

19-08-2020

Recursive Context Routing for Object Detection

Authors: Zhe Chen, Jing Zhang, Dacheng Tao

Abstract

Recent studies have confirmed that modeling contexts is important for object detection. However, current context modeling approaches still have limited expressive capacity and dynamics for encoding contextual relationships and modeling contexts, which reduces their effectiveness. In this paper, we instead seek to recast the current context modeling framework and perform more dynamic context modeling for object detection. In particular, we devise a novel Recursive Context Routing (ReCoR) mechanism to encode contextual relationships and model contexts more effectively. ReCoR progressively models more contexts through a recursive structure, providing a more feasible and more comprehensive way to utilize complicated contexts and contextual relationships. At each recursive stage, we further decompose the modeling of contexts and contextual relationships into a spatial modeling process and a channel-wise modeling process, which avoids the need to exhaustively and dynamically model all potential pair-wise contextual relationships in a single pass. The spatial modeling process focuses on spatial contexts and gradually involves more of them as the recursion proceeds. In the channel-wise modeling process, we introduce a context routing algorithm that models channel-wise contextual relationships dynamically and more effectively. We perform a comprehensive evaluation of the proposed ReCoR on the popular MS COCO and PASCAL VOC datasets. Its effectiveness is validated on both datasets by the consistent performance gains obtained when applying our method to different baseline object detectors. For example, on the MS COCO dataset, our approach delivers relative improvements of around 10% on the bounding box task and 7% on the instance segmentation task for a Mask RCNN detector, surpassing existing context modeling approaches by a large margin. State-of-the-art detection performance can also be achieved by applying ReCoR to the Cascade Mask RCNN detector, illustrating the benefits of our method for improving context modeling and object detection.
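The abstract describes the structure only in words: a recursion in which each stage runs a spatial modeling step over a growing neighbourhood and then a channel-wise routing step. The toy NumPy sketch below illustrates that general pattern and nothing more. It is not the authors' ReCoR: the box-filter spatial step, the agreement-based channel weighting, the residual update, and all function and parameter names (`spatial_context`, `route_channels`, `recursive_context`, `radius`, `num_iters`, `num_stages`) are assumptions made for illustration, since the abstract gives no formulas.

```python
import numpy as np


def spatial_context(features, radius):
    """Average every position with its neighbours inside a square window.

    Assumed stand-in for a spatial modeling step; features has shape (C, H, W).
    """
    features = np.asarray(features, dtype=np.float64)
    _, h, w = features.shape
    padded = np.pad(
        features, ((0, 0), (radius, radius), (radius, radius)), mode="edge"
    )
    k = 2 * radius + 1
    out = np.zeros_like(features)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def route_channels(context, num_iters=3):
    """Iteratively weight channels by how well they agree with a summary map.

    Assumed routing-by-agreement-style stand-in; returns weights that sum to 1.
    """
    c = context.shape[0]
    flat = np.asarray(context, dtype=np.float64).reshape(c, -1)
    norms = np.linalg.norm(flat, axis=1)
    logits = np.zeros(c)
    for _ in range(num_iters):
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        summary = weights @ flat  # weighted combination of channel maps
        agreement = flat @ summary / (norms * np.linalg.norm(summary) + 1e-8)
        logits = logits + agreement  # channels that agree gain weight
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()


def recursive_context(features, num_stages=3):
    """Alternate the spatial and channel-wise steps, widening the window per stage."""
    out = np.asarray(features, dtype=np.float64)
    for stage in range(1, num_stages + 1):
        ctx = spatial_context(out, radius=stage)  # more spatial context each stage
        weights = route_channels(ctx)
        out = out + weights[:, None, None] * ctx  # residual update (assumed)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.normal(size=(4, 6, 5))  # hypothetical (C, H, W) features
    refined = recursive_context(feature_map, num_stages=2)
    print(refined.shape)
```

Splitting each stage into a spatial pass and a channel pass keeps the per-stage cost proportional to the window size plus the number of channels, rather than to all position pairs, which is the motivation the abstract gives for the decomposition.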

Footnotes
3. FLOPs: floating point operations.
4. GMAC: giga multiply-accumulate operations per second.
 
Metadata
Title: Recursive Context Routing for Object Detection
Authors: Zhe Chen, Jing Zhang, Dacheng Tao
Publication date: 19-08-2020
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 1/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-020-01370-7