Published in: International Journal of Computer Vision 1/2021

19.08.2020

Recursive Context Routing for Object Detection

Authors: Zhe Chen, Jing Zhang, Dacheng Tao


Abstract

Recent studies have confirmed that modeling contexts is important for object detection. However, current context modeling approaches have limited expressive capacity and dynamics for encoding contextual relationships, which deteriorates their effectiveness. In this paper, we instead recast the current context modeling framework and perform more dynamic context modeling for object detection. In particular, we devise a novel Recursive Context Routing (ReCoR) mechanism to encode contextual relationships and model contexts more effectively. ReCoR progressively models more contexts through a recursive structure, providing a more feasible and more comprehensive way to exploit complicated contexts and contextual relationships. At each recursive stage, we further decompose the modeling of contexts and contextual relationships into a spatial modeling process and a channel-wise modeling process, avoiding the need to exhaustively model all potential pair-wise contextual relationships in a single pass. The spatial modeling process focuses on spatial contexts and gradually involves more of them as the recursion proceeds. In the channel-wise modeling process, we introduce a context routing algorithm to model channel-wise contextual relationships dynamically. We perform a comprehensive evaluation of the proposed ReCoR on the popular MS COCO and PASCAL VOC datasets. The consistent performance gains obtained by applying our method to different baseline object detectors validate its effectiveness on both datasets. For example, on MS COCO, our approach delivers around a 10% relative improvement for a Mask R-CNN detector on the bounding box detection task and a 7% relative improvement on the instance segmentation task, surpassing existing context modeling approaches by a large margin.
State-of-the-art detection performance can also be achieved by applying ReCoR to the Cascade Mask R-CNN detector, illustrating the substantial benefits of our method for context modeling and object detection.
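As a rough illustration of the decomposition described above (not the paper's exact formulation), each recursive stage can be pictured as a spatial context-aggregation step followed by a channel-wise gating step whose weights are refined by an iterative, routing-style update. All function names and the agreement update below are hypothetical simplifications:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_context(feat):
    # feat: (C, H, W); aggregate a global spatial context descriptor per channel.
    return feat.mean(axis=(1, 2))  # (C,)

def route_channels(ctx, iters=3):
    # ctx: (C,) channel descriptors. Iteratively re-weight channels by their
    # agreement with a weighted consensus (a dynamic-routing-style update).
    logits = np.zeros_like(ctx)
    for _ in range(iters):
        w = softmax(logits)
        consensus = (w * ctx).sum()
        logits = logits + ctx * consensus  # strengthen agreeing channels
    return softmax(logits)  # (C,) gating weights, summing to 1

def recor_sketch(feat, stages=2):
    # Recursive refinement: each stage models spatial context, then gates
    # channels with routed weights; a residual keeps the original signal.
    out = feat
    for _ in range(stages):
        ctx = spatial_context(out)
        gate = route_channels(ctx)
        out = out * gate[:, None, None] + out
    return out
```

The point of the sketch is the structure: spatial and channel-wise modeling are separated within a stage, and stacking stages lets each pass build on the context already folded into the features, rather than modeling all pair-wise relationships at once.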


Footnotes
3. FLOPs: floating point operations.
4. GMAC: giga multiply-accumulate operations per second.
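Since the footnotes use MAC-based units for model complexity, a quick way to see how they relate: one multiply-accumulate is one multiplication plus one addition, so FLOPs are roughly twice the MAC count. The helper below is an illustrative sketch (the layer sizes in the usage line are arbitrary, not from the paper):

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    # Multiply-accumulate count for a standard 2-D convolution:
    # each of the c_out * h_out * w_out outputs needs c_in * k * k MACs.
    return c_in * k * k * c_out * h_out * w_out

def macs_to_flops(macs):
    # One MAC = one multiply + one add, so FLOPs ~= 2 * MACs.
    return 2 * macs

# Example: a 3x3 conv, 256 -> 256 channels, on a 56x56 output map.
macs = conv2d_macs(c_in=256, c_out=256, k=3, h_out=56, w_out=56)
print(macs / 1e9)  # complexity in GMACs, ~1.85
```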
 
Metadata
Title
Recursive Context Routing for Object Detection
Authors
Zhe Chen
Jing Zhang
Dacheng Tao
Publication date
19.08.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01370-7
