Published in: International Journal of Computer Vision 5/2020

27.03.2020

Deep Multicameral Decoding for Localizing Unoccluded Object Instances from a Single RGB Image

Authors: Matthieu Grard, Emmanuel Dellandréa, Liming Chen


Abstract

Occlusion-aware instance-sensitive segmentation is a complex task, generally reduced to region-based segmentation by approximating instances with their bounding boxes. We address the showcase scenario of dense homogeneous layouts, in which this approximation does not hold. In this scenario, outlining unoccluded instances by decoding a deep encoder becomes difficult, due to the translation invariance of convolutional layers and the lack of complexity in the decoder. We therefore propose a multicameral design composed of subtask-specific lightweight decoder and encoder–decoder units, coupled in cascade to encourage subtask-specific feature reuse and enforce a learning path within the decoding process. Furthermore, the state-of-the-art datasets for occlusion-aware instance segmentation contain real images with few instances and occlusions mostly due to objects occluding the background, unlike dense object layouts. We thus also introduce a synthetic dataset of dense homogeneous object layouts, namely Mikado, which extensibly contains more instances and inter-instance occlusions per image than these public datasets. Our extensive experiments on Mikado and public datasets show that ordinal multiscale units within the decoding process prove more effective than state-of-the-art design patterns for capturing position-sensitive representations. We also show that Mikado is plausible with respect to real-world problems, in the sense that it enables the learning of performance-enhancing representations transferable to real images, while drastically reducing the need for hand-made annotations for fine-tuning. The proposed dataset will be made publicly available.
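To make the cascading idea in the abstract concrete, the following is a minimal NumPy sketch — not the authors' actual architecture. Each subtask-specific "unit" is reduced to a single linear projection with ReLU, and the subtask names and feature dimensions are illustrative assumptions; the point is only the wiring: each unit consumes the shared encoding concatenated with the previous unit's output, so later subtasks reuse earlier subtask features along an enforced decoding path.

```python
import numpy as np

def lightweight_unit(features, weights):
    """Stand-in for one subtask-specific decoding unit:
    a single linear projection followed by ReLU."""
    return np.maximum(features @ weights, 0.0)

rng = np.random.default_rng(0)
# 64 spatial positions, 32 channels of shared encoder features (toy sizes).
encoder_features = rng.standard_normal((64, 32))

# Hypothetical subtask ordering: boundaries -> occlusion -> instances.
subtasks = ["boundaries", "occlusion", "instances"]
prev = np.zeros((64, 0))  # no previous-subtask features for the first unit
outputs = {}
for name in subtasks:
    # Each unit sees the shared encoding plus the previous unit's output,
    # which is what couples the units in cascade.
    feats = np.concatenate([encoder_features, prev], axis=1)
    w = rng.standard_normal((feats.shape[1], 16))
    outputs[name] = lightweight_unit(feats, w)
    prev = outputs[name]

for name, out in outputs.items():
    print(name, out.shape)
```

In a real network each unit would be a convolutional decoder (or encoder–decoder) head rather than a dense layer, but the feature-reuse pattern across subtask heads is the same.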

Zurück zum Zitat Shi, W., Caballero, J., Huszar, F., Totz, J., Aitken, A. P., Bishop, R., Rueckert, D., & Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Conference on computer vision and pattern recognition (CVPR) (pp. 1874–1883). IEEE Computer Society. Shi, W., Caballero, J., Huszar, F., Totz, J., Aitken, A. P., Bishop, R., Rueckert, D., & Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Conference on computer vision and pattern recognition (CVPR) (pp. 1874–1883). IEEE Computer Society.
Zurück zum Zitat Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR), IEEE Computer Society. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR), IEEE Computer Society.
Zurück zum Zitat Stein, A., & Hebert, M. (2006). Local detection of occlusion boundaries in video. In British machine vision conference (BMVC). Stein, A., & Hebert, M. (2006). Local detection of occlusion boundaries in video. In British machine vision conference (BMVC).
Zurück zum Zitat Sun, D., Liu, C., & Pfister, H. (2014). Local layering for joint motion estimation and occlusion detection. In Conference on computer vision and pattern recognition (CVPR) (pp. 1098–1105). IEEE Computer Society. Sun, D., Liu, C., & Pfister, H. (2014). Local layering for joint motion estimation and occlusion detection. In Conference on computer vision and pattern recognition (CVPR) (pp. 1098–1105). IEEE Computer Society.
Zurück zum Zitat Tang, Z., Peng, X., Geng, S., Wu, L., Zhang, S., & Metaxas, D. N. (2018). Quantized densely connected U-Nets for efficient landmark localization. In European conference on computer vision (ECCV) part III (Vol. 11207, pp. 348–364). Lecture notes in computer science, Springer. Tang, Z., Peng, X., Geng, S., Wu, L., Zhang, S., & Metaxas, D. N. (2018). Quantized densely connected U-Nets for efficient landmark localization. In European conference on computer vision (ECCV) part III (Vol. 11207, pp. 348–364). Lecture notes in computer science, Springer.
Zurück zum Zitat Wang, G., Wang, X., Li, F. W. B., & Liang, X. (2018a). DOOBNet: Deep object occlusion boundary detection from an image. In Asian conference on computer vision (ACCV) part VI (Vol. 11366, pp. 686–702). Lecture notes in computer science, Springer. Wang, G., Wang, X., Li, F. W. B., & Liang, X. (2018a). DOOBNet: Deep object occlusion boundary detection from an image. In Asian conference on computer vision (ACCV) part VI (Vol. 11366, pp. 686–702). Lecture notes in computer science, Springer.
Zurück zum Zitat Wang, P., & Yuille, A. L. (2016). DOC: Deep occlusion estimation from a single image. In European conference on computer vision (ECCV) part I (Vol. 9905, pp. 545–561). Lecture notes in computer science, Springer. Wang, P., & Yuille, A. L. (2016). DOC: Deep occlusion estimation from a single image. In European conference on computer vision (ECCV) part I (Vol. 9905, pp. 545–561). Lecture notes in computer science, Springer.
Zurück zum Zitat Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., & Cottrell, G. W. (2018b). Understanding convolution for semantic segmentation. In Winter conference on applications of computer vision (WACV) (pp. 1451–1460). Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., & Cottrell, G. W. (2018b). Understanding convolution for semantic segmentation. In Winter conference on applications of computer vision (WACV) (pp. 1451–1460).
Zurück zum Zitat Wang, Y., Zhao, X., & Huang, K. (2017). Deep crisp boundaries. In Conference on computer vision and pattern recognition (CVPR) (pp. 1724–1732). IEEE Computer Society. Wang, Y., Zhao, X., & Huang, K. (2017). Deep crisp boundaries. In Conference on computer vision and pattern recognition (CVPR) (pp. 1724–1732). IEEE Computer Society.
Zurück zum Zitat Williams, O., Isard, M., & MacCormick., J. (2011). Estimating disparity and occlusions in stereo video sequences. In Conference on computer vision and pattern recognition (CVPR) (pp. 250–257). IEEE Computer Society. Williams, O., Isard, M., & MacCormick., J. (2011). Estimating disparity and occlusions in stereo video sequences. In Conference on computer vision and pattern recognition (CVPR) (pp. 250–257). IEEE Computer Society.
Zurück zum Zitat Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In International conference on computer vision (ICCV) (pp. 1395–1403). IEEE Computer Society. Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In International conference on computer vision (ICCV) (pp. 1395–1403). IEEE Computer Society.
Zurück zum Zitat Yang, J., Price, B. L., Cohen, S., Lee, H., & Yang, M. H. (2016). Object contour detection with a fully convolutional encoder–decoder network. In Conference on computer vision and pattern recognition (CVPR) Yang, J., Price, B. L., Cohen, S., Lee, H., & Yang, M. H. (2016). Object contour detection with a fully convolutional encoder–decoder network. In Conference on computer vision and pattern recognition (CVPR)
Zurück zum Zitat Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (NIPS) (pp. 3320–3328). Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (NIPS) (pp. 3320–3328).
Zurück zum Zitat Yu, F., & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In International conference on learning representations (ICLR). Yu, F., & Koltun, V. (2016). Multi-scale context aggregation by dilated convolutions. In International conference on learning representations (ICLR).
Zurück zum Zitat Yu, J., Yang, L., Xu, N., Yang, J., & Huang, T. (2019). Slimmable neural networks. In International conference on learning representations (ICLR). Yu, J., Yang, L., Xu, N., Yang, J., & Huang, T. (2019). Slimmable neural networks. In International conference on learning representations (ICLR).
Zurück zum Zitat Yu, Z., Liu, W., Zou, Y., Feng, C., Ramalingam, S., Kumar, B. V. K. V., & Kautz, J. (2018). Simultaneous edge alignment and learning. In European conference on computer vision (ECCV) part III (Vol. 11207, pp. 400–417). Lecture notes in computer science, Springer. Yu, Z., Liu, W., Zou, Y., Feng, C., Ramalingam, S., Kumar, B. V. K. V., & Kautz, J. (2018). Simultaneous edge alignment and learning. In European conference on computer vision (ECCV) part III (Vol. 11207, pp. 400–417). Lecture notes in computer science, Springer.
Zurück zum Zitat Zhang, L., Li, X., Arnab, A., Yang, K., Tong, Y., & Torr, P. H. (2019). Dual graph convolutional network for semantic segmentation. In British machine vision conference (BMVC). Zhang, L., Li, X., Arnab, A., Yang, K., Tong, Y., & Torr, P. H. (2019). Dual graph convolutional network for semantic segmentation. In British machine vision conference (BMVC).
Zurück zum Zitat Zhu, Y., Tian, Y., Metaxas, D. N., Dollár, P. (2017). Semantic amodal segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 3001–3009). IEEE Computer Society. Zhu, Y., Tian, Y., Metaxas, D. N., Dollár, P. (2017). Semantic amodal segmentation. In Conference on computer vision and pattern recognition (CVPR) (pp. 3001–3009). IEEE Computer Society.
Zurück zum Zitat Zitnick, C. L., & Kanade, T. (2000). A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis Machine Intelligence (TPAMI), 22(7), 675–684.CrossRef Zitnick, C. L., & Kanade, T. (2000). A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis Machine Intelligence (TPAMI), 22(7), 675–684.CrossRef
Metadata
Title
Deep Multicameral Decoding for Localizing Unoccluded Object Instances from a Single RGB Image
Authors
Matthieu Grard
Emmanuel Dellandréa
Liming Chen
Publication date
27.03.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 5/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01323-0
