Published in: International Journal of Computer Vision 7/2021

05.05.2021

CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse

Authors: Hao Chen, Youfu Li, Yongjian Deng, Guosheng Lin



Abstract

The goal of this work is to present a systematic solution for RGB-D salient object detection that addresses three aspects within a unified framework: modal-specific representation learning, complementary cue selection, and cross-modal complement fusion. To learn discriminative modal-specific features, we propose a hierarchical cross-modal distillation scheme in which the progressive predictions from the well-learned source modality supervise the learning of feature hierarchies and inference in the new modality. To better select complementary cues, we formulate a residual function that adaptively incorporates complements from the paired modality. Furthermore, a top-down fusion structure is constructed for sufficient cross-modal, cross-level interaction. The experimental results demonstrate the effectiveness of the proposed cross-modal distillation scheme in learning from a new modality, the advantages of the proposed multi-modal fusion pattern in selecting and fusing cross-modal complements, and the generalization of the proposed designs to different tasks.
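As a rough illustration only (not the authors' implementation), the residual complement-selection idea above — keeping one modality's features intact and adding a learned residual computed from the paired modality — can be sketched in plain NumPy. The 1x1-convolution weights `W` and the `residual_complement_fusion` helper are hypothetical stand-ins for the learned residual function:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_complement_fusion(f_rgb, f_depth, W):
    """Hypothetical sketch: augment RGB features with a residual
    selected from the paired depth modality.
    f_rgb, f_depth: (C, H, W) feature maps; W: (C, 2C) 1x1-conv weights."""
    # Concatenate the two modalities along the channel axis.
    paired = np.concatenate([f_rgb, f_depth], axis=0)     # (2C, H, W)
    # A 1x1 convolution is a matrix product over the channel axis.
    residual = relu(np.einsum('oc,chw->ohw', W, paired))  # (C, H, W)
    # Residual formulation: identity path for RGB, plus selected complements.
    return f_rgb + residual

rng = np.random.default_rng(0)
C, H, Wd = 4, 8, 8
f_rgb = rng.standard_normal((C, H, Wd))
f_depth = rng.standard_normal((C, H, Wd))
W = rng.standard_normal((C, 2 * C)) * 0.1
fused = residual_complement_fusion(f_rgb, f_depth, W)
print(fused.shape)  # (4, 8, 8)
```

The identity path guarantees the source modality's features pass through unchanged, so the network only has to learn the complement, which is the usual motivation for a residual formulation.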


Zurück zum Zitat Zhang, M., Ji, W., Piao, Y., Li, J., Zhang, Y., Xu, S., et al. (2020). LFNet: Light field fusion network for salient object detection. IEEE Transactions on Image Processing, 29, 6276–6287.CrossRef Zhang, M., Ji, W., Piao, Y., Li, J., Zhang, Y., Xu, S., et al. (2020). LFNet: Light field fusion network for salient object detection. IEEE Transactions on Image Processing, 29, 6276–6287.CrossRef
Zurück zum Zitat Zhao, J. X., Cao, Y., Fan, D. P., Cheng, M. M., Li, X. Y., & Zhang, L. (2019). Contrast prior and fluid pyramid integration for RGBD salient object detection. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 3927–3936). Zhao, J. X., Cao, Y., Fan, D. P., Cheng, M. M., Li, X. Y., & Zhang, L. (2019). Contrast prior and fluid pyramid integration for RGBD salient object detection. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 3927–3936).
Zurück zum Zitat Zhao, X., Pang, Y., Zhang, L., Lu, H., & Zhang, L. (2020a). Suppress and balance: A simple gated network for salient object detection. In Proceedings of European conference on computer vision. Zhao, X., Pang, Y., Zhang, L., Lu, H., & Zhang, L. (2020a). Suppress and balance: A simple gated network for salient object detection. In Proceedings of European conference on computer vision.
Zurück zum Zitat Zhao, X., Zhang, L., Pang, Y., Lu, H., & Zhang, L. (2020b). A single stream network for robust and real-time RGB-D salient object detection. In Proceedings of European conference on computer vision. Zhao, X., Zhang, L., Pang, Y., Lu, H., & Zhang, L. (2020b). A single stream network for robust and real-time RGB-D salient object detection. In Proceedings of European conference on computer vision.
Zurück zum Zitat Zhou, T., Fan, D. P., Cheng, M. M., Shen, J., & Shao, L. (2020). RGB-D salient object detection: A survey. Computational Visual Media, pp. 1–33 Zhou, T., Fan, D. P., Cheng, M. M., Shen, J., & Shao, L. (2020). RGB-D salient object detection: A survey. Computational Visual Media, pp. 1–33
Zurück zum Zitat Zhu, C., & Li, G. (2017). A three-pathway psychobiological framework of salient object detection using stereoscopic technology. In Proceedings of the IEEE computer society conference on computer vision (pp. 3008–3014). Zhu, C., & Li, G. (2017). A three-pathway psychobiological framework of salient object detection using stereoscopic technology. In Proceedings of the IEEE computer society conference on computer vision (pp. 3008–3014).
Zurück zum Zitat Zhu, H., Weibel, J. B., & Lu, S. (2016). Discriminative multi-modal feature fusion for RGBD indoor scene recognition. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 2969–2976). Zhu, H., Weibel, J. B., & Lu, S. (2016). Discriminative multi-modal feature fusion for RGBD indoor scene recognition. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 2969–2976).
Metadata
Title
CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse
Authors
Hao Chen
Youfu Li
Yongjian Deng
Guosheng Lin
Publication date
05.05.2021
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 7/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-021-01452-0
