Published in: International Journal of Computer Vision 7/2021

05.05.2021

CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse

Authors: Hao Chen, Youfu Li, Yongjian Deng, Guosheng Lin



Abstract

The goal of this work is to present a systematic solution for RGB-D salient object detection that addresses three aspects within a unified framework: modal-specific representation learning, complementary cue selection, and cross-modal complement fusion. To learn discriminative modal-specific features, we propose a hierarchical cross-modal distillation scheme in which the progressive predictions from the well-learned source modality supervise the learning of feature hierarchies and inference in the new modality. To better select complementary cues, we formulate a residual function that adaptively incorporates complements from the paired modality. Furthermore, a top-down fusion structure is constructed for sufficient cross-modal, cross-level interaction. The experimental results demonstrate the effectiveness of the proposed cross-modal distillation scheme in learning from a new modality, the advantages of the proposed multi-modal fusion pattern in selecting and fusing cross-modal complements, and the generalization of the proposed designs to different tasks.
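As a rough illustration only (not the authors' implementation), the residual complement-selection idea above — keeping one modality's features intact and adding a learned residual computed from the paired modality — can be sketched in plain NumPy. The 1x1-convolution weights `W` and the `residual_complement_fusion` helper are hypothetical stand-ins for the learned residual function:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_complement_fusion(f_rgb, f_depth, W):
    """Hypothetical sketch: augment RGB features with a residual
    selected from the paired depth modality.
    f_rgb, f_depth: (C, H, W) feature maps; W: (C, 2C) 1x1-conv weights."""
    # Concatenate the two modalities along the channel axis.
    paired = np.concatenate([f_rgb, f_depth], axis=0)     # (2C, H, W)
    # A 1x1 convolution is a matrix product over the channel axis.
    residual = relu(np.einsum('oc,chw->ohw', W, paired))  # (C, H, W)
    # Residual formulation: identity path for RGB, plus selected complements.
    return f_rgb + residual

rng = np.random.default_rng(0)
C, H, Wd = 4, 8, 8
f_rgb = rng.standard_normal((C, H, Wd))
f_depth = rng.standard_normal((C, H, Wd))
W = rng.standard_normal((C, 2 * C)) * 0.1
fused = residual_complement_fusion(f_rgb, f_depth, W)
print(fused.shape)  # (4, 8, 8)
```

The identity path guarantees the source modality's features pass through unchanged, so the network only has to learn the complement, which is the usual motivation for a residual formulation.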


Zurück zum Zitat Zhang, M., Ji, W., Piao, Y., Li, J., Zhang, Y., Xu, S., et al. (2020). LFNet: Light field fusion network for salient object detection. IEEE Transactions on Image Processing, 29, 6276–6287.CrossRef Zhang, M., Ji, W., Piao, Y., Li, J., Zhang, Y., Xu, S., et al. (2020). LFNet: Light field fusion network for salient object detection. IEEE Transactions on Image Processing, 29, 6276–6287.CrossRef
Zurück zum Zitat Zhao, J. X., Cao, Y., Fan, D. P., Cheng, M. M., Li, X. Y., & Zhang, L. (2019). Contrast prior and fluid pyramid integration for RGBD salient object detection. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 3927–3936). Zhao, J. X., Cao, Y., Fan, D. P., Cheng, M. M., Li, X. Y., & Zhang, L. (2019). Contrast prior and fluid pyramid integration for RGBD salient object detection. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 3927–3936).
Zurück zum Zitat Zhao, X., Pang, Y., Zhang, L., Lu, H., & Zhang, L. (2020a). Suppress and balance: A simple gated network for salient object detection. In Proceedings of European conference on computer vision. Zhao, X., Pang, Y., Zhang, L., Lu, H., & Zhang, L. (2020a). Suppress and balance: A simple gated network for salient object detection. In Proceedings of European conference on computer vision.
Zurück zum Zitat Zhao, X., Zhang, L., Pang, Y., Lu, H., & Zhang, L. (2020b). A single stream network for robust and real-time RGB-D salient object detection. In Proceedings of European conference on computer vision. Zhao, X., Zhang, L., Pang, Y., Lu, H., & Zhang, L. (2020b). A single stream network for robust and real-time RGB-D salient object detection. In Proceedings of European conference on computer vision.
Zurück zum Zitat Zhou, T., Fan, D. P., Cheng, M. M., Shen, J., & Shao, L. (2020). RGB-D salient object detection: A survey. Computational Visual Media, pp. 1–33 Zhou, T., Fan, D. P., Cheng, M. M., Shen, J., & Shao, L. (2020). RGB-D salient object detection: A survey. Computational Visual Media, pp. 1–33
Zurück zum Zitat Zhu, C., & Li, G. (2017). A three-pathway psychobiological framework of salient object detection using stereoscopic technology. In Proceedings of the IEEE computer society conference on computer vision (pp. 3008–3014). Zhu, C., & Li, G. (2017). A three-pathway psychobiological framework of salient object detection using stereoscopic technology. In Proceedings of the IEEE computer society conference on computer vision (pp. 3008–3014).
Zurück zum Zitat Zhu, H., Weibel, J. B., & Lu, S. (2016). Discriminative multi-modal feature fusion for RGBD indoor scene recognition. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 2969–2976). Zhu, H., Weibel, J. B., & Lu, S. (2016). Discriminative multi-modal feature fusion for RGBD indoor scene recognition. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 2969–2976).
Metadata
Title
CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse
Authors
Hao Chen
Youfu Li
Yongjian Deng
Guosheng Lin
Publication date
05.05.2021
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 7/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-021-01452-0
