2018 | Original Paper | Book Chapter

Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection

Authors: Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

This paper proposes a fast video salient object detection model based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM). A Pyramid Dilated Convolution (PDC) module is first designed to extract spatial features at multiple scales simultaneously. These spatial features are then concatenated and fed into an extended Deeper Bidirectional ConvLSTM (DB-ConvLSTM) to learn spatiotemporal information. Forward and backward ConvLSTM units are placed in two layers and connected in a cascaded way, encouraging information flow between the bidirectional streams and enabling deeper feature extraction. We further augment the DB-ConvLSTM with a PDC-like structure, adopting several dilated DB-ConvLSTMs to extract multi-scale spatiotemporal information. Extensive experiments show that our method outperforms previous video saliency models by a large margin, at a real-time speed of 20 fps on a single GPU. Taking unsupervised video object segmentation as an example application, the proposed model (with CRF-based post-processing) achieves state-of-the-art results on two popular benchmarks, demonstrating its strong performance and broad applicability.
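The core idea of the PDC module described above is to apply the same convolution kernel at several dilation rates, with padding chosen so that every scale preserves the spatial size, and then concatenate the outputs as parallel feature channels. A minimal 1-D toy sketch in plain Python (not the authors' implementation, which uses 2-D convolutions inside a deep network) illustrates the mechanism:

```python
# Toy sketch of a "pyramid dilated convolution": one kernel applied at
# several dilation rates, zero-padded so every output keeps the input
# length, then collected as parallel feature channels.

def dilated_conv1d(signal, kernel, rate):
    """Apply `kernel` to `signal` with the given dilation rate.

    Zero padding of size rate * (len(kernel) // 2) keeps the output
    the same length as the input, so outputs from different rates
    can be concatenated channel-wise.
    """
    k = len(kernel)
    pad = rate * (k // 2)
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(k):
            # A dilation rate of r samples every r-th input position,
            # enlarging the receptive field without extra parameters.
            acc += kernel[j] * padded[i + j * rate]
        out.append(acc)
    return out

def pyramid_dilated_conv1d(signal, kernel, rates=(1, 2, 4, 8)):
    """Return one output channel per dilation rate (multi-scale features)."""
    return [dilated_conv1d(signal, kernel, r) for r in rates]

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    box = [1.0, 1.0, 1.0]  # simple box kernel for illustration
    feats = pyramid_dilated_conv1d(x, box)
    # Every scale preserves the spatial size, so the channels can be
    # stacked directly, as the PDC module does before the DB-ConvLSTM.
    assert all(len(f) == len(x) for f in feats)
```

The dilation rates `(1, 2, 4, 8)` here are illustrative placeholders, not the configuration used in the paper; the point is only that larger rates widen the receptive field while the output resolution stays fixed, which is what lets the module capture objects at multiple scales.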


Metadata
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-01252-6_44
