Skip to main content
Log in

Survey of recent progress in semantic image segmentation with CNNs

  • Review
  • Published:
Science China Information Sciences Aims and scope Submit manuscript

Abstract

In recent years, convolutional neural networks (CNNs) are leading the way in many computer vision tasks, such as image classification, object detection, and face recognition. In order to produce more refined semantic image segmentation, we survey the powerful CNNs and novel elaborate layers, structures and strategies, especially including those that have achieved the state-of-the-art results on the Pascal VOC 2012 semantic segmentation challenge. Moreover, we discuss their different working stages and various mechanisms to utilize the structural and contextual information in the image and feature spaces. Finally, combining some popular underlying referential methods in homologous problems, we propose several possible directions and approaches to incorporate existing effective methods as components to enhance CNNs for the segmentation of specific semantic objects.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Liang G Q, Ca J N, Liu X F, et al. Smart world: a better world. Sci China Inf Sci, 2016, 59: 043401

    Article  Google Scholar 

  2. Wang J L, Lu Y H, Liu J B, et al. A robust three-stage approach to large-scale urban scene recognition. Sci China Inf Sci, 2017, 60: 103101

    Article  Google Scholar 

  3. Wang W, Hu L H, Hu Z Y. Energy-based multi-view piecewise planar stereo. Sci China Inf Sci, 2017, 60: 032101

    Article  Google Scholar 

  4. Hoiem D, Efros A A, Herbert M. Recovering surface layout from an image. Int J Comput Vis, 2007, 75: 151–172

    Article  Google Scholar 

  5. Saxena A, Sun M, Ng A Y. Make3d: learning 3d scene structure from a single still image. IEEE Trans Patt Anal Mach Intell, 2009, 31: 824–840

    Article  Google Scholar 

  6. Gould S, Fulton R, Koller D. Decomposing a scene into geometric and semantically consistent regions. In: Proceedings of the IEEE International Conference on Computer Vision, Kyoto, 2009. 1–8

    Google Scholar 

  7. Gupta A, Efros A A, Hebert M. Blocks world revisited: image understanding using qualitative geometry and mechanics. In: Proceedings of European Conference on Computer Vision, Crete, 2010. 482–496

    Google Scholar 

  8. Zhao Y B, Zhu S C. Image parsing via stochastic scene grammar. In: Proceedings of the Conference and Workshop on Neural Information Processing System, Granada, 2011. 73–81

    Google Scholar 

  9. Liu C, Yuen J, Torralba A. Nonparametric scene parsing via label transfer. IEEE Trans Patt Anal Mach Intell, 2011, 33: 2368–2382

    Article  Google Scholar 

  10. Stella X Y, Zhang H, Malik J. Inferring spatial layout from a single image via depth-ordered grouping. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Anchorage, 2008

    Google Scholar 

  11. Lee D C, Hebert M, Kanade T. Geometric reasoning for single image structure recovery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, 2009. 2136–2143

    Google Scholar 

  12. Zheng Y, Byeungwoo J, Xu D, et al. Image segmentation by generalized hierarchical fuzzy C-means algorithm. J Intell Fuzzy Syst, 2015, 28: 4024–4028

    Google Scholar 

  13. Liu C, Yuen J, Torralba A. SIFT flow: dense correspondence across scenes and its applications. IEEE Trans Softw Eng, 2010, 33: 978–994

    Google Scholar 

  14. Papandreou G, Chen L C, Murphy K P, et al. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1742–1750

    Google Scholar 

  15. Ghiasi G, Fowlkes C C. Laplacian pyramid reconstruction and refinement for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 519–534

    Google Scholar 

  16. Peng C, Zhang X Y, Yu G, et al. Large kernel matters—improve semantic segmentation by global convolutional network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017. 4353–4361

    Google Scholar 

  17. Everingham M, van Gool L, Williams C K I, et al. The Pascal visual object classes (VOC) challenge. Int J Comput Vis, 2010, 88: 303–338

    Article  Google Scholar 

  18. Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1529–1537

    Google Scholar 

  19. Lin G S, Shen C H, van den Hengel A, et al. Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3194–3203

    Google Scholar 

  20. Liu Z W, Li X X, Luo P, et al. Semantic image segmentation via deep parsing network. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1377–1385

    Google Scholar 

  21. Lin G S, Shen C H, Reid I, et al. Deeply learning the messages in message passing inference. Comput Sci, 2015, 71: 866–872

    Google Scholar 

  22. Shuai B, Zuo Z, Wang B, et al. Dag-recurrent neural networks for scene labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3620–3629

    Google Scholar 

  23. Kuen J, Wang Z H, Wang G. Recurrent attentional networks for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3668–3677

    Google Scholar 

  24. Liang X D, Shen X H, Xiang D L, et al. Semantic object parsing with local-global long short-term memory. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3185–3193

    Google Scholar 

  25. Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1520–1528

    Google Scholar 

  26. Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: Proceedings of International Conference on Learning Representations, San Juan, 2016

    Google Scholar 

  27. Chen L C, Papandreou G, Kokkinos I, et al. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv:1606.00915, 2016

    Google Scholar 

  28. Sermanet P, Fergus R, LeCun Y, et al. Overfeat: integrated recognition, localization and detection using convolutional networks. In: Proceedings of International Conference on Learning Representations, Banff, 2014

    Google Scholar 

  29. Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 818–833

    Google Scholar 

  30. Krähenbühl P, Koltun V. Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Proceedings of Advances in Neural Information Processing Systems, Granada, 2011. 109–117

    Google Scholar 

  31. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, San Diego. 2015

    Google Scholar 

  32. He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 770–778

    Google Scholar 

  33. Gao W, Zhou Z H. Dropout Rademacher complexity of deep neural networks. Sci China Inf Sci, 2016, 59: 072104

    Article  Google Scholar 

  34. Wu Z F, Shen C H, Hengel A. High-performance semantic segmentation using very deep fully convolutional networks. arXiv:1604.04339, 2016

    Google Scholar 

  35. Hariharan B, Arbeláez P, Girshick R, et al. Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 447–456

    Google Scholar 

  36. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3431–3440

    Google Scholar 

  37. Xie S N, Tu Z W. Holistically-nested edge detection. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1395–1403

    Google Scholar 

  38. Lin G S, Milan A, Shen C H, et al. RefineNet: multi-path refinement networks with identity mappings for highresolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017. 1925–1934

    Google Scholar 

  39. Wu Z F, Shen C H, Hengel A. Wider or deeper: revisiting the ResNet model for visual recognition. arXiv:1611.10080, 2016

    Google Scholar 

  40. Hong S, Oh J, Lee H, et al. Learning transferrable knowledge for semantic segmentation with deep convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3204–3212

    Google Scholar 

  41. Chen L C, Yang Y, Wang J, et al. Attention to scale: scale-aware semantic image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3640–3649

    Google Scholar 

  42. Liu S, Qi X J, Shi J P, et al. Multi-scale patch aggregation (MPA) for simultaneous detection and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3141–3149

    Google Scholar 

  43. Bertasius G, Shi J, Torresani L. Semantic segmentation with boundary neural fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3602–3610

    Google Scholar 

  44. Mostajabi M, Yadollahpour P, Shakhnarovich G. Feedforward semantic segmentation with zoom-out features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3376–3385

    Google Scholar 

  45. Hong S, Noh H, Han B. Decoupled deep neural network for semi-supervised semantic segmentation. In: Proceedings of Advances in Neural Information Processing Systems, Montreal, 2015. 1495–1503

    Google Scholar 

  46. Arnab A, Jayasumana S, Zheng S, et al. Higher order conditional random fields in deep neural networks. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 524–540

    Google Scholar 

  47. Vemulapalli R, Tuzel O, Liu M Y, et al. Gaussian conditional random field network for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3224–3233

    Google Scholar 

  48. Zhao H S, Shi J P, Qi X J, et al. Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017. 2881–2890

    Google Scholar 

  49. Yang J, Price B, Cohen S, et al. Object contour detection with a fully convolutional encoder-decoder network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 193–202

    Google Scholar 

  50. Lee C Y, Xie S, Gallagher P, et al. Deeply-supervised nets. In: Proceedings of Artificial Intelligence and Statistics, San Diego, 2015. 562–570

    Google Scholar 

  51. Kokkinos I. Pushing the boundaries of boundary detection using deep learning. In: Proceedings of International Conference on Learning Representations, San Juan, 2016

    Google Scholar 

  52. Giusti A, Ciresan D C, Masci J, et al. Fast image scanning with deep max-pooling convolutional neural networks. In: Proceedings of the 20th IEEE International Conference on Image Processing (ICIP), Melbourne, 2013. 4034–4038

    Google Scholar 

  53. Sutton C, Mccallum A. Piecewise training for undirected models. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence. Edinburgh: AUAI Press, 2005. 568–575

    Google Scholar 

  54. Adams A, Baek J, Davis M A. Fast high-dimensional filtering using the permutohedral lattice. Comput Graph Forum, 2010, 29: 753–762

    Article  Google Scholar 

  55. Dai J F, He K M, Sun J. Boxsup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1635–1643

    Google Scholar 

  56. Rother C, Kolmogorov V, Blake A. Grabcut: interactive foreground extraction using iterated graph cuts. ACM Trans Graph, 2004, 23: 309–314

    Article  Google Scholar 

  57. Uijlings J R R, van de Sande K E A, Gevers T, et al. Selective search for object recognition. Int J Comput Vis, 2013, 104: 154–171

    Article  Google Scholar 

  58. Arbeláez P, Pont-Tuset J, Barron J, et al. Multiscale combinatorial grouping. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 328–335

    Google Scholar 

  59. Krahenbhl P, Koltun V. Geodesic object proposals. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 725–739

    Google Scholar 

  60. Lin D, Dai J F, Jia J Y, et al. Scribblesup: scribble-supervised convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3159–3167

    Google Scholar 

  61. Romera-Paredes B, Torr P H S. Recurrent instance segmentation. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 312–329

    Google Scholar 

  62. Dai J F, He K M, Sun J. Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3150–3158

    Google Scholar 

Download references

Acknowledgements

This work was supported by National High-tech R&D Program of China (863 Program) (Grant No. 2015AA016403) and National Natural Science Foundation of China (Grant Nos. 61572061, 61472020).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhong Zhou.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Geng, Q., Zhou, Z. & Cao, X. Survey of recent progress in semantic image segmentation with CNNs. Sci. China Inf. Sci. 61, 051101 (2018). https://doi.org/10.1007/s11432-017-9189-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11432-017-9189-6

Keywords

Navigation