
2018 | Original Paper | Book Chapter

Bottom-Up Top-Down Cues for Weakly-Supervised Semantic Segmentation

Authors: Qibin Hou, Daniela Massiceti, Puneet Kumar Dokania, Yunchao Wei, Ming-Ming Cheng, Philip H. S. Torr

Published in: Energy Minimization Methods in Computer Vision and Pattern Recognition

Publisher: Springer International Publishing


Abstract

We consider the task of learning a classifier for semantic segmentation using weak supervision in the form of image labels that specify the objects present in the image. Our method uses deep convolutional neural networks (CNNs) and adopts an Expectation-Maximization (EM) based approach. We focus on three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-step); and (iii) the parameter update (M-step). We show that saliency and attention maps, which are bottom-up and top-down cues respectively, of images containing a single object (simple images) provide highly reliable cues for learning an initialization for EM. Intuitively, given weak supervision, we first learn to segment simple images and then move on to complex ones. Next, to update the parameters (M-step), we propose minimizing a combination of the standard softmax loss and the KL divergence between the latent posterior distribution (obtained in the E-step) and the likelihood given by the CNN. This combination is more robust to wrong predictions made by the E-step. Extensive experiments and discussion show that our method is simple and intuitive, and that it outperforms the previous state-of-the-art method by a large margin of 3.7% and 3.9% on the PASCAL VOC12 train and test sets respectively, setting new state-of-the-art results.
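The M-step objective described in the abstract can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: the function names, the weight `kl_weight`, the use of the posterior's argmax as the hard label for the softmax loss, and the direction of the KL term (posterior relative to CNN output) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combined_pixel_loss(scores, posterior, kl_weight=1.0, eps=1e-12):
    """Softmax (cross-entropy) loss plus a weighted KL term for one pixel.

    scores    : raw CNN class scores for the pixel
    posterior : latent posterior over classes from the E-step (sums to 1)
    kl_weight : hypothetical trade-off weight, not taken from the paper
    """
    probs = softmax(scores)

    # Assumption: the hard label is the most probable class under the posterior.
    hard_label = max(range(len(posterior)), key=posterior.__getitem__)
    softmax_loss = -math.log(probs[hard_label] + eps)

    # Assumption: KL(posterior || CNN output); zero-mass classes contribute nothing.
    kl = sum(
        q * (math.log(q + eps) - math.log(p + eps))
        for q, p in zip(posterior, probs)
        if q > 0.0
    )
    return softmax_loss + kl_weight * kl
```

Calling `combined_pixel_loss([5.0, 0.0, 0.0], [0.9, 0.05, 0.05])` yields a smaller loss than calling it with scores `[0.0, 5.0, 0.0]`, because in the first case the CNN agrees with the posterior. The KL term keeps the full soft posterior in play, so a wrong hard label from the E-step is penalized less severely than it would be under the softmax loss alone.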


Footnotes
1
The softmax function is defined as \(\sigma (f_k) = \frac{e^{f_k}}{\sum _{j=0}^c e^{f_j}}\).
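As a worked illustration of this definition, take \(c = 2\), so the sum runs over the three indices \(j = 0, 1, 2\). The scores below are chosen only to make the arithmetic easy and do not come from the paper.

```latex
\[
f = (f_0, f_1, f_2) = (0,\ \ln 2,\ \ln 3)
\quad\Longrightarrow\quad
\sum_{j=0}^{2} e^{f_j} = 1 + 2 + 3 = 6,
\]
\[
\sigma(f_0) = \tfrac{1}{6}, \qquad
\sigma(f_1) = \tfrac{2}{6} = \tfrac{1}{3}, \qquad
\sigma(f_2) = \tfrac{3}{6} = \tfrac{1}{2}, \qquad
\sum_{k=0}^{2} \sigma(f_k) = 1 .
\]
```

The outputs are positive and sum to one, so they can be read as a probability distribution over the \(c + 1\) classes.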
 
Metadata
Title
Bottom-Up Top-Down Cues for Weakly-Supervised Semantic Segmentation
Authors
Qibin Hou
Daniela Massiceti
Puneet Kumar Dokania
Yunchao Wei
Ming-Ming Cheng
Philip H. S. Torr
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-78199-0_18