2016 | OriginalPaper | Chapter

Top-Down Learning for Structured Labeling with Convolutional Pseudoprior

Authors: Saining Xie, Xun Huang, Zhuowen Tu

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

Current practice in convolutional neural networks (CNNs) remains largely bottom-up, and the role of top-down processes in CNNs for pattern analysis and visual inference is not very clear. In this paper, we propose a new method for structured labeling by developing a convolutional pseudoprior (ConvPP) on the ground-truth labels. Our method has several interesting properties: (1) compared with classic machine learning algorithms like CRFs and structural SVMs, ConvPP automatically learns rich convolutional kernels to capture both short- and long-range contexts; (2) compared with cascade classifiers like Auto-Context, ConvPP avoids the iterative steps of learning a series of discriminative classifiers and automatically learns contextual configurations; (3) compared with recent efforts combining CNN models with CRFs and RNNs, ConvPP learns convolution in the labeling space with improved modeling capability and less manual specification; (4) compared with Bayesian models like MRFs, ConvPP capitalizes on the rich representation power of convolution by automatically learning priors built on convolutional filters. We accomplish our task using a pseudo-likelihood approximation to the prior under a novel fixed-point network structure that facilitates an end-to-end learning process. We show state-of-the-art results on sequential labeling and image labeling benchmarks.
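
For readers unfamiliar with the term, the pseudo-likelihood approximation mentioned in the abstract can be written in its standard textbook form as below. This is a sketch of the general technique, not necessarily the exact formulation used in the chapter. The symbols are assumptions introduced here for illustration: Y is the full label map, y_i is the label at site i, and N(i) is a neighborhood of site i that excludes i itself.

```latex
% Standard pseudo-likelihood approximation to a joint prior over labels.
% Notation (Y, y_i, \mathcal{N}(i)) is assumed here, not taken from the chapter.
\[
  p(Y) \;\approx\; \prod_{i} p\bigl(y_i \mid Y_{\mathcal{N}(i)}\bigr)
\]
```

Under this approximation, each factor is a local conditional distribution of one label given its surrounding labels, which is the kind of quantity a convolutional filter applied in the labeling space could plausibly be trained to predict.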

Metadata
Title
Top-Down Learning for Structured Labeling with Convolutional Pseudoprior
Authors
Saining Xie
Xun Huang
Zhuowen Tu
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46493-0_19
