Published in: Soft Computing 7/2019

21-11-2017 | Methodologies and Application

Scalable scene understanding via saliency consensus

Authors: Bharath Ramesh, Nicholas Lim Zhi Jian, Liang Chen, Cheng Xiang, Zhi Gao

Abstract

Given a single image, we propose a scene understanding framework that segments and categorizes the objects in the scene, and classifies the overall scene. A handful of frameworks already exist to perform these tasks coherently, but training of these models is time-consuming, thereby limiting their scalability. This paper presents a scalable framework by adopting an object-based approach, which sequentially performs unsupervised object discovery using multiple saliency detection algorithms, object segmentation by graph-cut, object classification using the bag-of-features model, and lastly, scene classification by binary decision trees. A novel region-of-interest (ROI) detector, based on morphological image processing techniques, is proposed to automatically provide object location priors from saliency maps. Additionally, for improving object discovery, multiple saliency detectors are combined using a novel method to produce the ROI map, which is then used to obtain the segmentation. We tested our system on a novel object-based scene dataset and obtained a high classification accuracy using the proposed object discovery step. Unlike other existing frameworks, the proposed algorithm maintains scalability due to the fully unsupervised object discovery step, and therefore it can easily accommodate more objects and scene categories.
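
The abstract does not say how the saliency maps are combined or which morphological operations the ROI detector applies. Purely as an illustration of the general idea, the sketch below averages several normalized saliency maps into a consensus map, thresholds it, cleans the mask with a 3x3 morphological opening, and returns one bounding box as a location prior. The averaging rule, the threshold of 0.5, and the function names (`consensus_map`, `binary_open`, `roi_box`) are assumptions made for this example, not the authors' method.

```python
import numpy as np

def normalize(m):
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    m = m.astype(float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def consensus_map(maps):
    """Assumed consensus rule: pixel-wise mean of the normalized maps."""
    return np.mean([normalize(m) for m in maps], axis=0)

def binary_open(mask):
    """3x3 morphological opening (erosion, then dilation) via shifted copies."""
    def shifts(a):
        h, w = a.shape
        p = np.pad(a, 1, constant_values=False)
        return [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    eroded = np.logical_and.reduce(shifts(mask))
    return np.logical_or.reduce(shifts(eroded))

def roi_box(maps, thresh=0.5):
    """Return (top, bottom, left, right) of the cleaned salient mask, or None."""
    mask = binary_open(consensus_map(maps) >= thresh)
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# Toy input: one salient block plus an isolated noisy pixel.
a = np.zeros((10, 10))
a[2:6, 3:8] = 1.0
a[8, 8] = 1.0
box = roi_box([a, 0.5 * a, a.copy()])
```

In a pipeline like the one described, such a box could seed a graph-cut segmentation as a foreground prior. A real system would typically keep one box per connected component rather than a single box for the whole mask.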

Metadata
Title: Scalable scene understanding via saliency consensus
Authors: Bharath Ramesh, Nicholas Lim Zhi Jian, Liang Chen, Cheng Xiang, Zhi Gao
Publication date: 21-11-2017
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing, Issue 7/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-017-2939-2
