Published in: Soft Computing 7/2019

21 November 2017 | Methodologies and Application

Scalable scene understanding via saliency consensus

Authors: Bharath Ramesh, Nicholas Lim Zhi Jian, Liang Chen, Cheng Xiang, Zhi Gao



Abstract

Given a single image, we propose a scene understanding framework that segments and categorizes the objects in the scene and classifies the overall scene. A handful of frameworks already perform these tasks coherently, but training these models is time-consuming, which limits their scalability. This paper presents a scalable framework that adopts an object-based approach, sequentially performing unsupervised object discovery using multiple saliency detection algorithms, object segmentation by graph cut, object classification using the bag-of-features model, and, lastly, scene classification by binary decision trees. A novel region-of-interest (ROI) detector, based on morphological image processing techniques, is proposed to automatically provide object location priors from saliency maps. Additionally, to improve object discovery, multiple saliency detectors are combined using a novel method to produce the ROI map, which is then used to obtain the segmentation. We tested our system on a novel object-based scene dataset and obtained a high classification accuracy using the proposed object discovery step. Unlike other existing frameworks, the proposed algorithm remains scalable because the object discovery step is fully unsupervised, and it can therefore easily accommodate more objects and scene categories.
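
As a rough illustration of the object discovery step described in the abstract, the sketch below fuses several saliency maps and extracts candidate regions of interest. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the fusion rule or the morphological operations, so the pixel-wise mean (`consensus_map`), the fixed threshold, and the connected-component bounding boxes (`roi_boxes`) are hypothetical stand-ins, and all names and toy values are invented here.

```python
from collections import deque
from typing import List, Tuple

Map = List[List[float]]           # saliency values in [0, 1], row-major
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def consensus_map(saliency_maps: List[Map]) -> Map:
    """Pixel-wise mean of several saliency maps (assumed fusion rule)."""
    rows, cols = len(saliency_maps[0]), len(saliency_maps[0][0])
    n = len(saliency_maps)
    return [[sum(m[r][c] for m in saliency_maps) / n for c in range(cols)]
            for r in range(rows)]


def roi_boxes(fused: Map, threshold: float = 0.5) -> List[Box]:
    """Threshold the fused map and return one box per 4-connected blob.

    Connected-component labelling stands in for the morphological ROI
    detector mentioned in the abstract; this substitution is an assumption.
    """
    rows, cols = len(fused), len(fused[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or fused[r][c] < threshold:
                continue
            # Breadth-first flood fill over one above-threshold blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and fused[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy outputs of two hypothetical saliency detectors on a 4x4 image.
map_a = [[0.9, 0.8, 0.0, 0.0],
         [0.9, 0.7, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
map_b = [[0.7, 0.6, 0.0, 0.0],
         [0.5, 0.5, 0.0, 0.9],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.8]]

boxes = roi_boxes(consensus_map([map_a, map_b]))
print(boxes)
```

In this toy input, the pixel at row 1, column 3 is marked salient by only one of the two maps, so its mean falls below the threshold and it produces no box; only regions on which both maps agree survive. In a pipeline like the one the abstract outlines, such boxes would serve as location priors for the subsequent graph-cut segmentation.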


Metadata
Title
Scalable scene understanding via saliency consensus
Authors
Bharath Ramesh
Nicholas Lim Zhi Jian
Liang Chen
Cheng Xiang
Zhi Gao
Publication date
21 November 2017
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 7/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-017-2939-2
