
Bridging semantic gap between high-level and low-level features in content-based video retrieval using multi-stage ESN–SVM classifier

  • Published in: Sādhanā

Abstract

A content-based video retrieval system aims to assist a user in retrieving a targeted video sequence from a large database. Most search engines use textual annotations to retrieve videos, but such engines offer only a low-level abstraction while the user seeks high-level semantics; bridging this semantic gap remains an important challenge in video retrieval. In this paper, colour, texture and shape are treated as low-level features and motion as a high-level feature. For colour extraction, frames are converted from the RGB colour space to YCbCr, and hue and saturation values are extracted to build colour histograms. After colour extraction, a filter mask is applied and the gradient value is computed; the gradient is compared against a threshold to draw the edge map. Edges are then smoothed and sharpened to remove unnecessary connected components, and the resulting shapes are extracted and stored in shape feature vectors. An SVM classifier is used for classification of the low-level features. For the high-level features, depth images are extracted for motion-feature identification, and classification is done via an echo state network (ESN). ESNs are a supervised learning technique that follow the principle of recurrent neural networks; they are well known for time-series classification and have also proved effective in gesture detection. By combining these existing algorithms, a high-performance multimedia event detection system is constructed. The effectiveness and efficiency of the proposed event detection mechanism are validated on the MSR 3D Action Pairs dataset. Experimental results show that the detection accuracy of the proposed combination is better than those of the other algorithms.
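The low-level stage described above (RGB-to-YCbCr conversion, then a filter mask, gradient computation and thresholding to produce an edge map) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the BT.601 conversion coefficients, the Sobel mask and the threshold value are standard assumptions, since the abstract does not specify which mask or threshold is used.

```python
import numpy as np

def rgb_to_ycbcr(frame):
    """Convert an RGB frame (H x W x 3, uint8) to YCbCr.

    Uses standard ITU-R BT.601 coefficients (an assumption; the
    paper only states that RGB is converted to YCbCr)."""
    frame = frame.astype(np.float64)
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    y  =  16 + (65.481 * r + 128.553 * g +  24.966 * b) / 255
    cb = 128 + (-37.797 * r -  74.203 * g + 112.000 * b) / 255
    cr = 128 + (112.000 * r -  93.786 * g -  18.214 * b) / 255
    return np.stack([y, cb, cr], axis=-1)

def edge_map(gray, threshold=50.0):
    """Gradient-magnitude edge map: apply a filter mask, compute the
    gradient, and compare it against a threshold.

    A Sobel mask and threshold of 50 are illustrative choices."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    pad = np.pad(gray.astype(np.float64), 1, mode='edge')
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):          # naive 2-D convolution, kept simple for clarity
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    mag = np.hypot(gx, gy)      # gradient magnitude
    return mag > threshold      # boolean edge map
```

In a full pipeline, the resulting edge map would be smoothed, its connected components filtered, and the surviving shapes encoded as feature vectors for the SVM stage.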
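The ESN stage can likewise be sketched. The class below is a generic minimal echo state network — a fixed random reservoir with a ridge-regression readout trained on the final reservoir state of each sequence — and is not the authors' exact configuration; the reservoir size, spectral radius and ridge parameter are illustrative assumptions.

```python
import numpy as np

class EchoStateNetwork:
    """Minimal ESN classifier: fixed random recurrent reservoir,
    trained linear readout (hyperparameters are illustrative)."""

    def __init__(self, n_inputs, n_reservoir=50, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the spectral radius is < 1 (the echo state property).
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W = W
        self.n_reservoir = n_reservoir

    def _final_state(self, sequence):
        """Drive the reservoir with a (T x n_inputs) time series."""
        x = np.zeros(self.n_reservoir)
        for u in sequence:
            x = np.tanh(self.W_in @ u + self.W @ x)
        return x

    def fit(self, sequences, labels, ridge=1e-6):
        X = np.array([self._final_state(s) for s in sequences])
        Y = np.eye(int(max(labels)) + 1)[labels]      # one-hot targets
        # Ridge-regression readout: W_out = (X^T X + lambda I)^{-1} X^T Y
        self.W_out = np.linalg.solve(
            X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

    def predict(self, sequences):
        X = np.array([self._final_state(s) for s in sequences])
        return np.argmax(X @ self.W_out, axis=1)
```

In the paper's setting the input sequences would be motion features derived from depth images; only the linear readout is trained, which is what makes ESNs a supervised technique despite the fixed recurrent reservoir.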

Figures 1–10



Author information

Corresponding author: N Brindha.


Cite this article

Brindha, N., Visalakshi, P. Bridging semantic gap between high-level and low-level features in content-based video retrieval using multi-stage ESN–SVM classifier. Sādhanā 42, 1–10 (2017). https://doi.org/10.1007/s12046-016-0574-8
