Published in: Neural Processing Letters 3/2018

17 March 2018

Unsupervised Video Hashing via Deep Neural Network

Authors: Chao Ma, Yun Gu, Chen Gong, Jie Yang, Deying Feng


Abstract

Hashing is a common solution for content-based multimedia retrieval: it encodes high-dimensional feature vectors into short binary codes. Previous work has focused mainly on image hashing. These methods cannot be applied directly to video hashing, because videos contain not only spatial structure within each frame but also temporal correlation between successive frames. Some researchers have addressed this by encoding extracted key frames, but such frame-based methods are time-consuming in real applications. Others characterize a video by averaging the spatial features of its frames, so that existing hashing methods can be applied. Unfortunately, this kind of “video” feature ignores the correlation between frames and may lose temporal information. In this paper, we therefore propose a novel unsupervised video hashing framework based on a deep neural network, which performs video hashing by incorporating temporal structure alongside the conventional spatial structure. Specifically, the spatial features of videos are extracted with a convolutional neural network (CNN), and the temporal features are modeled with long short-term memory (LSTM). A time series pooling strategy is then employed to obtain a single feature vector for each video. The resulting spatio-temporal feature can be used with many existing unsupervised hashing methods. Experimental results on two real datasets indicate that, by employing the spatio-temporal features, our hashing method significantly improves the performance of existing methods that use only spatial features, and also achieves higher mean average precision than state-of-the-art video hashing methods.
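The last two stages described in the abstract, pooling a sequence of per-frame features into one vector per video and then passing that vector to an unsupervised hashing method, can be illustrated with a minimal sketch. The abstract does not specify the pooling operator or the hashing method, so this example assumes mean pooling over time and a simple PCA-sign hash as stand-ins; they are not the authors' method. Random arrays stand in for the CNN/LSTM outputs, and all function names are hypothetical.

```python
import numpy as np


def temporal_mean_pool(frame_states):
    """Collapse a (T, d) sequence of per-frame states into one d-vector.

    Mean pooling is an assumption; the abstract only says "time series pooling".
    """
    return np.asarray(frame_states, dtype=float).mean(axis=0)


def fit_sign_pca(features, n_bits):
    """Fit a PCA-sign hash (an assumed stand-in for any unsupervised hasher)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_bits].T


def encode(features, mean, projection):
    """Binarize projected features: one bit per principal direction."""
    return ((features - mean) @ projection > 0).astype(np.uint8)


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two binary codes differ."""
    return int(np.count_nonzero(code_a != code_b))


rng = np.random.default_rng(0)
# Hypothetical input: 6 videos of different lengths, 16-dim state per frame.
videos = [rng.normal(size=(int(rng.integers(5, 12)), 16)) for _ in range(6)]

# One fixed-length feature vector per video, regardless of frame count.
video_features = np.stack([temporal_mean_pool(v) for v in videos])

mean, projection = fit_sign_pca(video_features, n_bits=4)
codes = encode(video_features, mean, projection)
```

Retrieval would then rank database videos by `hamming_distance` between their codes and the query code. Because pooling yields a fixed-length vector for videos of any length, the hashing step can be swapped for another unsupervised method without changing the feature pipeline.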


Metadata
Title
Unsupervised Video Hashing via Deep Neural Network
Authors
Chao Ma
Yun Gu
Chen Gong
Jie Yang
Deying Feng
Publication date
17 March 2018
Publisher
Springer US
Published in
Neural Processing Letters / Issue 3/2018
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-018-9812-x
