2017 | OriginalPaper | Book Chapter

Alternative Semantic Representations for Zero-Shot Human Action Recognition

Abstract

A proper semantic representation for encoding side information is key to the success of zero-shot learning. In this paper, we explore two alternative semantic representations designed specifically for zero-shot human action recognition: textual descriptions of human actions and deep features extracted from still images relevant to human actions. Such side information is accessible on the Web at little cost, which opens a new way of obtaining side information for large-scale zero-shot human action recognition. We investigate different encoding methods for generating semantic representations of human actions from such side information. Using our zero-shot visual recognition method, we conducted experiments on UCF101 and HMDB51 to evaluate the two proposed semantic representations. The results suggest that our proposed text- and image-based semantic representations considerably outperform traditional attributes and word vectors for zero-shot human action recognition. In particular, the image-based semantic representations yield favourable performance even though they are extracted from a small number of images per class.
Data related to this chapter are available at: http://staff.cs.manchester.ac.uk/~kechen/ASRHAR/
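
As a rough illustration of the image-based idea described in the abstract, the following minimal sketch builds one semantic vector per action class by averaging the feature vectors of a few still images, then assigns a query to the closest unseen class. It is not the chapter's method: the averaging encoder, the cosine-similarity rule, the toy 3-dimensional features, and the names `class_prototype` and `predict_unseen` are all assumptions made for illustration, and the query is assumed to have already been mapped into the same space as the class vectors.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def class_prototype(image_features):
    """Hypothetical encoder: average normalised per-image features into one class vector."""
    feats = l2_normalize(np.asarray(image_features, dtype=float))
    return l2_normalize(feats.mean(axis=0))


def predict_unseen(query, prototypes):
    """Return the class whose prototype has the highest cosine similarity to the query."""
    names = list(prototypes)
    protos = np.stack([prototypes[name] for name in names])
    scores = protos @ l2_normalize(np.asarray(query, dtype=float))
    return names[int(np.argmax(scores))]


# Toy data standing in for deep features of five images per class (assumed values).
rng = np.random.default_rng(0)
centers = {
    "archery": np.array([1.0, 0.0, 0.0]),
    "surfing": np.array([0.0, 1.0, 0.0]),
}
prototypes = {
    name: class_prototype(center + 0.05 * rng.normal(size=(5, 3)))
    for name, center in centers.items()
}

# A query vector assumed to be already projected into the same space.
predicted = predict_unseen(np.array([0.9, 0.1, 0.0]), prototypes)
print(predicted)
```

In practice the per-image vectors would come from a pretrained network and the query would come from a learned visual-to-semantic mapping; both steps are outside this sketch.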

Footnotes
2
Like attributes and word vectors, our proposed semantic representations may be directly deployed in all the existing zero-shot human action recognition methods.
 
3
The scripts and data used in our experiments are available on our project page: http://staff.cs.manchester.ac.uk/kechen/ASRHAR/.
 
Metadata
Title
Alternative Semantic Representations for Zero-Shot Human Action Recognition
Authors
Qian Wang
Ke Chen
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71249-9_6