2019 | Original Paper | Book Chapter

Spatio-Temporal Attention Model Based on Multi-view for Social Relation Understanding

Authors: Jinna Lv, Bin Wu

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

Social relation understanding is an increasingly popular research area. While great progress has been achieved in recognizing sentiment or social relations from image data, attaining satisfactory performance for social relation analysis from video remains difficult. In this paper, we propose a novel Spatio-Temporal attention model based on Multi-View features (STMV) for understanding social relations from video. First, to obtain a rich representation of social relation traits, we introduce different ConvNets to extract multi-view features, including RGB, optical flow, and face. Second, we model the temporal dynamics of each view using Long Short-Term Memory (LSTM) networks. Specifically, we propose multiple attention units in our attention module. In this manner, we generate a feature representation that focuses on multiple aspects of social relation traits in the video, so that an effective mapping from low-level video pixels to the high-level social relation space can be learned. Third, we introduce a tensor fusion layer that learns interactions among the multi-view features. Extensive experiments show that our STMV model achieves state-of-the-art performance on the SRIV video dataset for social relation classification.
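
To make the described architecture concrete, below is a minimal PyTorch sketch of the pipeline the abstract outlines: per-view feature sequences (RGB, optical flow, face) are encoded with LSTMs, each sequence is pooled by several independent attention units, and the pooled views are merged by an outer-product tensor fusion layer before classification. All layer sizes, the number of attention units, the class count, and the exact fusion scheme are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiAttentionPool(nn.Module):
    """Pools an LSTM output sequence with K independent attention units."""

    def __init__(self, hidden_dim: int, num_units: int = 4):
        super().__init__()
        # One scoring vector per attention unit (a hypothetical design choice).
        self.scorers = nn.Linear(hidden_dim, num_units)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, hidden_dim)
        weights = F.softmax(self.scorers(seq), dim=1)   # (batch, time, K), softmax over time
        # One attention-weighted summary per unit, then concatenate them.
        pooled = torch.einsum("btk,bth->bkh", weights, seq)
        return pooled.flatten(start_dim=1)              # (batch, K * hidden_dim)


class STMVSketch(nn.Module):
    """Illustrative spatio-temporal multi-view model with tensor fusion."""

    def __init__(self, feat_dims=(2048, 2048, 512), hidden_dim=128,
                 num_units=4, proj_dim=16, num_classes=8):
        super().__init__()
        self.lstms = nn.ModuleList(
            [nn.LSTM(d, hidden_dim, batch_first=True) for d in feat_dims])
        self.attn = nn.ModuleList(
            [MultiAttentionPool(hidden_dim, num_units) for _ in feat_dims])
        # Project each pooled view to a small vector before the outer product,
        # otherwise the fused tensor grows very large.
        self.proj = nn.ModuleList(
            [nn.Linear(num_units * hidden_dim, proj_dim) for _ in feat_dims])
        fused_dim = (proj_dim + 1) ** len(feat_dims)    # +1 for the appended constant
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, views):
        # views: list of (batch, time, feat_dim) tensors for RGB / flow / face,
        # assumed to be pre-extracted ConvNet features.
        vecs = []
        for x, lstm, attn, proj in zip(views, self.lstms, self.attn, self.proj):
            h, _ = lstm(x)
            z = proj(attn(h))
            # Append a constant 1 so unimodal terms survive the outer product.
            ones = torch.ones(z.size(0), 1, device=z.device)
            vecs.append(torch.cat([ones, z], dim=1))
        # Tensor fusion: batched outer product across the views, flattened.
        fused = vecs[0]
        for v in vecs[1:]:
            fused = torch.einsum("bi,bj->bij", fused, v).flatten(start_dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = STMVSketch()
    rgb, flow, face = (torch.randn(2, 30, 2048),
                       torch.randn(2, 30, 2048),
                       torch.randn(2, 30, 512))
    print(model([rgb, flow, face]).shape)   # torch.Size([2, 8])
```

The constant 1 appended to each view vector follows the usual tensor-fusion idea of letting unimodal and cross-view interaction terms coexist in the fused representation; since the fused dimensionality grows multiplicatively with the per-view size, the sketch keeps proj_dim small.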

Metadata
Title
Spatio-Temporal Attention Model Based on Multi-view for Social Relation Understanding
Authors
Jinna Lv
Bin Wu
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-05716-9_32