Published in: International Journal of Computer Vision 6/2020

03-02-2020

Visual Social Relationship Recognition

Authors: Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli



Abstract

Social relationships form the basis of human social structure. Developing computational models that understand social relationships from visual data is essential for building intelligent machines that can interact better with humans in social environments. In this work, we study the problem of visual social relationship recognition in images. We propose a dual-glance model for social relationship recognition, in which the first glance fixates on the person of interest and the second glance deploys an attention mechanism to exploit contextual cues. To enable this study, we curated a large-scale People in Social Context dataset, which comprises 23,311 images and 79,244 person pairs with annotated social relationships. Since visually identifying social relationships carries a certain degree of uncertainty, we further propose an adaptive focal loss that leverages the ambiguous annotations for more effective learning. We conduct extensive experiments to demonstrate, quantitatively and qualitatively, the efficacy of our proposed method, which yields state-of-the-art performance on social relationship recognition.
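
For orientation, the standard focal loss that the name "adaptive focal loss" appears to build on can be sketched as below. The abstract does not define the adaptive variant, so this sketch covers only the standard form, FL(p_t) = -(1 - p_t)^gamma * log(p_t), applied to a single sample with one hard label. The function names and the default gamma = 2 are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one sample with a single hard label."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Standard focal loss (NOT the paper's adaptive variant).

    The factor (1 - p_t) ** gamma shrinks the loss of samples the model
    already classifies confidently, so training focuses on hard samples.
    gamma = 2.0 is an assumed default; gamma = 0 recovers cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical scores for three relationship classes.
scores = [2.0, 0.5, -1.0]
easy = focal_loss(scores, target=0)  # model is confident in class 0
hard = focal_loss(scores, target=2)  # model assigns class 2 low probability
```

In this sketch the confidently predicted sample contributes far less to the total loss than the poorly predicted one, which is the property a focal-style loss relies on.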


Footnotes
1. The people are related through their professions (e.g., co-worker, coach and player, boss and staff).
2. One person is paying money to receive goods or services from the other (e.g., salesman and customer, tour guide and tourist).

Metadata
Title: Visual Social Relationship Recognition
Authors: Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
Publication date: 03-02-2020
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 6/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-020-01295-1
