International Journal of Computer Vision

03-02-2020

Visual Social Relationship Recognition

Authors: Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

Published in: International Journal of Computer Vision | Issue 6/2020

Abstract

Social relationships form the basis of human social structure. Developing computational models that understand social relationships from visual data is essential for building intelligent machines that can interact better with humans in a social environment. In this work, we study the problem of visual social relationship recognition in images. We propose a dual-glance model for social relationship recognition, in which the first glance fixates on the person of interest and the second glance deploys an attention mechanism to exploit contextual cues. To enable this study, we curated a large-scale People in Social Context dataset, which comprises 23,311 images and 79,244 person pairs with annotated social relationships. Since visually identifying a social relationship carries a certain degree of uncertainty, we further propose an adaptive focal loss that leverages the ambiguous annotations for more effective learning. We conduct extensive experiments to demonstrate, quantitatively and qualitatively, the efficacy of our proposed method, which yields state-of-the-art performance on social relationship recognition.
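The abstract does not give the formula for the adaptive focal loss, so the following is only a minimal sketch of the standard multi-class focal loss, which down-weights examples the model already classifies confidently. It is not the authors' adaptive variant. The function names, the `gamma` default and the example logits are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class `target`.
    gamma is an assumed illustrative default; gamma = 0 recovers
    plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(logits, target):
    """Cross-entropy, expressed as the gamma = 0 case of the focal loss."""
    return focal_loss(logits, target, gamma=0.0)


if __name__ == "__main__":
    # Hypothetical logits over three relationship classes; true class is 0.
    easy = [4.0, 0.0, 0.0]  # model is confident and correct
    hard = [0.0, 1.0, 0.0]  # model favours a wrong class
    for name, logits in (("easy", easy), ("hard", hard)):
        print(name, cross_entropy(logits, 0), focal_loss(logits, 0))
```

In this sketch the modulating factor `(1 - p_t) ** gamma` shrinks the loss of the confident example far more than that of the misclassified one, so training concentrates on hard pairs. How the paper adapts this idea to ambiguous annotations is not stated in the abstract.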

The people are related through their professions (e.g., co-workers, coach and player, boss and staff).

One person pays money to receive goods or services from the other (e.g., salesman and customer, tour guide and tourist).

Title: Visual Social Relationship Recognition
Authors: Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
Publication date: 03-02-2020
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 6/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-020-01295-1
