ABSTRACT
The impact of culture on visual emotion perception has recently captured the attention of multimedia research. In this study, we provide computational linguistics tools to explore, retrieve, and browse a dataset of 16K multilingual affective visual concepts and 7.3M Flickr images. First, we design a crowdsourcing experiment to collect human judgements of the sentiment connected to the visual concepts. We then use word embeddings to represent these concepts in a low-dimensional vector space, which lets us expand the meaning around each concept and gives insight into commonalities and differences among languages. We compare a variety of concept representations through a novel evaluation task based on the notion of visual semantic relatedness. Based on these representations, we design clustering schemes to group multilingual visual concepts, and evaluate them with novel metrics based on the crowdsourced sentiment annotations as well as on visual semantic relatedness. The proposed clustering framework enables us to analyze the full multilingual dataset in depth and to show an application on a facial data subset, exploring cultural insights of portrait-related affective visual concepts.
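As a rough illustration of the embedding step described above, the following minimal sketch represents each multi-word concept as the average of its word vectors and compares concepts by cosine similarity. The vectors, the concept strings, and the helper names (`concept_vector`, `cosine`, `nearest_concept`) are hypothetical toy choices, and averaging is only one possible composition; it is not necessarily the representation used in the paper. The sketch also assumes that words from different languages already live in one shared vector space.

```python
import math

# Hypothetical 3-d word vectors in a single shared space (toy values).
# Real embeddings are learned from text and have far more dimensions.
WORD_VECTORS = {
    "happy":   [0.9, 0.1, 0.0],
    "heureux": [0.8, 0.2, 0.0],
    "dog":     [0.1, 0.9, 0.0],
    "chien":   [0.2, 0.8, 0.1],
    "dark":    [-0.7, 0.0, 0.6],
    "street":  [0.0, 0.1, 0.9],
}

CONCEPTS = ["happy dog", "chien heureux", "dark street"]


def concept_vector(concept, table=WORD_VECTORS):
    """Average the word vectors of a whitespace-separated concept."""
    vectors = [table[word] for word in concept.split()]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_concept(query, candidates):
    """Return the candidate concept most similar to the query concept."""
    q = concept_vector(query)
    others = [c for c in candidates if c != query]
    return max(others, key=lambda c: cosine(q, concept_vector(c)))


if __name__ == "__main__":
    for concept in CONCEPTS:
        print(concept, "->", nearest_concept(concept, CONCEPTS))
```

Pairwise similarities computed this way could then feed any standard clustering routine to group related concepts across languages.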
Index Terms
- Multilingual Visual Sentiment Concept Matching