DOI: 10.1145/2911996.2912016
research-article
Best Multimodal paper

Multilingual Visual Sentiment Concept Matching

Published: 06 June 2016

ABSTRACT

The impact of culture on visual emotion perception has recently captured the attention of multimedia research. In this study, we provide computational linguistics tools to explore, retrieve, and browse a dataset of 16K multilingual affective visual concepts and 7.3M Flickr images. First, we design a crowdsourcing experiment to collect human judgements of the sentiment connected to the visual concepts. We then use word embeddings to represent these concepts in a low-dimensional vector space, which lets us expand the meaning around concepts and thus gain insight into commonalities and differences among languages. We compare a variety of concept representations through a novel evaluation task based on the notion of visual semantic relatedness. Based on these representations, we design clustering schemes to group multilingual visual concepts, and evaluate them with novel metrics based on the crowdsourced sentiment annotations as well as visual semantic relatedness. The proposed clustering framework enables us to analyze the full multilingual dataset in depth and to show an application on a facial data subset, exploring cultural insights of portrait-related affective visual concepts.
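As a rough illustration of the idea of representing multi-word visual concepts in a vector space and comparing them there, the sketch below averages per-word vectors and ranks candidate concepts by cosine similarity. Everything in it is an assumption made for illustration: the toy three-dimensional vectors, the averaging scheme, and the helper names `concept_vector`, `cosine`, and `nearest` are hypothetical and are not taken from the paper, which compares several concept representations on real multilingual embeddings.

```python
import math

# Hypothetical toy 3-d word vectors (assumption); real embeddings would be
# learned from large text corpora and have many more dimensions.
TOY_VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "joyful": [0.8, 0.2, 0.1],
    "sad": [-0.7, 0.3, 0.1],
    "child": [0.1, 0.9, 0.2],
    "kid": [0.2, 0.8, 0.3],
}


def concept_vector(concept):
    """Average the vectors of the words in a multi-word concept (one simple choice)."""
    words = concept.split()
    dims = len(next(iter(TOY_VECTORS.values())))
    total = [0.0] * dims
    for word in words:
        for i, value in enumerate(TOY_VECTORS[word]):
            total[i] += value
    return [value / len(words) for value in total]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, candidates):
    """Return the candidate concept whose vector is most similar to the query's."""
    query_vec = concept_vector(query)
    return max(candidates, key=lambda c: cosine(query_vec, concept_vector(c)))


if __name__ == "__main__":
    print(nearest("happy child", ["joyful kid", "sad child"]))
```

In a multilingual setting the same comparison would require vectors for different languages to live in a shared space; this toy example uses a single language and makes no attempt at that.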

Published in

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016
452 pages
ISBN: 9781450343596
DOI: 10.1145/2911996

Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICMR '16 paper acceptance rate: 20 of 120 submissions, 17%
Overall acceptance rate: 254 of 830 submissions, 31%
