Published in: International Journal of Computer Vision 3/2015

01-07-2015

Discovering Beautiful Attributes for Aesthetic Image Analysis

Authors: Luca Marchesotti, Naila Murray, Florent Perronnin

Abstract

Aesthetic image analysis is the study and assessment of the aesthetic properties of images. Current computational approaches to aesthetic image analysis provide results that are either accurate or interpretable, but not both. To obtain both accuracy and interpretability by humans, we advocate the use of learned and nameable visual attributes as mid-level features. For this purpose, we propose to discover and learn the visual appearance of attributes automatically, using a recently introduced database, called AVA, which contains more than 250,000 images together with their aesthetic scores and textual comments given by photography enthusiasts. We provide a detailed analysis of these annotations as well as the context in which they were given. We then describe how these three key components of AVA (images, scores, and comments) can be effectively leveraged to learn visual attributes. Lastly, we show that these learned attributes can be successfully used in three applications: aesthetic quality prediction, image tagging, and retrieval.

Footnotes
2. http://www.dpchallenge.com/forum.php?action=read%26FORUM_THREAD_ID=19842
Table 4: Statistics on comments in AVA (each value is the mean \(\mu\), with the standard deviation \(\sigma\) in parentheses)

Statistic             During challenge   After challenge   Overall
Comments per image    9.99 (8.41)        1.49 (4.77)       11.49 (11.12)
Words per comment     16.10 (8.24)       43.51 (61.74)     18.12 (11.55)

On average, an image has about 11 comments, and a comment has about 18 words. As the statistics in columns 2 and 3 attest, however, commenting behavior differs markedly during and after challenges: images receive far fewer comments once a challenge has ended, but those comments tend to be much longer.
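
As an illustration of how statistics like those in Table 4 could be computed, here is a minimal Python sketch. It is not the authors' code. The mapping from image ids to comment lists, the function name `comment_stats`, the whitespace-based word count, and the use of the population standard deviation are all assumptions made for this example, and the toy data is invented rather than taken from AVA.

```python
from statistics import mean, pstdev


def comment_stats(comments_by_image):
    """Return ((mean, std) of comments per image, (mean, std) of words per comment).

    comments_by_image: hypothetical dict mapping an image id to a list of
    comment strings. Words are counted by splitting on whitespace (an
    assumption), and std is the population standard deviation (also an
    assumption; the source does not say which estimator was used).
    """
    # Number of comments attached to each image, including images with none.
    counts = [len(comments) for comments in comments_by_image.values()]
    # Word count of every individual comment, pooled over all images.
    lengths = [
        len(comment.split())
        for comments in comments_by_image.values()
        for comment in comments
    ]
    per_image = (mean(counts), pstdev(counts))
    per_comment = (mean(lengths), pstdev(lengths)) if lengths else (0.0, 0.0)
    return per_image, per_comment


# Toy data, invented for illustration only.
toy = {
    "img_a": ["nice light", "great composition and colors"],
    "img_b": ["too dark"],
    "img_c": [],
}
per_image, per_comment = comment_stats(toy)
print("comments per image (mean, std):", per_image)
print("words per comment (mean, std):", per_comment)
```

Running the same function separately on comments posted during and after a challenge would give the per-column values. This assumes each comment carries a timestamp that can be compared with the challenge dates.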
 
Metadata
Title: Discovering Beautiful Attributes for Aesthetic Image Analysis
Authors: Luca Marchesotti, Naila Murray, Florent Perronnin
Publication date: 01-07-2015
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 3/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-014-0789-2
