Published in: Multimedia Systems 5/2023

01-08-2023 | Regular Paper

Image aesthetics assessment using composite features from transformer and CNN

Authors: Yongzhen Ke, Yin Wang, Kai Wang, Fan Qin, Jing Guo, Shuai Yang

Abstract

Image aesthetic assessment is a popular research problem in computational aesthetics, with important applications in image editing, retrieval, and recommendation. However, mainstream CNN-based methods struggle to capture the global aesthetic attributes of an image. To address this, we propose a two-stream image aesthetic assessment model that couples Transformer and CNN features. In the first stream, a conventional CNN extracts the image’s local aesthetic features. In the second stream, a superpixel algorithm segments the image, and the segmented regions are fed into a Transformer network to learn the image’s global aesthetic features. Finally, the features learned by the Transformer and the CNN are fused to produce the aesthetic assessment. Experimental results on the AVA dataset show that the proposed method captures both local and global aesthetic information, which lets the model learn richer aesthetic representations; combining the whole with its parts is also more consistent with how humans judge aesthetics. The proposed model achieves an accuracy of 84.5% on the classification task, the best result among the compared methods, and performs well on the other two tasks (score regression and score distribution prediction).
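
As a rough illustration of the final fusion step and the three output tasks named in the abstract, the following minimal sketch concatenates a local (CNN-stream) and a global (Transformer-stream) feature vector and maps the result to a score distribution, a mean score, and a binary label. It is not the authors' implementation. The concatenation operator, the single linear head, the 10-bin rating scale, the threshold of 5, and all names (`fuse`, `make_head`, `predict`) are assumptions made for illustration, and the feature vectors are hand-written stand-ins rather than real network outputs.

```python
import math
import random

NUM_BINS = 10  # assumption: ratings on a 1..10 scale, as is common for AVA


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(local_feat, global_feat):
    """Fuse the two streams by concatenation (assumed; the abstract
    does not state which fusion operator is used)."""
    return list(local_feat) + list(global_feat)


def make_head(in_dim, out_dim, seed=0):
    """Hypothetical linear head with small random weights and zero bias."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def predict(local_feat, global_feat, head, threshold=5.0):
    """Return (score distribution, mean score, binary label)."""
    weights, bias = head
    fused = fuse(local_feat, global_feat)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    dist = softmax(logits)                                 # distribution task
    score = sum((i + 1) * p for i, p in enumerate(dist))   # regression task
    label = "high" if score > threshold else "low"         # classification task
    return dist, score, label


# Stand-in features; in the described model these would come from the
# CNN stream and the superpixel + Transformer stream respectively.
local_feat = [0.2, -0.5, 0.1, 0.7]
global_feat = [0.3, 0.9, -0.4]
head = make_head(in_dim=len(local_feat) + len(global_feat), out_dim=NUM_BINS)
dist, score, label = predict(local_feat, global_feat, head)
```

Deriving the mean score and the binary label from one predicted distribution is only one way to cover the three tasks; the paper may well train separate heads or use a different fusion scheme.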


Metadata
Title
Image aesthetics assessment using composite features from transformer and CNN
Authors
Yongzhen Ke
Yin Wang
Kai Wang
Fan Qin
Jing Guo
Shuai Yang
Publication date
01-08-2023
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 5/2023
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-023-01141-7
