Published in: International Journal of Computer Vision 5/2021

22 February 2021

Visual Interestingness Prediction: A Benchmark Framework and Literature Review

Authors: Mihai Gabriel Constantin, Liviu-Daniel Ştefan, Bogdan Ionescu, Ngoc Q. K. Duong, Claire-Héléne Demarty, Mats Sjöberg


Abstract

In this paper, we report on the creation of a publicly available, common evaluation framework for image and video visual interestingness prediction. We propose a robust data set, the Interestingness10k, with 9831 images and more than 4 h of video, interestingness scores determined from more than 1M pair-wise annotations by 800 trusted annotators, some pre-computed multi-modal descriptors, and 192 system output results as baselines. The data were validated extensively during the 2016–2017 MediaEval benchmark campaigns. We provide an in-depth analysis of the crucial components of visual interestingness prediction algorithms by reviewing the capabilities and the evolution of the MediaEval benchmark systems, as well as of prominent systems from the literature. We discuss overall trends, the influence of the employed features and techniques, generalization capabilities, and the reliability of results. We also discuss the possibility of going beyond state-of-the-art performance via an automatic, ad-hoc system fusion, and propose a deep MLP-based architecture that outperforms the current state-of-the-art systems by a large margin. Finally, we provide the most important lessons learned and insights gained.
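The abstract does not say how the pair-wise annotations are turned into per-item interestingness scores. As an illustration only, the sketch below assumes a Bradley-Terry model, a common choice for paired-comparison data, fitted with a simple fixed-point (minorization-maximization) update. The function name `bradley_terry`, the `(winner, loser)` input format, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def bradley_terry(pairs, n_items, iters=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) index pairs.

    Assumed model: P(i is preferred over j) = p[i] / (p[i] + p[j]).
    Returns a list of strengths normalized to sum to 1.
    """
    wins = [0.0] * n_items
    counts = defaultdict(float)  # counts[(i, j)] = comparisons of i vs j, i < j
    for winner, loser in pairs:
        wins[winner] += 1.0
        counts[(min(winner, loser), max(winner, loser))] += 1.0

    p = [1.0 / n_items] * n_items  # uniform starting point
    for _ in range(iters):
        # denom[i] = sum over opponents j of n_ij / (p_i + p_j)
        denom = [0.0] * n_items
        for (i, j), c in counts.items():
            d = c / (p[i] + p[j])
            denom[i] += d
            denom[j] += d
        # Items that were never compared keep their previous value.
        new_p = [wins[i] / denom[i] if denom[i] > 0 else p[i]
                 for i in range(n_items)]
        total = sum(new_p)
        new_p = [x / total for x in new_p]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


# Toy annotations: each tuple is (more interesting item, less interesting item).
toy_pairs = [(0, 1), (0, 1), (1, 0), (0, 2), (1, 2), (2, 1)]
scores = bradley_terry(toy_pairs, n_items=3)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
```

Sorting items by the fitted strengths gives a ranking that can then be thresholded or used directly as a continuous score. An item that never wins a comparison receives a strength of zero under this plain maximum-likelihood fit, so a real pipeline would likely add some form of regularization.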

Zurück zum Zitat Permadi, R. A., Putra, S. G. P., Helmiriawan, & Liem C. C. S. (2017). DUT-MMSR at mediaeval 2017: Predicting media interestingness task. In MediaEval workshop, Dublin, Ireland, September 13-15. (Vol. 1984), CEUR-WS.org. Permadi, R. A., Putra, S. G. P., Helmiriawan, & Liem C. C. S. (2017). DUT-MMSR at mediaeval 2017: Predicting media interestingness task. In MediaEval workshop, Dublin, Ireland, September 13-15. (Vol. 1984), CEUR-WS.org.
Zurück zum Zitat Poignant, J., Bredin, H., & Barras, C. (2017). Multimodal person discovery in broadcast tv: lessons learned from mediaeval 2015. Multimedia Tools and Applications, 76(21), 22547–22567.CrossRef Poignant, J., Bredin, H., & Barras, C. (2017). Multimodal person discovery in broadcast tv: lessons learned from mediaeval 2015. Multimedia Tools and Applications, 76(21), 22547–22567.CrossRef
Zurück zum Zitat Randolph, J. J. (2005). “Free-marginal multirater kappa (multirater k free): an alternative to fleiss’ fixed-marginal multirater kappa”, In Joensuu learning and instruction symposium. Finland: Joensuu. Randolph, J. J. (2005). “Free-marginal multirater kappa (multirater k free): an alternative to fleiss’ fixed-marginal multirater kappa”, In Joensuu learning and instruction symposium. Finland: Joensuu.
Zurück zum Zitat Rayatdoost, S., & Soleymani, M. (2016). Ranking images and videos on visual interestingness by visual sentiment features. In MediaEval workshop, Hilversum, The Netherlands, October 20-21. (Vol. 1739), CEUR-WS.org. Rayatdoost, S., & Soleymani, M. (2016). Ranking images and videos on visual interestingness by visual sentiment features. In MediaEval workshop, Hilversum, The Netherlands, October 20-21. (Vol. 1739), CEUR-WS.org.
Zurück zum Zitat Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.MathSciNetCrossRef Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.MathSciNetCrossRef
Zurück zum Zitat Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.MathSciNetCrossRef Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.MathSciNetCrossRef
Zurück zum Zitat Salesses, P., Schechtner, K., & Hidalgo, C. A. (2013). The collaborative image of the city: mapping the inequality of urban perception PloS one 8(7). Salesses, P., Schechtner, K., & Hidalgo, C. A. (2013). The collaborative image of the city: mapping the inequality of urban perception PloS one 8(7).
Zurück zum Zitat Sanderson, M., & Zobel, J. (2005). Information retrieval system evaluation: effort, sensitivity, and reliability. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp. 162–169, ACM, August 15-19. Sanderson, M., & Zobel, J. (2005). Information retrieval system evaluation: effort, sensitivity, and reliability. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp. 162–169, ACM, August 15-19.
Zurück zum Zitat Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, 618–626. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, 618–626.
Zurück zum Zitat Shen, Y., Demarty, C.-H., & Duong, N. Q. K. (2017). Deep learning for multimodal-based video interestingness prediction. In IEEE international conference on multimedia and expo (ICME), pp. 1003–1008, IEEE. Shen, Y., Demarty, C.-H., & Duong, N. Q. K. (2017). Deep learning for multimodal-based video interestingness prediction. In IEEE international conference on multimedia and expo (ICME), pp. 1003–1008, IEEE.
Zurück zum Zitat Shen, Y., Demarty, C., Duong, N. Q. K. (2016). Technicolor@mediaeval 2016 predicting media interestingness task. In MediaEval workshop, Hilversum, The Netherlands, October 20-21., (Vol. 1739), CEUR-WS.org. Shen, Y., Demarty, C., Duong, N. Q. K. (2016). Technicolor@mediaeval 2016 predicting media interestingness task. In MediaEval workshop, Hilversum, The Netherlands, October 20-21., (Vol. 1739), CEUR-WS.org.
Zurück zum Zitat Silvia, P. J. (2005). What is interesting? exploring the appraisal structure of interest. Emotion, 5(1), 89.CrossRef Silvia, P. J. (2005). What is interesting? exploring the appraisal structure of interest. Emotion, 5(1), 89.CrossRef
Zurück zum Zitat Silvia, P. J. (2009). Looking past pleasure: anger, confusion, disgust, pride, surprise, and other unusual aesthetic emotions. Psychology of Aesthetics, Creativity, and the Arts, 3(1), 48.MathSciNetCrossRef Silvia, P. J. (2009). Looking past pleasure: anger, confusion, disgust, pride, surprise, and other unusual aesthetic emotions. Psychology of Aesthetics, Creativity, and the Arts, 3(1), 48.MathSciNetCrossRef
Zurück zum Zitat Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:​1409.​1556.
Zurück zum Zitat Sivaraman, K., & Somappa, G. (2016). Moviescope: Movie trailer classification using deep neural networks. University of Virginia. Sivaraman, K., & Somappa, G. (2016). Moviescope: Movie trailer classification using deep neural networks. University of Virginia.
Zurück zum Zitat Smeaton, A. F., Over, P., & Doherty, A. R. (2010). Video shot boundary detection: Seven years of trecvid activity. Computer Vision and Image Understanding, 114(4), 411–418.CrossRef Smeaton, A. F., Over, P., & Doherty, A. R. (2010). Video shot boundary detection: Seven years of trecvid activity. Computer Vision and Image Understanding, 114(4), 411–418.CrossRef
Zurück zum Zitat Soleymani, M. (2015) The quest for visual interest. In Proceedings of the 23rd ACM international conference on multimedia, pp. 919–922, ACM. Soleymani, M. (2015) The quest for visual interest. In Proceedings of the 23rd ACM international conference on multimedia, pp. 919–922, ACM.
Zurück zum Zitat Son, J., Jung, I., Park, K., & Han, B. (2015). Tracking-by-segmentation with online gradient boosting decision tree. In Proceedings of the IEEE international conference on computer vision, 3056–3064. Son, J., Jung, I., Park, K., & Han, B. (2015). Tracking-by-segmentation with online gradient boosting decision tree. In Proceedings of the IEEE international conference on computer vision, 3056–3064.
Zurück zum Zitat Springenberg, J. T., Dosovitskiy, A., Brox, T., Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806. Springenberg, J. T., Dosovitskiy, A., Brox, T., Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv preprint arXiv:​1412.​6806.
Zurück zum Zitat Squalli-Houssaini, H., Duong, N. Q. K., Gwenaëlle, M., & Demarty, C.-H. (2018). Deep learning for predicting image memorability. In IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2371–2375, IEEE Squalli-Houssaini, H., Duong, N. Q. K., Gwenaëlle, M., & Demarty, C.-H. (2018). Deep learning for predicting image memorability. In IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2371–2375, IEEE
Zurück zum Zitat Sudhakaran, S., Escalera, S., & Lanz, O. (2020). Gate-shift networks for video action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. Sudhakaran, S., Escalera, S., & Lanz, O. (2020). Gate-shift networks for video action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Touvron, H., Vedaldi, A., Douze, M., & Jégou, H. (2019). Fixing the train-test resolution discrepancy. Advances in Neural Information Processing Systems, 8250–8260. Touvron, H., Vedaldi, A., Douze, M., & Jégou, H. (2019). Fixing the train-test resolution discrepancy. Advances in Neural Information Processing Systems, 8250–8260.
Zurück zum Zitat Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision, pp. 4489–4497, IEEE. Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision, pp. 4489–4497, IEEE.
Zurück zum Zitat Tran, D., Wang, H., Torresani, L., & Feiszli, M. (2019). Video classification with channel-separated convolutional networks. In Proceedings of the IEEE international conference on computer vision, 5552–5561. Tran, D., Wang, H., Torresani, L., & Feiszli, M. (2019). Video classification with channel-separated convolutional networks. In Proceedings of the IEEE international conference on computer vision, 5552–5561.
Zurück zum Zitat Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., & Paluri, M. (2018). A closer look at spatiotemporal convolutions for action recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, 6450–6459. Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., & Paluri, M. (2018). A closer look at spatiotemporal convolutions for action recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, 6450–6459.
Zurück zum Zitat Urbano, J., Marrero, M., & Martín, D. (2013) On the measurement of test collection reliability. In The 36th International ACM SIGIR conference on research and development in information retrieval, pp. 393–402, ACM, July 28 - August 1. Urbano, J., Marrero, M., & Martín, D. (2013) On the measurement of test collection reliability. In The 36th International ACM SIGIR conference on research and development in information retrieval, pp. 393–402, ACM, July 28 - August 1.
Zurück zum Zitat Vasudevan, A. B., Gygli, M., Volokitin, A., & Van Gool, L. (2016). Eth-cvl@ mediaeval 2016: Textual-visual embeddings and video2gif for video interestingness. In MediaEval workshop, Hilversum, The Netherlands, October 20-21., (Vol. 1739), CEUR-WS.org. Vasudevan, A. B., Gygli, M., Volokitin, A., & Van Gool, L. (2016). Eth-cvl@ mediaeval 2016: Textual-visual embeddings and video2gif for video interestingness. In MediaEval workshop, Hilversum, The Netherlands, October 20-21., (Vol. 1739), CEUR-WS.org.
Zurück zum Zitat Vigna, S. (2015). A weighted correlation index for rankings with ties. In Proceedings of the 24th international conference on World Wide Web, WWW Eds. A. Gangemi, S. Leonardi, and A. Panconesi, pp. 1166–1176, ACM, May 18-22. Vigna, S. (2015). A weighted correlation index for rankings with ties. In Proceedings of the 24th international conference on World Wide Web, WWW Eds. A. Gangemi, S. Leonardi, and A. Panconesi, pp. 1166–1176, ACM, May 18-22.
Zurück zum Zitat Voorhees, E. M. (1998). Variations in relevance judgments and the measurement of retrieval effectiveness In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. Eds. W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, pp. 315–323, ACM, August 24-28. Voorhees, E. M. (1998). Variations in relevance judgments and the measurement of retrieval effectiveness In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. Eds. W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, pp. 315–323, ACM, August 24-28.
Zurück zum Zitat Wang, S., Chen, S., Zhao, J., & Jin, Q. (2018). Video interestingness prediction based on ranking model. In Proceedings of the joint workshop of the 4th workshop on affective social multimedia computing and first multi-modal affective computing of large-scale multimedia data, ASMMC-MMAC’18, pp. 55–61, ACM. Wang, S., Chen, S., Zhao, J., & Jin, Q. (2018). Video interestingness prediction based on ranking model. In Proceedings of the joint workshop of the 4th workshop on affective social multimedia computing and first multi-modal affective computing of large-scale multimedia data, ASMMC-MMAC’18, pp. 55–61, ACM.
Zurück zum Zitat Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). Sun database: Large-scale scene recognition from abbey to zoo. In IEEE computer society conference on computer vision and pattern recognition, pp. 3485–3492, IEEE. Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). Sun database: Large-scale scene recognition from abbey to zoo. In IEEE computer society conference on computer vision and pattern recognition, pp. 3485–3492, IEEE.
Zurück zum Zitat Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1492–1500. Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1492–1500.
Zurück zum Zitat Xu, B., Fu, Y., & Jiang, Y. (2016). Bigvid at mediaeval 2016: Predicting interestingness in images and videos. In MediaEval workshop, Hilversum, The Netherlands, October 20-21 (Vol. 1739), CEUR-WS.org. Xu, B., Fu, Y., & Jiang, Y. (2016). Bigvid at mediaeval 2016: Predicting interestingness in images and videos. In MediaEval workshop, Hilversum, The Netherlands, October 20-21 (Vol. 1739), CEUR-WS.org.
Zurück zum Zitat Yalniz, I. Z., Jégou, H., Chen, K., Paluri, M., & Mahajan, D. (2019). Billion-scale semi-supervised learning for image classification. arXiv preprint arXiv:1905.00546. Yalniz, I. Z., Jégou, H., Chen, K., Paluri, M., & Mahajan, D. (2019). Billion-scale semi-supervised learning for image classification. arXiv preprint arXiv:​1905.​00546.
Zurück zum Zitat Yang, Y.-H., & Chen, H. H. (2011). Ranking-based emotion recognition for music organization and retrieval. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 762–774.CrossRef Yang, Y.-H., & Chen, H. H. (2011). Ranking-based emotion recognition for music organization and retrieval. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 762–774.CrossRef
Zurück zum Zitat Yannakakis, G. N., & Hallam, J. (2011). Ranking vs. preference: a comparative study of self-reporting In: International conference on affective computing and intelligent interaction, pp. 437–446, Springer. Yannakakis, G. N., & Hallam, J. (2011). Ranking vs. preference: a comparative study of self-reporting In: International conference on affective computing and intelligent interaction, pp. 437–446, Springer.
Metadata
Title
Visual Interestingness Prediction: A Benchmark Framework and Literature Review
Authors
Mihai Gabriel Constantin
Liviu-Daniel Ştefan
Bogdan Ionescu
Ngoc Q. K. Duong
Claire-Hélène Demarty
Mats Sjöberg
Publication date
22 February 2021
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 5/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-021-01443-1