Published in: Multimedia Systems 6/2022

23-04-2022 | Special Issue Article

Data-driven personalisation of television content: a survey

Authors: Lyndon Nixon, Jeremy Foss, Konstantinos Apostolidis, Vasileios Mezaris

Abstract

This survey considers a vision of TV broadcasting in which content is personalised and personalisation is data-driven. It looks at the AI and data technologies that make this possible and surveys their current uptake and usage. We examine the current state of the art in standards and best practices for data-driven technologies and identify the remaining limitations and gaps for research and innovation. Our hope is that this survey gives broadcasters and media organisations an overview of the current state of AI and data-driven technologies, as well as a pathway to the research and innovation activities needed to fulfil the vision of data-driven personalisation of TV content.


174.
go back to reference Rozumnyi, D., Oswald, M.R., Ferrari, V., Matas, J., Pollefeys, M.: DeFMO: deblurring and shape recovery of fast moving objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3456–3465 (2021) Rozumnyi, D., Oswald, M.R., Ferrari, V., Matas, J., Pollefeys, M.: DeFMO: deblurring and shape recovery of fast moving objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3456–3465 (2021)
175.
go back to reference Liu, H., Ruan, Z., Zhao, P., Dong, C., Shang, F., Liu, Y., Yang, L.: Video super resolution based on deep learning: a comprehensive survey (2020). arXiv:2007.12928 Liu, H., Ruan, Z., Zhao, P., Dong, C., Shang, F., Liu, Y., Yang, L.: Video super resolution based on deep learning: a comprehensive survey (2020). arXiv:​2007.​12928
177.
go back to reference Lee, H.-S., Bae, G., Cho, S.-I., Kim, Y.-H., Kang, S.: Smartgrid: video retargeting with spatiotemporal grid optimization. IEEE Access 7, 127564–127579 (2019) Lee, H.-S., Bae, G., Cho, S.-I., Kim, Y.-H., Kang, S.: Smartgrid: video retargeting with spatiotemporal grid optimization. IEEE Access 7, 127564–127579 (2019)
178.
go back to reference Rachavarapu, K.-K., Kumar, M., Gandhi, V., Subramanian, R.: Watch to edit: video retargeting using gaze. In: Computer Graphics Forum, vol. 37, pp. 205–215. Wiley Online Library (2018) Rachavarapu, K.-K., Kumar, M., Gandhi, V., Subramanian, R.: Watch to edit: video retargeting using gaze. In: Computer Graphics Forum, vol. 37, pp. 205–215. Wiley Online Library (2018)
179.
go back to reference Jain, E., Sheikh, Y., Shamir, A., Hodgins, J.: Gaze-driven video re-editing. ACM Trans. Graph. (TOG) 34(2), 1–12 (2015) Jain, E., Sheikh, Y., Shamir, A., Hodgins, J.: Gaze-driven video re-editing. ACM Trans. Graph. (TOG) 34(2), 1–12 (2015)
181.
go back to reference Liu, F., Gleicher, M.: Video retargeting: automating pan and scan. In: Proceedings of the 14th ACM International Conference on Multimedia, pp. 241–250 (2006) Liu, F., Gleicher, M.: Video retargeting: automating pan and scan. In: Proceedings of the 14th ACM International Conference on Multimedia, pp. 241–250 (2006)
182.
go back to reference Kaur, H., Kour, S., Sen, D.: Video retargeting through spatio-temporal seam carving using kalman filter. IET Image Proc. 13(11), 1862–1871 (2019) Kaur, H., Kour, S., Sen, D.: Video retargeting through spatio-temporal seam carving using kalman filter. IET Image Proc. 13(11), 1862–1871 (2019)
184.
go back to reference Wang, Y.-S., Lin, H.-C., Sorkine, O., Lee, T.-Y.: Motion-based video retargeting with optimized crop-and-warp. In: ACM SIGGRAPH 2010 Papers, pp. 1–9 (2010) Wang, Y.-S., Lin, H.-C., Sorkine, O., Lee, T.-Y.: Motion-based video retargeting with optimized crop-and-warp. In: ACM SIGGRAPH 2010 Papers, pp. 1–9 (2010)
186.
go back to reference Kiess, J., Guthier, B., Kopf, S., Effelsberg, W.: SeamCrop for image retargeting. In: Multimedia on Mobile Devices 2012; and Multimedia Content Access: Algorithms and Systems VI, vol. 8304, p. 83040. International Society for Optics and Photonics (2012) Kiess, J., Guthier, B., Kopf, S., Effelsberg, W.: SeamCrop for image retargeting. In: Multimedia on Mobile Devices 2012; and Multimedia Content Access: Algorithms and Systems VI, vol. 8304, p. 83040. International Society for Optics and Photonics (2012)
187.
go back to reference Nam, S.-H., Ahn, W., Yu, I.-J., Kwon, M.-J., Son, M., Lee, H.-K.: Deep convolutional neural network for identifying seam-carving forgery. IEEE Trans. Circuits Syst. Video Technol. (2020) Nam, S.-H., Ahn, W., Yu, I.-J., Kwon, M.-J., Son, M., Lee, H.-K.: Deep convolutional neural network for identifying seam-carving forgery. IEEE Trans. Circuits Syst. Video Technol. (2020)
188.
go back to reference Apostolidis, K., Mezaris, V.: A fast smart-cropping method and dataset for video retargeting. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 2618–2622. IEEE (2021) Apostolidis, K., Mezaris, V.: A fast smart-cropping method and dataset for video retargeting. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 2618–2622. IEEE (2021)
189.
go back to reference Chou, Y.-C., Fang, C.-Y., Su, P.-C., Chien, Y.-C.: Content-based cropping using visual saliency and blur detection. In: 2017 10th International Conference on Ubi-media Computing and Workshops (Ubi-Media), pp. 1–6. IEEE (2017) Chou, Y.-C., Fang, C.-Y., Su, P.-C., Chien, Y.-C.: Content-based cropping using visual saliency and blur detection. In: 2017 10th International Conference on Ubi-media Computing and Workshops (Ubi-Media), pp. 1–6. IEEE (2017)
190.
go back to reference Zhu, T., Zhang, D., Hu, Y., Wang, T., Jiang, X., Zhu, J., Li, J.: Horizontal-to-vertical video conversion. IEEE Trans. Multimed. (2021) Zhu, T., Zhang, D., Hu, Y., Wang, T., Jiang, X., Zhu, J., Li, J.: Horizontal-to-vertical video conversion. IEEE Trans. Multimed. (2021)
192.
go back to reference Kim, E., Pyo, S., Park, E., Kim, M.: An automatic recommendation scheme of tv program contents for (ip) tv personalization. IEEE Trans. Broadcast. 57(3), 674–684 (2011) Kim, E., Pyo, S., Park, E., Kim, M.: An automatic recommendation scheme of tv program contents for (ip) tv personalization. IEEE Trans. Broadcast. 57(3), 674–684 (2011)
193.
go back to reference Soares, M., Viana, P.: Tv recommendation and personalization systems: integrating broadcast and video on-demand services. Adv. Electr. Comput. Eng. 14(1), 115–120 (2014) Soares, M., Viana, P.: Tv recommendation and personalization systems: integrating broadcast and video on-demand services. Adv. Electr. Comput. Eng. 14(1), 115–120 (2014)
194.
go back to reference Hsu, S.H., Wen, M.-H., Lin, H.-C., Lee, C.-C., Lee, C.-H.: Aimed-a personalized tv recommendation system. In: European Conference on Interactive Television, pp. 166–174. Springer (2007) Hsu, S.H., Wen, M.-H., Lin, H.-C., Lee, C.-C., Lee, C.-H.: Aimed-a personalized tv recommendation system. In: European Conference on Interactive Television, pp. 166–174. Springer (2007)
195.
go back to reference Aharon, M., Hillel, E., Kagian, A., Lempel, R., Makabee, H., Nissim, R.: Watch-it-next: a contextual tv recommendation system. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 180–195. Springer (2015) Aharon, M., Hillel, E., Kagian, A., Lempel, R., Makabee, H., Nissim, R.: Watch-it-next: a contextual tv recommendation system. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 180–195. Springer (2015)
196.
go back to reference Aroyo, L., Nixon, L., Miller, L.: NoTube: the television experience enhanced by online social and semantic data. In: 2011 IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin), pp. 269–273. IEEE (2011) Aroyo, L., Nixon, L., Miller, L.: NoTube: the television experience enhanced by online social and semantic data. In: 2011 IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin), pp. 269–273. IEEE (2011)
200.
go back to reference Armstrong, M., Brooks, M., Churnside, A., Evans, M., Melchior, F., Shotton, M.: Object-based broadcasting-curation, responsiveness and user experience (2014) Armstrong, M., Brooks, M., Churnside, A., Evans, M., Melchior, F., Shotton, M.: Object-based broadcasting-curation, responsiveness and user experience (2014)
201.
go back to reference Cox, J., Jones, R., Northwood, C., Tutcher, J., Robinson, B.: Object-based production: a personalised interactive cooking application. In: Adjunct Publication of the 2017 ACM International Conference on Interactive Experiences for TV and Online Video, pp. 79–80 (2017) Cox, J., Jones, R., Northwood, C., Tutcher, J., Robinson, B.: Object-based production: a personalised interactive cooking application. In: Adjunct Publication of the 2017 ACM International Conference on Interactive Experiences for TV and Online Video, pp. 79–80 (2017)
202.
go back to reference Ursu, M., Smith, D., Hook, J., Concannon, S., Gray, J.: Authoring interactive fictional stories in object-based media (OBM). In: ACM International Conference on Interactive Media Experiences, pp. 127–137 (2020) Ursu, M., Smith, D., Hook, J., Concannon, S., Gray, J.: Authoring interactive fictional stories in object-based media (OBM). In: ACM International Conference on Interactive Media Experiences, pp. 127–137 (2020)
203.
go back to reference Silzle, A., Weitnauer, M., Warusfel, O., Bleisteiner, W., Herberger, T., Epain, N., Duval, B., Bogaards, N., Baume, C., Herzog, U., et al.: Orpheus audio project: piloting an end-to-end object-based audio broadcasting chain. In: IBC Conference, Amsterdam, September, pp. 14–18 (2017) Silzle, A., Weitnauer, M., Warusfel, O., Bleisteiner, W., Herberger, T., Epain, N., Duval, B., Bogaards, N., Baume, C., Herzog, U., et al.: Orpheus audio project: piloting an end-to-end object-based audio broadcasting chain. In: IBC Conference, Amsterdam, September, pp. 14–18 (2017)
204.
go back to reference Chen, X., Nguyen, T.V., Shen, Z., Kankanhalli, M.: Livesense: contextual advertising in live streaming videos. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 392–400 (2019) Chen, X., Nguyen, T.V., Shen, Z., Kankanhalli, M.: Livesense: contextual advertising in live streaming videos. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 392–400 (2019)
205.
go back to reference Akgul, T., Ozcan, S., Iplik, A.: A cloud-based end-to-end server-side dynamic ad insertion platform for live content. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 361–364 (2020) Akgul, T., Ozcan, S., Iplik, A.: A cloud-based end-to-end server-side dynamic ad insertion platform for live content. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 361–364 (2020)
206.
go back to reference Carvalho, P., Pereira, A., Viana, P.: Automatic tv logo identification for advertisement detection without prior data. Appl. Sci. 11(16), 7494 (2021) Carvalho, P., Pereira, A., Viana, P.: Automatic tv logo identification for advertisement detection without prior data. Appl. Sci. 11(16), 7494 (2021)
207.
go back to reference Park, S., Cho, K.: Framework for personalized broadcast notice based on contents metadata. In: Proceedings of the Korea Contents Association Conference, pp. 445–446. The Korea Contents Association (2014) Park, S., Cho, K.: Framework for personalized broadcast notice based on contents metadata. In: Proceedings of the Korea Contents Association Conference, pp. 445–446. The Korea Contents Association (2014)
208.
go back to reference Hunter, J.: Adding multimedia to the semantic web: Building an MPEG-7 ontology. In: Proceedings of the First International Conference on Semantic Web Working. SWWS’01, pp. 261–283. CEUR-WS.org, Aachen, DEU (2001) Hunter, J.: Adding multimedia to the semantic web: Building an MPEG-7 ontology. In: Proceedings of the First International Conference on Semantic Web Working. SWWS’01, pp. 261–283. CEUR-WS.org, Aachen, DEU (2001)
209.
go back to reference EBU-MIM: EBU-MIM semantic web activity report. Technical report, EBU-MIM (2015). Accessed 30 Sept 2021 EBU-MIM: EBU-MIM semantic web activity report. Technical report, EBU-MIM (2015). Accessed 30 Sept 2021
210.
go back to reference Hoffart, J., Yosef, M.A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., Weikum, G.: Robust disambiguation of named entities in text. In: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, pp. 782–792 (2011) Hoffart, J., Yosef, M.A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., Weikum, G.: Robust disambiguation of named entities in text. In: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, pp. 782–792 (2011)
211.
go back to reference Brasoveanu, A.M., Weichselbraun, A., Nixon, L.: In media res: a corpus for evaluating named entity linking with creative works. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 355–364 (2020) Brasoveanu, A.M., Weichselbraun, A., Nixon, L.: In media res: a corpus for evaluating named entity linking with creative works. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 355–364 (2020)
212.
go back to reference Nixon, L., Troncy, R.: Survey of semantic media annotation tools for the web: towards new media applications with linked media. In: European Semantic Web Conference, pp. 100–114. Springer (2014) Nixon, L., Troncy, R.: Survey of semantic media annotation tools for the web: towards new media applications with linked media. In: European Semantic Web Conference, pp. 100–114. Springer (2014)
213.
go back to reference Collyda, C., Apostolidis, K., Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V.: A web service for video summarization. In: ACM International Conference on Interactive Media Experiences, pp. 148–153 (2020) Collyda, C., Apostolidis, K., Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V.: A web service for video summarization. In: ACM International Conference on Interactive Media Experiences, pp. 148–153 (2020)
217.
go back to reference Armstrong, M.: Object-based media: a toolkit for building responsive content. In: Proceedings of the 32nd International BCS Human Computer Interaction Conference 32, pp. 1–2 (2018) Armstrong, M.: Object-based media: a toolkit for building responsive content. In: Proceedings of the 32nd International BCS Human Computer Interaction Conference 32, pp. 1–2 (2018)
218.
go back to reference Cox, J., Brooks, M., Forrester, I., Armstrong, M.: Moving object-based media production from one-off examples to scalable workflows. SMPTE Motion Imaging J. 127(4), 32–37 (2018) Cox, J., Brooks, M., Forrester, I., Armstrong, M.: Moving object-based media production from one-off examples to scalable workflows. SMPTE Motion Imaging J. 127(4), 32–37 (2018)
219.
go back to reference Carter, J., Ramdhany, R., Lomas, M., Pearce, T., Shephard, J., Sparks, M.: Universal access for object-based media experiences. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 382–385 (2020) Carter, J., Ramdhany, R., Lomas, M., Pearce, T., Shephard, J., Sparks, M.: Universal access for object-based media experiences. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 382–385 (2020)
220.
go back to reference Zwicklbauer, M., Lamm, W., Gordon, M., Apostolidis, K., Philipp, B., Mezaris, V.: Video analysis for interactive story creation: the sandmännchen showcase. In: Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–24 (2020) Zwicklbauer, M., Lamm, W., Gordon, M., Apostolidis, K., Philipp, B., Mezaris, V.: Video analysis for interactive story creation: the sandmännchen showcase. In: Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–24 (2020)
221.
go back to reference Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J., Gama, J.: Personalised dynamic viewer profiling for streamed data. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds.) Trends and Advances in Information Systems and Technologies, pp. 501–510. Springer, Cham (2018) Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J., Gama, J.: Personalised dynamic viewer profiling for streamed data. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds.) Trends and Advances in Information Systems and Technologies, pp. 501–510. Springer, Cham (2018)
222.
go back to reference Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J.: Product placement platform for personalised advertising. New European Media (NEM) Summit 2016 (2016) Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J.: Product placement platform for personalised advertising. New European Media (NEM) Summit 2016 (2016)
223.
go back to reference Malheiro, B., Foss, J., Burguillo, J.: B2B platform for media content personalisation. In: B2B Platform for Media Content Personalisation (2013) Malheiro, B., Foss, J., Burguillo, J.: B2B platform for media content personalisation. In: B2B Platform for Media Content Personalisation (2013)
225.
go back to reference Stewart, S.: Video game industry silently taking over entertainment world. Verfügbar unter ejinsight. com/eji/article/id/2280405/20191022 (2019) Stewart, S.: Video game industry silently taking over entertainment world. Verfügbar unter ejinsight. com/eji/article/id/2280405/20191022 (2019)
226.
go back to reference Witkowski, W.: Videogames are a bigger industry than movies and north American sports combined, thanks to the pandemic. MarketWatch (2020) Witkowski, W.: Videogames are a bigger industry than movies and north American sports combined, thanks to the pandemic. MarketWatch (2020)
227.
go back to reference Ward, L., Paradis, M., Shirley, B., Russon, L., Moore, R., Davies, R.: Casualty accessible and enhanced (A&E) audio: trialling object-based accessible tv audio. In: Audio Engineering Society Convention, p. 147. Audio Engineering Society (2019) Ward, L., Paradis, M., Shirley, B., Russon, L., Moore, R., Davies, R.: Casualty accessible and enhanced (A&E) audio: trialling object-based accessible tv audio. In: Audio Engineering Society Convention, p. 147. Audio Engineering Society (2019)
228.
go back to reference Montagud, M., Núñez, J.A., Karavellas, T., Jurado, I., Fernández, S.: Convergence between tv and vr: enabling truly immersive and social experiences. In: Workshop on Virtual Reality, Co-located with ACM TVX 2018 (2018) Montagud, M., Núñez, J.A., Karavellas, T., Jurado, I., Fernández, S.: Convergence between tv and vr: enabling truly immersive and social experiences. In: Workshop on Virtual Reality, Co-located with ACM TVX 2018 (2018)
229.
go back to reference Kudumakis, P., Wilmering, T., Sandler, M., Foss, J.: MPEG IPR ontologies for media trading and personalization. In: International Workshop on Data-Driven Personalization of Television (DataTV2019), ACM International Conference on Interactive Experiences for Television and Online Video (TVX2019) (2019) Kudumakis, P., Wilmering, T., Sandler, M., Foss, J.: MPEG IPR ontologies for media trading and personalization. In: International Workshop on Data-Driven Personalization of Television (DataTV2019), ACM International Conference on Interactive Experiences for Television and Online Video (TVX2019) (2019)
231.
go back to reference ISO/IEC.: Information technology—multimedia framework (MPEG-21)—part 19: Media value chain ontology/amd 1 extensions on time-segments and multi-track audio’. Standard, International Organization for Standardization (2018). Accessed 30 Sept 2021 ISO/IEC.: Information technology—multimedia framework (MPEG-21)—part 19: Media value chain ontology/amd 1 extensions on time-segments and multi-track audio’. Standard, International Organization for Standardization (2018). Accessed 30 Sept 2021
232.
go back to reference ISO/IEC.: Information technology—multimedia framework (MPEG-21)—media contract ontology. standard, International Organization for Standardization (2017). Accessed 30 Sept 2021 ISO/IEC.: Information technology—multimedia framework (MPEG-21)—media contract ontology. standard, International Organization for Standardization (2017). Accessed 30 Sept 2021
237.
go back to reference ISO/IE.: MPEG-7, part 1 et seq. standard, International Organization for Standardization. Accessed 30 Sept 2021 ISO/IE.: MPEG-7, part 1 et seq. standard, International Organization for Standardization. Accessed 30 Sept 2021
238.
go back to reference Chang, S.-F., Sikora, T., Purl, A.: Overview of the MPEG-7 standard. IEEE Trans. Circuits Syst. Video Technol. 11(6), 688–695 (2001) Chang, S.-F., Sikora, T., Purl, A.: Overview of the MPEG-7 standard. IEEE Trans. Circuits Syst. Video Technol. 11(6), 688–695 (2001)
239.
go back to reference ISO/IEC.: Introduction to MPEG-7, coding of moving pictures and audio. Standard, International Organization for Standardization (March 2001). Accessed 30 Sept 2021 ISO/IEC.: Introduction to MPEG-7, coding of moving pictures and audio. Standard, International Organization for Standardization (March 2001). Accessed 30 Sept 2021
240.
go back to reference ISO/IEC.: MPEG-I: Scene description for MPEG media, MPEG group, MPEG-I part 14. Standard, International Organization for Standardization. Accessed 30 Sept 2021 ISO/IEC.: MPEG-I: Scene description for MPEG media, MPEG group, MPEG-I part 14. Standard, International Organization for Standardization. Accessed 30 Sept 2021
241.
go back to reference ISO/IEC.: Coded representation of immersive media– part 14: scene description for mpeg media, ISO. Standard, International Organization for Standardization. Accessed 30 Sept 2021 ISO/IEC.: Coded representation of immersive media– part 14: scene description for mpeg media, ISO. Standard, International Organization for Standardization. Accessed 30 Sept 2021
242.
go back to reference Group, M.: MPEG group, coded representation of immersive media. standard, MPEG standards (2020). Accessed 30 Sept 2021 Group, M.: MPEG group, coded representation of immersive media. standard, MPEG standards (2020). Accessed 30 Sept 2021
243.
go back to reference Group, M.: MPEG-I: Versatile video coding, MPEG-I part 3, MPEG group. Standard, MPEG standards. Accessed 30 Sept 2021 Group, M.: MPEG-I: Versatile video coding, MPEG-I part 3, MPEG group. Standard, MPEG standards. Accessed 30 Sept 2021
244.
go back to reference Wieckowski, A., Ma, J., Schwarz, H., Marpe, D., Wiegand, T.: Fast partitioning decision strategies for the upcoming versatile video coding (VVC) standard. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 4130–4134. IEEE (2019) Wieckowski, A., Ma, J., Schwarz, H., Marpe, D., Wiegand, T.: Fast partitioning decision strategies for the upcoming versatile video coding (VVC) standard. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 4130–4134. IEEE (2019)
254.
go back to reference EBU.: EBU tech 3351–ccdm. Technical report, EBU (August 2020). Accessed 30 Sept 2021 EBU.: EBU tech 3351–ccdm. Technical report, EBU (August 2020). Accessed 30 Sept 2021
258.
go back to reference ISO/IEC.: Information technology–multimedia framework (MPEG-21)–contract expression language. Standard, International Organization for Standardization (2016). Accessed 30 Sept 2021 ISO/IEC.: Information technology–multimedia framework (MPEG-21)–contract expression language. Standard, International Organization for Standardization (2016). Accessed 30 Sept 2021
259.
go back to reference Rodríguez-Doncel, V.: Overview of the mpeg-21 media contract ontology. In: Overview of the MPEG-21 Media Contract Ontology (2016) Rodríguez-Doncel, V.: Overview of the mpeg-21 media contract ontology. In: Overview of the MPEG-21 Media Contract Ontology (2016)
263.
go back to reference Shou, M.Z., Ghadiyaram, D., Wang, W., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation (2021). arXiv:2101.10511 [CoRR abs] Shou, M.Z., Ghadiyaram, D., Wang, W., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation (2021). arXiv:​2101.​10511 [CoRR abs]
264.
go back to reference Krishna, M.V., Bodesheim, P., Körner, M., Denzler, J.: Temporal video segmentation by event detection: a novelty detection approach. Pattern Recogn. Image Anal. 24(2), 243–255 (2014) Krishna, M.V., Bodesheim, P., Körner, M., Denzler, J.: Temporal video segmentation by event detection: a novelty detection approach. Pattern Recogn. Image Anal. 24(2), 243–255 (2014)
265.
go back to reference Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masia, B.: Movie editing and cognitive event segmentation in virtual reality video. ACM Trans. Graph. (TOG) 36(4), 1–12 (2017) Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masia, B.: Movie editing and cognitive event segmentation in virtual reality video. ACM Trans. Graph. (TOG) 36(4), 1–12 (2017)
266.
go back to reference Shou, M.Z., Lei, S.W., Wang, W., Ghadiyaram, D., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8075–8084 (2021) Shou, M.Z., Lei, S.W., Wang, W., Ghadiyaram, D., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8075–8084 (2021)
267.
go back to reference Deliege, A., Cioppa, A., Giancola, S., Seikavandi, M.J., Dueholm, J.V., Nasrollahi, K., Ghanem, B., Moeslund, T.B., Van Droogenbroeck, M.: Soccernet-v2: a dataset and benchmarks for holistic understanding of broadcast soccer videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4508–4519 (2021) Deliege, A., Cioppa, A., Giancola, S., Seikavandi, M.J., Dueholm, J.V., Nasrollahi, K., Ghanem, B., Moeslund, T.B., Van Droogenbroeck, M.: Soccernet-v2: a dataset and benchmarks for holistic understanding of broadcast soccer videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4508–4519 (2021)
268.
go back to reference Verschae, R., Ruiz-del-Solar, J.: Object detection: current and future directions. Front. Robot. AI 2, 29 (2015) Verschae, R., Ruiz-del-Solar, J.: Object detection: current and future directions. Front. Robot. AI 2, 29 (2015)
271.
go back to reference Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019) Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)
272.
go back to reference Kaur, P., Pannu, H.S., Malhi, A.K.: Comparative analysis on cross-modal information retrieval: a review. Comput. Sci. Rev. 39, 100336 (2021) Kaur, P., Pannu, H.S., Malhi, A.K.: Comparative analysis on cross-modal information retrieval: a review. Comput. Sci. Rev. 39, 100336 (2021)
273.
go back to reference Caba Heilbron, F., Escorcia, V., Ghanem, B., Carlos Niebles, J.: Activitynet: a large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–970 (2015) Caba Heilbron, F., Escorcia, V., Ghanem, B., Carlos Niebles, J.: Activitynet: a large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–970 (2015)
274.
go back to reference Wang, X., Wu, J., Chen, J., Li, L., Wang, Y.-F., Wang, W.Y.: Vatex: a large-scale, high-quality multilingual dataset for video-and-language research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4581–4591 (2019) Wang, X., Wu, J., Chen, J., Li, L., Wang, Y.-F., Wang, W.Y.: Vatex: a large-scale, high-quality multilingual dataset for video-and-language research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4581–4591 (2019)
275.
go back to reference Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, A.P., Toderici, G., Varadarajan, B., Vijayanarasimhan, S.: Youtube-8m: a large-scale video classification benchmark (2016). arXiv:1609.08675 Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, A.P., Toderici, G., Varadarajan, B., Vijayanarasimhan, S.: Youtube-8m: a large-scale video classification benchmark (2016). arXiv:​1609.​08675
276.
go back to reference Rehman, S.U., Waqas, M., Tu, S., Koubaa, A., ur Rehman, O., Ahmad, J., Hanif, M., Han, Z.: Deep learning techniques for future intelligent cross-media retrieval. Technical report, CISTER-Research Centre in Realtime and Embedded Computing Systems (2020) Rehman, S.U., Waqas, M., Tu, S., Koubaa, A., ur Rehman, O., Ahmad, J., Hanif, M., Han, Z.: Deep learning techniques for future intelligent cross-media retrieval. Technical report, CISTER-Research Centre in Realtime and Embedded Computing Systems (2020)
277.
go back to reference Tu, S., ur Rehman, S., Waqas, M., Rehman, O.u., Yang, Z., Ahmad, B., Halim, Z., Zhao, W.: Optimisation-based training of evolutionary convolution neural network for visual classification applications. IET Comput. Vis. 14(5), 259–267 (2020) Tu, S., ur Rehman, S., Waqas, M., Rehman, O.u., Yang, Z., Ahmad, B., Halim, Z., Zhao, W.: Optimisation-based training of evolutionary convolution neural network for visual classification applications. IET Comput. Vis. 14(5), 259–267 (2020)
278.
go back to reference Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: transformers for image recognition at scale (2020). arXiv:2010.11929 [CoRR abs] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: transformers for image recognition at scale (2020). arXiv:​2010.​11929 [CoRR abs]
279.
go back to reference Dai, Z., Liu, H., Le, Q., Tan, M.: CoAtNet: marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 34 (2021) Dai, Z., Liu, H., Le, Q., Tan, M.: CoAtNet: marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 34 (2021)
280.
go back to reference Borkman, S., Crespi, A., Dhakad, S., Ganguly, S., Hogins, J., Jhang, Y.-C., Kamalzadeh, M., Li, B., Leal, S., Parisi, P., et al.: Unity perception: generate synthetic data for computer vision (2021). arXiv:2107.04259 Borkman, S., Crespi, A., Dhakad, S., Ganguly, S., Hogins, J., Jhang, Y.-C., Kamalzadeh, M., Li, B., Leal, S., Parisi, P., et al.: Unity perception: generate synthetic data for computer vision (2021). arXiv:​2107.​04259
281.
go back to reference Tan, C., Xu, X., Shen, F.: A survey of zero shot detection: methods and applications. Cogn. Robot. 1, 159–167 (2021) Tan, C., Xu, X., Shen, F.: A survey of zero shot detection: methods and applications. Cogn. Robot. 1, 159–167 (2021)
282.
go back to reference Wang, W., Zheng, V.W., Yu, H., Miao, C.: A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–37 (2019) Wang, W., Zheng, V.W., Yu, H., Miao, C.: A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–37 (2019)
283.
go back to reference Hu, Y., Nie, L., Liu, M., Wang, K., Wang, Y., Hua, X.-S.: Coarse-to-fine semantic alignment for cross-modal moment localization. IEEE Trans. Image Process. 30, 5933–5943 (2021) Hu, Y., Nie, L., Liu, M., Wang, K., Wang, Y., Hua, X.-S.: Coarse-to-fine semantic alignment for cross-modal moment localization. IEEE Trans. Image Process. 30, 5933–5943 (2021)
285.
go back to reference Li, Y., Yao, T., Pan, Y., Chao, H., Mei, T.: Jointly localizing and describing events for dense video captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7492–7500 (2018) Li, Y., Yao, T., Pan, Y., Chao, H., Mei, T.: Jointly localizing and describing events for dense video captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7492–7500 (2018)
286.
go back to reference Chen, S., Jiang, Y.-G.: Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8425–8435 (2021) Chen, S., Jiang, Y.-G.: Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8425–8435 (2021)
287.
go back to reference Dong, C., Chen, X., Chen, A., Hu, F., Wang, Z., Li, X.: Multi-level visual representation with semantic-reinforced learning for video captioning. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4750–4754 (2021) Dong, C., Chen, X., Chen, A., Hu, F., Wang, Z., Li, X.: Multi-level visual representation with semantic-reinforced learning for video captioning. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4750–4754 (2021)
288.
go back to reference Francis, D., Anh Nguyen, P., Huet, B., Ngo, C.-W.: Fusion of multimodal embeddings for ad-hoc video search. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (2019) Francis, D., Anh Nguyen, P., Huet, B., Ngo, C.-W.: Fusion of multimodal embeddings for ad-hoc video search. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (2019)
289.
go back to reference Yaliniz, G., Ikizler-Cinbis, N.: Using independently recurrent networks for reinforcement learning based unsupervised video summarization. Multimed. Tools Appl. 80(12), 17827–17847 (2021) Yaliniz, G., Ikizler-Cinbis, N.: Using independently recurrent networks for reinforcement learning based unsupervised video summarization. Multimed. Tools Appl. 80(12), 17827–17847 (2021)
293.
go back to reference Xiao, Z., Fu, X., Huang, J., Cheng, Z., Xiong, Z.: Space-time distillation for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2113–2122 (2021) Xiao, Z., Fu, X., Huang, J., Cheng, Z., Xiong, Z.: Space-time distillation for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2113–2122 (2021)
295.
go back to reference Ignatov, A., Timofte, R., Denna, M., Younes, A.: Real-time quantized image super-resolution on mobile NPUs, mobile AI 2021 challenge: Report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2525–2534 (2021) Ignatov, A., Timofte, R., Denna, M., Younes, A.: Real-time quantized image super-resolution on mobile NPUs, mobile AI 2021 challenge: Report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2525–2534 (2021)
296.
go back to reference Ignatov, A., Romero, A., Kim, H., Timofte, R.: Real-time video super-resolution on smartphones with deep learning, mobile ai 2021 challenge: report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2535–2544 (2021) Ignatov, A., Romero, A., Kim, H., Timofte, R.: Real-time video super-resolution on smartphones with deep learning, mobile ai 2021 challenge: report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2535–2544 (2021)
297.
go back to reference Zang, T., Zhu, Y., Liu, H., Zhang, R., Yu, J.: A survey on cross-domain recommendation: taxonomies, methods, and future directions (2021). arXiv:2108.03357 [CoRR abs] Zang, T., Zhu, Y., Liu, H., Zhang, R., Yu, J.: A survey on cross-domain recommendation: taxonomies, methods, and future directions (2021). arXiv:​2108.​03357 [CoRR abs]
298.
go back to reference Nixon, L., Ciesielski, K., Philipp, B.: AI for audience prediction and profiling to power innovative TV content recommendation services, pp. 42–48 (2019) Nixon, L., Ciesielski, K., Philipp, B.: AI for audience prediction and profiling to power innovative TV content recommendation services, pp. 42–48 (2019)
299.
go back to reference Talu\(\breve{{\rm g}}\), D.Y.: User expectations on smart TV; an empiric study on user emotions towards smart TV. Turk. Online J. Design Art Commun. 11(2), 424–442 (2021) Talu\(\breve{{\rm g}}\), D.Y.: User expectations on smart TV; an empiric study on user emotions towards smart TV. Turk. Online J. Design Art Commun. 11(2), 424–442 (2021)
300.
go back to reference Borgotallo, R., Pero, R.D., Messina, A., Negro, F., Vignaroli, L., Aroyo, L., Aart, C., Conconi, A.: Personalized semantic news: Combining semantics and television. In: International Conference on User Centric Media, pp. 137–140. Springer (2009) Borgotallo, R., Pero, R.D., Messina, A., Negro, F., Vignaroli, L., Aroyo, L., Aart, C., Conconi, A.: Personalized semantic news: Combining semantics and television. In: International Conference on User Centric Media, pp. 137–140. Springer (2009)
Metadata
Title: Data-driven personalisation of television content: a survey
Authors: Lyndon Nixon, Jeremy Foss, Konstantinos Apostolidis, Vasileios Mezaris
Publication date: 23-04-2022
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems | Issue 6/2022
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-022-00926-6
