Published in: Multimedia Systems 6/2022

23 April 2022 | Special Issue Article

Data-driven personalisation of television content: a survey

Authors: Lyndon Nixon, Jeremy Foss, Konstantinos Apostolidis, Vasileios Mezaris

Abstract

This survey considers the vision of TV broadcasting in which content is personalised and personalisation is data-driven; it examines the AI and data technologies that make this possible and surveys their current uptake and usage. We review the state of the art in standards and best practices for data-driven technologies and identify remaining limitations and gaps for research and innovation. Our hope is that this survey provides an overview of the current state of AI and data-driven technologies for use by broadcasters and media organisations, as well as a pathway to the research and innovation activities needed to fulfil the vision of data-driven personalisation of TV content.


Fußnoten
Literatur
1.
Zurück zum Zitat Apostolidis, E., Mezaris, V.: Fast shot segmentation combining global and local visual descriptors. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6583–6587. IEEE (2014) Apostolidis, E., Mezaris, V.: Fast shot segmentation combining global and local visual descriptors. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6583–6587. IEEE (2014)
2.
Zurück zum Zitat Tsamoura, E., Mezaris, V., Kompatsiaris, I.: Gradual transition detection using color coherence and other criteria in a video shot meta-segmentation framework. In: 2008 15th IEEE International Conference on Image Processing, pp. 45–48. IEEE (2008) Tsamoura, E., Mezaris, V., Kompatsiaris, I.: Gradual transition detection using color coherence and other criteria in a video shot meta-segmentation framework. In: 2008 15th IEEE International Conference on Image Processing, pp. 45–48. IEEE (2008)
3.
Zurück zum Zitat Xiao, Z.-M., Lin, K.-H., Zhou, C.-l., Lin, Q.: Shot segmentation based on HSV color model. J. Xiamen Univ. (Natural Science) 5 (2008) Xiao, Z.-M., Lin, K.-H., Zhou, C.-l., Lin, Q.: Shot segmentation based on HSV color model. J. Xiamen Univ. (Natural Science) 5 (2008)
4.
Zurück zum Zitat Küçüktunç, O., Güdükbay, U., Ulusoy, Ö.: Fuzzy color histogram-based video segmentation. Comput. Vis. Image Underst. 114(1), 125–134 (2010) Küçüktunç, O., Güdükbay, U., Ulusoy, Ö.: Fuzzy color histogram-based video segmentation. Comput. Vis. Image Underst. 114(1), 125–134 (2010)
5.
Zurück zum Zitat Baber, J., Afzulpurkar, N., Dailey, M.N., Bakhtyar, M.: Shot boundary detection from videos using entropy and local descriptor. In: 2011 17th International Conference on Digital Signal Processing (DSP), pp. 1–6. IEEE (2011) Baber, J., Afzulpurkar, N., Dailey, M.N., Bakhtyar, M.: Shot boundary detection from videos using entropy and local descriptor. In: 2011 17th International Conference on Digital Signal Processing (DSP), pp. 1–6. IEEE (2011)
6.
Zurück zum Zitat e Santos, A.C.S., Pedrini, H.: Shot boundary detection for video temporal segmentation based on the weber local descriptor. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1310–1315. IEEE (2017) e Santos, A.C.S., Pedrini, H.: Shot boundary detection for video temporal segmentation based on the weber local descriptor. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1310–1315. IEEE (2017)
7.
Zurück zum Zitat Hassanien, A., Elgharib, M., Selim, A., Bae, S.-H., Hefeeda, M., Matusik, W.: Large-scale, fast and accurate shot boundary detection through spatio-temporal convolutional neural networks (2017). arXiv:1705.03281 Hassanien, A., Elgharib, M., Selim, A., Bae, S.-H., Hefeeda, M., Matusik, W.: Large-scale, fast and accurate shot boundary detection through spatio-temporal convolutional neural networks (2017). arXiv:​1705.​03281
8.
Zurück zum Zitat Mikołajczyk, A., Grochowski, M.: Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW), pp. 117–122. IEEE (2018) Mikołajczyk, A., Grochowski, M.: Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW), pp. 117–122. IEEE (2018)
10.
Zurück zum Zitat Souček, T., Lokoč, J.: Transnet v2: an effective deep network architecture for fast shot transition detection (2020). arXiv:2008.04838 Souček, T., Lokoč, J.: Transnet v2: an effective deep network architecture for fast shot transition detection (2020). arXiv:​2008.​04838
11.
Zurück zum Zitat Lokoč, J., Kovalčík, G., Souček, T., Moravec, J., Čech, P.: A framework for effective known-item search in video. In: In Proceedings of the 27th ACM International Conference on Multimedia (MM’19), October 21–25, 2019, Nice, France, pp. 1–9 (2019). https://doi.org/10.1145/3343031.3351046 Lokoč, J., Kovalčík, G., Souček, T., Moravec, J., Čech, P.: A framework for effective known-item search in video. In: In Proceedings of the 27th ACM International Conference on Multimedia (MM’19), October 21–25, 2019, Nice, France, pp. 1–9 (2019). https://​doi.​org/​10.​1145/​3343031.​3351046
12.
Zurück zum Zitat Lei, X., Pan, H., Huang, X.: A dilated CNN model for image classification. IEEE Access 7, 124087–124095 (2019) Lei, X., Pan, H., Huang, X.: A dilated CNN model for image classification. IEEE Access 7, 124087–124095 (2019)
13.
Zurück zum Zitat Tang, S., Feng, L., Kuang, Z., Chen, Y., Zhang, W.: Fast video shot transition localization with deep structured models. In: Asian Conference on Computer Vision, pp. 577–592 (2018). Springer Tang, S., Feng, L., Kuang, Z., Chen, Y., Zhang, W.: Fast video shot transition localization with deep structured models. In: Asian Conference on Computer Vision, pp. 577–592 (2018). Springer
14.
Zurück zum Zitat Gushchin, A., Antsiferova, A., Vatolin, D.: Shot boundary detection method based on a new extensive dataset and mixed features (2021). arXiv:2109.01057 Gushchin, A., Antsiferova, A., Vatolin, D.: Shot boundary detection method based on a new extensive dataset and mixed features (2021). arXiv:​2109.​01057
15.
Zurück zum Zitat Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Bugalho, M., Trancoso, I.: Temporal video segmentation to scenes using high-level audiovisual features. IEEE Trans. Circuits Syst. Video Technol. 21(8), 1163–1177 (2011) Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Bugalho, M., Trancoso, I.: Temporal video segmentation to scenes using high-level audiovisual features. IEEE Trans. Circuits Syst. Video Technol. 21(8), 1163–1177 (2011)
16.
Zurück zum Zitat Kishi, R.M., Trojahn, T.H., Goularte, R.: Correlation based feature fusion for the temporal video scene segmentation task. Multimed. Tools Appl. 78(11), 15623–15646 (2019) Kishi, R.M., Trojahn, T.H., Goularte, R.: Correlation based feature fusion for the temporal video scene segmentation task. Multimed. Tools Appl. 78(11), 15623–15646 (2019)
17.
Zurück zum Zitat Baraldi, L., Grana, C., Cucchiara, R.: A deep siamese network for scene detection in broadcast videos. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1199–1202 (2015) Baraldi, L., Grana, C., Cucchiara, R.: A deep siamese network for scene detection in broadcast videos. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1199–1202 (2015)
18.
Zurück zum Zitat Rotman, D., Porat, D., Ashour, G., Barzelay, U.: Optimally grouped deep features using normalized cost for video scene detection. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pp. 187–195 (2018) Rotman, D., Porat, D., Ashour, G., Barzelay, U.: Optimally grouped deep features using normalized cost for video scene detection. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pp. 187–195 (2018)
19.
Zurück zum Zitat Apostolidis, K., Apostolidis, E., Mezaris, V.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: International Conference on Multimedia Modeling, pp. 29–41 (2018). Springer Apostolidis, K., Apostolidis, E., Mezaris, V.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: International Conference on Multimedia Modeling, pp. 29–41 (2018). Springer
20.
Zurück zum Zitat Peleshko, D., Soroka, K.: Research of usage of haar-like features and AdaBoost algorithm in viola-jones method of object detection. In: 2013 12th International Conference on the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), pp. 284–286. IEEE (2013) Peleshko, D., Soroka, K.: Research of usage of haar-like features and AdaBoost algorithm in viola-jones method of object detection. In: 2013 12th International Conference on the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), pp. 284–286. IEEE (2013)
21.
Zurück zum Zitat Nguyen, T., Park, E.-A., Han, J., Park, D.-C., Min, S.-Y.: Object detection using scale invariant feature transform. In: Pan, J.-S., Krömer, P., Snášel, V. (eds.) Genetic and Evolutionary Computing, pp. 65–72. Springer, Cham (2014) Nguyen, T., Park, E.-A., Han, J., Park, D.-C., Min, S.-Y.: Object detection using scale invariant feature transform. In: Pan, J.-S., Krömer, P., Snášel, V. (eds.) Genetic and Evolutionary Computing, pp. 65–72. Springer, Cham (2014)
22.
Zurück zum Zitat Bouguila, N., Ziou, D.: A dirichlet process mixture of dirichlet distributions for classification and prediction. In: 2008 IEEE Workshop on Machine Learning for Signal Processing, pp. 297–302. IEEE (2008) Bouguila, N., Ziou, D.: A dirichlet process mixture of dirichlet distributions for classification and prediction. In: 2008 IEEE Workshop on Machine Learning for Signal Processing, pp. 297–302. IEEE (2008)
23.
Zurück zum Zitat Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
24.
Zurück zum Zitat Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
25.
Zurück zum Zitat Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016) Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016)
26.
Zurück zum Zitat Pramanik, A., Pal, S.K., Maiti, J., Mitra, P.: Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Trans. Emerg. Top. Comput. Intell. (2021) Pramanik, A., Pal, S.K., Maiti, J., Mitra, P.: Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Trans. Emerg. Top. Comput. Intell. (2021)
27.
Zurück zum Zitat Yao, Y.: Granular computing: basic issues and possible solutions. In: Proceedings of the 5th Joint Conference on Information Sciences, vol. 1, pp. 186–189. Citeseer (2000) Yao, Y.: Granular computing: basic issues and possible solutions. In: Proceedings of the 5th Joint Conference on Information Sciences, vol. 1, pp. 186–189. Citeseer (2000)
28.
Zurück zum Zitat Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
29.
Zurück zum Zitat Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017) Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
31.
Zurück zum Zitat Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 21–37. Springer, Cham (2016) Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 21–37. Springer, Cham (2016)
32.
Zurück zum Zitat Sanchez, S., Romero, H., Morales, A.: A review: comparison of performance metrics of pretrained models for object detection using the tensorflow framework. In: IOP Conference Series: Materials Science and Engineering, vol. 844, p. 012024. IOP Publishing (2020) Sanchez, S., Romero, H., Morales, A.: A review: comparison of performance metrics of pretrained models for object detection using the tensorflow framework. In: IOP Conference Series: Materials Science and Engineering, vol. 844, p. 012024. IOP Publishing (2020)
33.
Zurück zum Zitat Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
34.
Zurück zum Zitat Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019) Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
35.
Zurück zum Zitat Tan, M., Pang, R., Le, Q.V.: Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020) Tan, M., Pang, R., Le, Q.V.: Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
36.
Zurück zum Zitat Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)
37.
38.
Zurück zum Zitat Wang, C.-Y., Yeh, I.-H., Liao, H.-Y.M.: You only learn one representation: unified network for multiple tasks (2021). arXiv:2105.04206 Wang, C.-Y., Yeh, I.-H., Liao, H.-Y.M.: You only learn one representation: unified network for multiple tasks (2021). arXiv:​2105.​04206
39.
Zurück zum Zitat Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
40.
Zurück zum Zitat Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015) Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)
41.
Zurück zum Zitat Lin, G., Shen, C., Van Den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3194–3203 (2016) Lin, G., Shen, C., Van Den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3194–3203 (2016)
42.
Zurück zum Zitat He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
43.
Zurück zum Zitat Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. In: European Conference on Computer Vision (ECCV), pp. 173–190. Springer (2020) Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. In: European Conference on Computer Vision (ECCV), pp. 173–190. Springer (2020)
44.
Zurück zum Zitat Jain, J., Singh, A., Orlov, N., Huang, Z., Li, J., Walton, S., Shi, H.: SeMask: semantically masked transformers for semantic segmentation (2021). arXiv:2112.12782 Jain, J., Singh, A., Orlov, N., Huang, Z., Li, J., Walton, S., Shi, H.: SeMask: semantically masked transformers for semantic segmentation (2021). arXiv:​2112.​12782
45.
Zurück zum Zitat Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: scaling up capacity and resolution (2021). arXiv:2111.09883 Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: scaling up capacity and resolution (2021). arXiv:​2111.​09883
46.
Zurück zum Zitat Hao, S., Zhou, Y., Guo, Y.: A brief survey on semantic segmentation with deep learning. Neurocomputing 406, 302–321 (2020) Hao, S., Zhou, Y., Guo, Y.: A brief survey on semantic segmentation with deep learning. Neurocomputing 406, 302–321 (2020)
47.
Zurück zum Zitat Lan, Z.-Z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Multimedia classification and event detection using double fusion. Multimed. Tools Appl. 71(1), 333–347 (2014) Lan, Z.-Z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Multimedia classification and event detection using double fusion. Multimed. Tools Appl. 71(1), 333–347 (2014)
48.
Zurück zum Zitat Daudpota, S.M., Muhammad, A., Baber, J.: Video genre identification using clustering-based shot detection algorithm. SIViP 13(7), 1413–1420 (2019) Daudpota, S.M., Muhammad, A., Baber, J.: Video genre identification using clustering-based shot detection algorithm. SIViP 13(7), 1413–1420 (2019)
49.
Zurück zum Zitat Gkalelis, N., Mezaris, V.: Subclass deep neural networks: re-enabling neglected classes in deep network training for multimedia classification. In: International Conference on Multimedia Modeling, pp. 227–238. Springer (2020) Gkalelis, N., Mezaris, V.: Subclass deep neural networks: re-enabling neglected classes in deep network training for multimedia classification. In: International Conference on Multimedia Modeling, pp. 227–238. Springer (2020)
50.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
51.
Zurück zum Zitat Pouyanfar, S., Chen, S.-C., Shyu, M.-L.: An efficient deep residual-inception network for multimedia classification. In: 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 373–378. IEEE (2017) Pouyanfar, S., Chen, S.-C., Shyu, M.-L.: An efficient deep residual-inception network for multimedia classification. In: 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 373–378. IEEE (2017)
52.
Zurück zum Zitat Shamsolmoali, P., Jain, D.K., Zareapoor, M., Yang, J., Alam, M.A.: High-dimensional multimedia classification using deep cnn and extended residual units. Multimed. Tools Appl. 78(17), 23867–23882 (2019) Shamsolmoali, P., Jain, D.K., Zareapoor, M., Yang, J., Alam, M.A.: High-dimensional multimedia classification using deep cnn and extended residual units. Multimed. Tools Appl. 78(17), 23867–23882 (2019)
53.
Zurück zum Zitat Dai, X., Yin, H., Jha, N.K.: Incremental learning using a grow-and-prune paradigm with efficient neural networks. IEEE Transactions on Emerging Topics in Computing (2020) Dai, X., Yin, H., Jha, N.K.: Incremental learning using a grow-and-prune paradigm with efficient neural networks. IEEE Transactions on Emerging Topics in Computing (2020)
54.
Zurück zum Zitat Gkalelis, N., Mezaris, V.: Structured pruning of lstms via Eigen analysis and geometric median for mobile multimedia and deep learning applications. In: 2020 IEEE International Symposium on Multimedia (ISM), pp. 122–126. IEEE (2020) Gkalelis, N., Mezaris, V.: Structured pruning of lstms via Eigen analysis and geometric median for mobile multimedia and deep learning applications. In: 2020 IEEE International Symposium on Multimedia (ISM), pp. 122–126. IEEE (2020)
55.
Zurück zum Zitat Chiodino, E., Di Luccio, D., Lieto, A., Messina, A., Pozzato, G.L., Rubinetti, D.: A knowledge-based system for the dynamic generation and classification of novel contents in multimedia broadcasting. In: ECAI 2020, pp. 680–687 (2020) Chiodino, E., Di Luccio, D., Lieto, A., Messina, A., Pozzato, G.L., Rubinetti, D.: A knowledge-based system for the dynamic generation and classification of novel contents in multimedia broadcasting. In: ECAI 2020, pp. 680–687 (2020)
56.
Zurück zum Zitat Doulaty, M., Saz-Torralba, O., Ng, R.W.M., Hain, T.: Automatic genre and show identification of broadcast media. In: INTERSPEECH (2016) Doulaty, M., Saz-Torralba, O., Ng, R.W.M., Hain, T.: Automatic genre and show identification of broadcast media. In: INTERSPEECH (2016)
57.
Zurück zum Zitat Yadav, A., Vishwakarma, D.K.: A unified framework of deep networks for genre classification using movie trailer. Appl. Soft Comput. 96, 106624 (2020) Yadav, A., Vishwakarma, D.K.: A unified framework of deep networks for genre classification using movie trailer. Appl. Soft Comput. 96, 106624 (2020)
58.
Zurück zum Zitat Mills, T.J., Pye, D., Hollinghurst, N.J., Wood, K.R.: AT_TV: broadcast television and radio retrieval. In: RIAO, pp. 1135–1144 (2000) Mills, T.J., Pye, D., Hollinghurst, N.J., Wood, K.R.: AT_TV: broadcast television and radio retrieval. In: RIAO, pp. 1135–1144 (2000)
59.
Zurück zum Zitat Smeaton, A.F., Over, P., Kraaij, W.: High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. In: Multimedia Content Analysis, pp. 1–24 (2009) Smeaton, A.F., Over, P., Kraaij, W.: High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. In: Multimedia Content Analysis, pp. 1–24 (2009)
60.
Zurück zum Zitat Rossetto, L., Amiri Parian, M., Gasser, R., Giangreco, I., Heller, S., Schuldt, H.: Deep learning-based concept detection in vitrivr. In: International Conference on Multimedia Modeling, pp. 616–621. Springer (2019) Rossetto, L., Amiri Parian, M., Gasser, R., Giangreco, I., Heller, S., Schuldt, H.: Deep learning-based concept detection in vitrivr. In: International Conference on Multimedia Modeling, pp. 616–621. Springer (2019)
63.
Zurück zum Zitat Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 25, 1097–1105 (2012) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 25, 1097–1105 (2012)
64.
Zurück zum Zitat Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E.: Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018 (2018) Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E.: Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018 (2018)
65.
Zurück zum Zitat Touvron, H., Vedaldi, A., Douze, M., Jégou, H.: Fixing the train-test resolution discrepancy: Fixefficientnet (2020). arXiv:2003.08237 Touvron, H., Vedaldi, A., Douze, M., Jégou, H.: Fixing the train-test resolution discrepancy: Fixefficientnet (2020). arXiv:​2003.​08237
66.
Zurück zum Zitat Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
67.
Zurück zum Zitat Gkalelis, N., Goulas, A., Galanopoulos, D., Mezaris, V.: Objectgraphs: using objects and a graph convolutional network for the bottom-up recognition and explanation of events in video. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3370–3378 (2021). https://doi.org/10.1109/CVPRW53098.2021.00376 Gkalelis, N., Goulas, A., Galanopoulos, D., Mezaris, V.: Objectgraphs: using objects and a graph convolutional network for the bottom-up recognition and explanation of events in video. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3370–3378 (2021). https://​doi.​org/​10.​1109/​CVPRW53098.​2021.​00376
68.
Zurück zum Zitat Pouyanfar, S., Chen, S.-C.: Semantic event detection using ensemble deep learning. In: 2016 IEEE International Symposium on Multimedia (ISM), pp. 203–208. IEEE (2016) Pouyanfar, S., Chen, S.-C.: Semantic event detection using ensemble deep learning. In: 2016 IEEE International Symposium on Multimedia (ISM), pp. 203–208. IEEE (2016)
69.
Zurück zum Zitat Marechal, C., Mikolajewski, D., Tyburek, K., Prokopowicz, P., Bougueroua, L., Ancourt, C., Wegrzyn-Wolska, K.: Survey on AI-based multimodal methods for emotion detection (2019) Marechal, C., Mikolajewski, D., Tyburek, K., Prokopowicz, P., Bougueroua, L., Ancourt, C., Wegrzyn-Wolska, K.: Survey on AI-based multimodal methods for emotion detection (2019)
70.
Zurück zum Zitat Kwak, C.-U., Son, J.-W., Lee, A., Kim, S.-J.: Scene emotion detection using closed caption based on hierarchical attention network. In: 2017 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1206–1208. IEEE (2017) Kwak, C.-U., Son, J.-W., Lee, A., Kim, S.-J.: Scene emotion detection using closed caption based on hierarchical attention network. In: 2017 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1206–1208. IEEE (2017)
71.
Zurück zum Zitat Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., Pal, C.: Recurrent neural networks for emotion recognition in video. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 467–474 (2015) Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., Pal, C.: Recurrent neural networks for emotion recognition in video. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 467–474 (2015)
72.
Zurück zum Zitat Noroozi, F., Marjanovic, M., Njegus, A., Escalera, S., Anbarjafari, G.: Audio-visual emotion recognition in video clips. IEEE Trans. Affect. Comput. 10(1), 60–75 (2017) Noroozi, F., Marjanovic, M., Njegus, A., Escalera, S., Anbarjafari, G.: Audio-visual emotion recognition in video clips. IEEE Trans. Affect. Comput. 10(1), 60–75 (2017)
73.
Zurück zum Zitat Vandersmissen, B., Sterckx, L., Demeester, T., Jalalvand, A., De Neve, W., Van de Walle, R.: An automated end-to-end pipeline for fine-grained video annotation using deep neural networks. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp. 409–412 (2016) Vandersmissen, B., Sterckx, L., Demeester, T., Jalalvand, A., De Neve, W., Van de Walle, R.: An automated end-to-end pipeline for fine-grained video annotation using deep neural networks. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp. 409–412 (2016)
75.
Zurück zum Zitat Sharma, D.P., Atkins, J.: Automatic speech recognition systems: challenges and recent implementation trends. Int. J. Signal Imaging Syst. Eng. 7(4), 220–234 (2014) Sharma, D.P., Atkins, J.: Automatic speech recognition systems: challenges and recent implementation trends. Int. J. Signal Imaging Syst. Eng. 7(4), 220–234 (2014)
76.
Zurück zum Zitat Radzikowski, K., Wang, L., Yoshie, O., Nowak, R.: Accent modification for speech recognition of non-native speakers using neural style transfer. EURASIP J. Audio Speech Process. 2021(1), 1–10 (2021) Radzikowski, K., Wang, L., Yoshie, O., Nowak, R.: Accent modification for speech recognition of non-native speakers using neural style transfer. EURASIP J. Audio Speech Process. 2021(1), 1–10 (2021)
77.
Zurück zum Zitat Nixon, L., Mezaris, V., Thomsen, J.: Seamlessly interlinking tv and web content to enable linked television. In: ACM Int. Conf. on Interactive Experiences for Television and Online Video (TVX 2014), Adjunct Proceedings, Newcastle Upon Tyne, p. 21 (2014) Nixon, L., Mezaris, V., Thomsen, J.: Seamlessly interlinking tv and web content to enable linked television. In: ACM Int. Conf. on Interactive Experiences for Television and Online Video (TVX 2014), Adjunct Proceedings, Newcastle Upon Tyne, p. 21 (2014)
78.
Zurück zum Zitat Liu, A.H., Jin, S., Lai, C.-I.J., Rouditchenko, A., Oliva, A., Glass, J.: Cross-modal discrete representation learning (2021). arXiv:2106.05438 Liu, A.H., Jin, S., Lai, C.-I.J., Rouditchenko, A., Oliva, A., Glass, J.: Cross-modal discrete representation learning (2021). arXiv:​2106.​05438
80.
Zurück zum Zitat Wang, Y.: Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 17(1s), 1–25 (2021) Wang, Y.: Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 17(1s), 1–25 (2021)
81.
Zurück zum Zitat Jin, W., Zhao, Z., Zhang, P., Zhu, J., He, X., Zhuang, Y.: Hierarchical cross-modal graph consistency learning for video-text retrieval. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1114–1124 (2021) Jin, W., Zhao, Z., Zhang, P., Zhu, J., He, X., Zhuang, Y.: Hierarchical cross-modal graph consistency learning for video-text retrieval. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1114–1124 (2021)
83.
Zurück zum Zitat Li, X., Xu, C., Yang, G., Chen, Z., Dong, J.: W2VV++: fully deep learning for ad-hoc video search. In: Proceedings of the 27th ACM International Conference on Multimedia (2019) Li, X., Xu, C., Yang, G., Chen, Z., Dong, J.: W2VV++: fully deep learning for ad-hoc video search. In: Proceedings of the 27th ACM International Conference on Multimedia (2019)
84.
85.
Zurück zum Zitat Galanopoulos, D., Mezaris, V.: Attention mechanisms, signal encodings and fusion strategies for improved ad-hoc video search with dual encoding networks. In: Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 336–340 (2020) Galanopoulos, D., Mezaris, V.: Attention mechanisms, signal encodings and fusion strategies for improved ad-hoc video search with dual encoding networks. In: Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 336–340 (2020)
86.
Zurück zum Zitat Dong, J., Li, X., Xu, C., Ji, S., He, Y., Yang, G., Wang, X.: Dual encoding for zero-example video retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9346–9355 (2019) Dong, J., Li, X., Xu, C., Ji, S., He, Y., Yang, G., Wang, X.: Dual encoding for zero-example video retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9346–9355 (2019)
87.
Zurück zum Zitat Sun, C., Myers, A., Vondrick, C., Murphy, K., Schmid, C.: Videobert: a joint model for video and language representation learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7464–7473 (2019) Sun, C., Myers, A., Vondrick, C., Murphy, K., Schmid, C.: Videobert: a joint model for video and language representation learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7464–7473 (2019)
89.
Zurück zum Zitat Li, L., Chen, Y.-C., Cheng, Y., Gan, Z., Yu, L., Liu, J.: HERO: hierarchical encoder for video+ language omni-representation pre-training. In: EMNLP (2020) Li, L., Chen, Y.-C., Cheng, Y., Gan, Z., Yu, L., Liu, J.: HERO: hierarchical encoder for video+ language omni-representation pre-training. In: EMNLP (2020)
90.
Zurück zum Zitat Lei, J., Li, L., Zhou, L., Gan, Z., Berg, T.L., Bansal, M., Liu, J.: Less is more: clipbert for video-and-language learning via sparse sampling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7331–7341 (2021) Lei, J., Li, L., Zhou, L., Gan, Z., Berg, T.L., Bansal, M., Liu, J.: Less is more: clipbert for video-and-language learning via sparse sampling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7331–7341 (2021)
91.
Zurück zum Zitat Sun, C., Baradel, F., Murphy, K., Schmid, C.: Learning video representations using contrastive bidirectional transformer (2019). arXiv:1906.05743 Sun, C., Baradel, F., Murphy, K., Schmid, C.: Learning video representations using contrastive bidirectional transformer (2019). arXiv:​1906.​05743
92.
Zurück zum Zitat Zhu, L., Yang, Y.: Actbert: learning global-local video-text representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8746–8755 (2020) Zhu, L., Yang, Y.: Actbert: learning global-local video-text representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8746–8755 (2020)
93.
Zurück zum Zitat Luo, H., Ji, L., Shi, B., Huang, H., Duan, N., Li, T., Li, J., Bharti, T., Zhou, M.: UniVL: a unified video and language pre-training model for multimodal understanding and generation (2020). arXiv:2002.06353 Luo, H., Ji, L., Shi, B., Huang, H., Duan, N., Li, T., Li, J., Bharti, T., Zhou, M.: UniVL: a unified video and language pre-training model for multimodal understanding and generation (2020). arXiv:​2002.​06353
94.
Zurück zum Zitat Gao, Z., Liu, J., Chen, S., Chang, D., Zhang, H., Yuan, J.: CLIP2TV: an empirical study on transformer-based methods for video-text retrieval (2021). arXiv:2111.05610 Gao, Z., Liu, J., Chen, S., Chang, D., Zhang, H., Yuan, J.: CLIP2TV: an empirical study on transformer-based methods for video-text retrieval (2021). arXiv:​2111.​05610
95.
Zurück zum Zitat Xu, J., Mei, T., Yao, T., Rui, Y.: Msr-vtt: a large video description dataset for bridging video and language. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5288–5296 (2016) Xu, J., Mei, T., Yao, T., Rui, Y.: Msr-vtt: a large video description dataset for bridging video and language. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5288–5296 (2016)
112.
Zurück zum Zitat Rochan, M., Ye, L., Wang, Y.: Video summarization using fully convolutional sequence networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 358–374. Springer, Cham (2018) Rochan, M., Ye, L., Wang, Y.: Video summarization using fully convolutional sequence networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 358–374. Springer, Cham (2018)
113.
Zurück zum Zitat Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Carneiro, G., You, S. (eds.) Computer Vision—ACCV 2018 Workshops, pp. 39–54. Springer, Cham (2019) Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Carneiro, G., You, S. (eds.) Computer Vision—ACCV 2018 Workshops, pp. 39–54. Springer, Cham (2019)
114.
Zurück zum Zitat Otani, M., Nakashima, Y., Rahtu, E., Heikkilä, J., Yokoya, N.: Video summarization using deep semantic features. In: The 13th Asian Conference on Computer Vision (ACCV’16) (2016) Otani, M., Nakashima, Y., Rahtu, E., Heikkilä, J., Yokoya, N.: Video summarization using deep semantic features. In: The 13th Asian Conference on Computer Vision (ACCV’16) (2016)
115.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
117.
Zurück zum Zitat Zhang, K., Chao, W.-L., Sha, F., Grauman, K.: Video summarization with long short-term memory. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 766–782. Springer, Cham (2016) Zhang, K., Chao, W.-L., Sha, F., Grauman, K.: Video summarization with long short-term memory. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 766–782. Springer, Cham (2016)
119.
120.
122.
Zurück zum Zitat Zhao, B., Li, X., Lu, X.: HSA-RNN: Hierarchical structure-adaptive rnn for video summarization. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition. CVPR ’18 (2018) Zhao, B., Li, X., Lu, X.: HSA-RNN: Hierarchical structure-adaptive rnn for video summarization. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition. CVPR ’18 (2018)
123.
Zurück zum Zitat Zhang, Y., Kampffmeyer, M., Liang, X., Zhang, D., Tan, M., Xing, E.P.: Dtr-gan: Dilated temporal relational adversarial network for video summarization (2018). arXiv:1804.11228 [CoRR/abs] Zhang, Y., Kampffmeyer, M., Liang, X., Zhang, D., Tan, M., Xing, E.P.: Dtr-gan: Dilated temporal relational adversarial network for video summarization (2018). arXiv:​1804.​11228 [CoRR/abs]
124.
Zurück zum Zitat Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Ac-sum-gan: connecting actor-critic and generative adversarial networks for unsupervised video summarization. IEEE Trans. Circuits Syst. Video Technol. (2020) Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Ac-sum-gan: connecting actor-critic and generative adversarial networks for unsupervised video summarization. IEEE Trans. Circuits Syst. Video Technol. (2020)
125.
Zurück zum Zitat Jung, Y., Cho, D., Kim, D., Woo, S., Kweon, I.S.: Discriminative feature learning for unsupervised video summarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8537–8544 (2019) Jung, Y., Cho, D., Kim, D., Woo, S., Kweon, I.S.: Discriminative feature learning for unsupervised video summarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8537–8544 (2019)
126.
Zurück zum Zitat Jung, Y., Cho, D., Woo, S., Kweon, I.S.: Global-and-local relative position embedding for unsupervised video summarization. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, August 23–28, 2020, Proceedings, Part XXV 16, pp. 167–183 (2020). Springer Jung, Y., Cho, D., Woo, S., Kweon, I.S.: Global-and-local relative position embedding for unsupervised video summarization. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, August 23–28, 2020, Proceedings, Part XXV 16, pp. 167–183 (2020). Springer
127.
Zurück zum Zitat Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Unsupervised video summarization via attention-driven adversarial learning. In: International Conference on Multimedia Modeling, pp. 492–504 (2020). Springer Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Unsupervised video summarization via attention-driven adversarial learning. In: International Conference on Multimedia Modeling, pp. 492–504 (2020). Springer
128.
Zurück zum Zitat Apostolidis, E., Metsai, A.I., Adamantidou, E., Mezaris, V., Patras, I.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In: Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–25 (2019) Apostolidis, E., Metsai, A.I., Adamantidou, E., Mezaris, V., Patras, I.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In: Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–25 (2019)
129.
Zurück zum Zitat Wang, J., Wang, W., Wang, Z., Wang, L., Feng, D., Tan, T.: Stacked memory network for video summarization. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 836–844 (2019) Wang, J., Wang, W., Wang, Z., Wang, L., Feng, D., Tan, T.: Stacked memory network for video summarization. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 836–844 (2019)
130.
Zurück zum Zitat Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Asian Conference on Computer Vision, pp. 39–54 (2018). Springer Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Asian Conference on Computer Vision, pp. 39–54 (2018). Springer
131.
Zurück zum Zitat Liu, Y.-T., Li, Y.-J., Yang, F.-E., Chen, S.-F., Wang, Y.-C.F.: Learning hierarchical self-attention for video summarization. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 3377–3381 (2019). IEEE Liu, Y.-T., Li, Y.-J., Yang, F.-E., Chen, S.-F., Wang, Y.-C.F.: Learning hierarchical self-attention for video summarization. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 3377–3381 (2019). IEEE
132.
Zurück zum Zitat Li, P., Ye, Q., Zhang, L., Yuan, L., Xu, X., Shao, L.: Exploring global diverse attention via pairwise temporal relation for video summarization. Pattern Recogn. 111, 107677 (2021) Li, P., Ye, Q., Zhang, L., Yuan, L., Xu, X., Shao, L.: Exploring global diverse attention via pairwise temporal relation for video summarization. Pattern Recogn. 111, 107677 (2021)
133.
Zurück zum Zitat Ji, Z., Jiao, F., Pang, Y., Shao, L.: Deep attentive and semantic preserving video summarization. Neurocomputing 405, 200–207 (2020) Ji, Z., Jiao, F., Pang, Y., Shao, L.: Deep attentive and semantic preserving video summarization. Neurocomputing 405, 200–207 (2020)
134.
Zurück zum Zitat Apostolidis, E., Balaouras, G., Mezaris, V., Patras, I.: Combining global and local attention with positional encoding for video summarization. In: 2021 IEEE International Symposium on Multimedia (ISM), pp. 226–234. IEEE (2021) Apostolidis, E., Balaouras, G., Mezaris, V., Patras, I.: Combining global and local attention with positional encoding for video summarization. In: 2021 IEEE International Symposium on Multimedia (ISM), pp. 226–234. IEEE (2021)
135.
Zurück zum Zitat Xu, M., Jin, J.S., Luo, S., Duan, L.: Hierarchical movie affective content analysis based on arousal and valence features. In: Proceedings of the 16th ACM International Conference on Multimedia, pp. 677–680 (2008) Xu, M., Jin, J.S., Luo, S., Duan, L.: Hierarchical movie affective content analysis based on arousal and valence features. In: Proceedings of the 16th ACM International Conference on Multimedia, pp. 677–680 (2008)
136.
Zurück zum Zitat Xiong, B., Kalantidis, Y., Ghadiyaram, D., Grauman, K.: Less is more: Learning highlight detection from video duration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1258–1267 (2019) Xiong, B., Kalantidis, Y., Ghadiyaram, D., Grauman, K.: Less is more: Learning highlight detection from video duration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1258–1267 (2019)
137.
Zurück zum Zitat Xiong, Z., Radhakrishnan, R., Divakaran, A., Huang, T.S.: Highlights extraction from sports video based on an audio-visual marker detection framework. In: 2005 IEEE International Conference on Multimedia and Expo, p. 4. IEEE (2005) Xiong, Z., Radhakrishnan, R., Divakaran, A., Huang, T.S.: Highlights extraction from sports video based on an audio-visual marker detection framework. In: 2005 IEEE International Conference on Multimedia and Expo, p. 4. IEEE (2005)
138.
Zurück zum Zitat Tang, H., Kwatra, V., Sargin, M.E., Gargi, U.: Detecting highlights in sports videos: cricket as a test case. In: 2011 IEEE International Conference on Multimedia and Expo, pp. 1–6. IEEE (2011) Tang, H., Kwatra, V., Sargin, M.E., Gargi, U.: Detecting highlights in sports videos: cricket as a test case. In: 2011 IEEE International Conference on Multimedia and Expo, pp. 1–6. IEEE (2011)
139.
Zurück zum Zitat Wang, J., Xu, C., Chng, E., Tian, Q.: Sports highlight detection from keyword sequences using HMM. In: 2004 IEEE International Conference on Multimedia and Expo (ICME)(IEEE Cat. No. 04TH8763), vol. 1, pp. 599–602. IEEE (2004) Wang, J., Xu, C., Chng, E., Tian, Q.: Sports highlight detection from keyword sequences using HMM. In: 2004 IEEE International Conference on Multimedia and Expo (ICME)(IEEE Cat. No. 04TH8763), vol. 1, pp. 599–602. IEEE (2004)
140.
Zurück zum Zitat Rui, Y., Gupta, A., Acero, A.: Automatically extracting highlights for tv baseball programs. In: Proceedings of the Eighth ACM International Conference on Multimedia, pp. 105–115 (2000) Rui, Y., Gupta, A., Acero, A.: Automatically extracting highlights for tv baseball programs. In: Proceedings of the Eighth ACM International Conference on Multimedia, pp. 105–115 (2000)
141.
Zurück zum Zitat Sun, M., Farhadi, A., Seitz, S.: Ranking domain-specific highlights by analyzing edited videos. In: European Conference on Computer Vision, pp. 787–802. Springer (2014) Sun, M., Farhadi, A., Seitz, S.: Ranking domain-specific highlights by analyzing edited videos. In: European Conference on Computer Vision, pp. 787–802. Springer (2014)
142.
Zurück zum Zitat Petkovic, M., Mihajlovic, V., Jonker, W., Djordjevic-Kajan, S.: Multi-modal extraction of highlights from tv formula 1 programs. In: Proceedings of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 817–820. IEEE (2002) Petkovic, M., Mihajlovic, V., Jonker, W., Djordjevic-Kajan, S.: Multi-modal extraction of highlights from tv formula 1 programs. In: Proceedings of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 817–820. IEEE (2002)
143.
Zurück zum Zitat Yao, T., Mei, T., Rui, Y.: Highlight detection with pairwise deep ranking for first-person video summarization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 982–990 (2016) Yao, T., Mei, T., Rui, Y.: Highlight detection with pairwise deep ranking for first-person video summarization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 982–990 (2016)
144.
Zurück zum Zitat Gygli, M., Song, Y., Cao, L.: Video2gif: automatic generation of animated gifs from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1001–1009 (2016) Gygli, M., Song, Y., Cao, L.: Video2gif: automatic generation of animated gifs from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1001–1009 (2016)
145.
Zurück zum Zitat Jiao, Y., Li, Z., Huang, S., Yang, X., Liu, B., Zhang, T.: Three-dimensional attention-based deep ranking model for video highlight detection. IEEE Trans. Multimed. 20(10), 2693–2705 (2018) Jiao, Y., Li, Z., Huang, S., Yang, X., Liu, B., Zhang, T.: Three-dimensional attention-based deep ranking model for video highlight detection. IEEE Trans. Multimed. 20(10), 2693–2705 (2018)
146.
Zurück zum Zitat Potapov, D., Douze, M., Harchaoui, Z., Schmid, C.: Category-specific video summarization. In: European Conference on Computer Vision, pp. 540–555. Springer (2014) Potapov, D., Douze, M., Harchaoui, Z., Schmid, C.: Category-specific video summarization. In: European Conference on Computer Vision, pp. 540–555. Springer (2014)
147.
Zurück zum Zitat Yang, H., Wang, B., Lin, S., Wipf, D., Guo, M., Guo, B.: Unsupervised extraction of video highlights via robust recurrent auto-encoders. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4633–4641 (2015) Yang, H., Wang, B., Lin, S., Wipf, D., Guo, M., Guo, B.: Unsupervised extraction of video highlights via robust recurrent auto-encoders. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4633–4641 (2015)
148.
Zurück zum Zitat Panda, R., Das, A., Wu, Z., Ernst, J., Roy-Chowdhury, A.K.: Weakly supervised summarization of web videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3657–3666 (2017) Panda, R., Das, A., Wu, Z., Ernst, J., Roy-Chowdhury, A.K.: Weakly supervised summarization of web videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3657–3666 (2017)
149.
Zurück zum Zitat Hong, F.-T., Huang, X., Li, W.-H., Zheng, W.-S.: Mini-net: multiple instance ranking network for video highlight detection. In: European Conference on Computer Vision, pp. 345–360. Springer (2020) Hong, F.-T., Huang, X., Li, W.-H., Zheng, W.-S.: Mini-net: multiple instance ranking network for video highlight detection. In: European Conference on Computer Vision, pp. 345–360. Springer (2020)
150.
Zurück zum Zitat Rochan, M., Reddy, M.K.K., Ye, L., Wang, Y.: Adaptive video highlight detection by learning from user history. In: European Conference on Computer Vision, pp. 261–278. Springer (2020) Rochan, M., Reddy, M.K.K., Ye, L., Wang, Y.: Adaptive video highlight detection by learning from user history. In: European Conference on Computer Vision, pp. 261–278. Springer (2020)
151.
Zurück zum Zitat Wu, L., Yang, Y., Chen, L., Lian, D., Hong, R., Wang, M.: Learning to transfer graph embeddings for inductive graph based recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1211–1220 (2020) Wu, L., Yang, Y., Chen, L., Lian, D., Hong, R., Wang, M.: Learning to transfer graph embeddings for inductive graph based recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1211–1220 (2020)
152.
Zurück zum Zitat Xu, M., Wang, H., Ni, B., Zhu, R., Sun, Z., Wang, C.: Cross-category video highlight detection via set-based learning (2021). arXiv:2108.11770 Xu, M., Wang, H., Ni, B., Zhu, R., Sun, Z., Wang, C.: Cross-category video highlight detection via set-based learning (2021). arXiv:​2108.​11770
153.
Zurück zum Zitat Mundnich, K., Fenster, A., Khare, A., Sundaram, S.: Audiovisual highlight detection in videos. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4155–4159. IEEE (2021) Mundnich, K., Fenster, A., Khare, A., Sundaram, S.: Audiovisual highlight detection in videos. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4155–4159. IEEE (2021)
154.
Zurück zum Zitat Farsiu, S., Robinson, M.D., Elad, M., Milanfar, P.: Fast and robust multiframe super resolution. IEEE Trans. Image Process. 13(10), 1327–1344 (2004) Farsiu, S., Robinson, M.D., Elad, M., Milanfar, P.: Fast and robust multiframe super resolution. IEEE Trans. Image Process. 13(10), 1327–1344 (2004)
155.
Zurück zum Zitat Farsiu, S., Elad, M., Milanfar, P.: Multiframe demosaicing and super-resolution from undersampled color images. In: Computational Imaging II, vol. 5299, pp. 222–233. International Society for Optics and Photonics (2004) Farsiu, S., Elad, M., Milanfar, P.: Multiframe demosaicing and super-resolution from undersampled color images. In: Computational Imaging II, vol. 5299, pp. 222–233. International Society for Optics and Photonics (2004)
156.
Zurück zum Zitat Farsiu, S., Robinson, D.M., Elad, M., Milanfar, P.: Dynamic demosaicing and color superresolution of video sequences. In: Image Reconstruction from Incomplete Data III, vol. 5562, pp. 169–178. International Society for Optics and Photonics (2004) Farsiu, S., Robinson, D.M., Elad, M., Milanfar, P.: Dynamic demosaicing and color superresolution of video sequences. In: Image Reconstruction from Incomplete Data III, vol. 5562, pp. 169–178. International Society for Optics and Photonics (2004)
157.
Zurück zum Zitat Yang, C.-Y., Huang, J.-B., Yang, M.-H.: Exploiting self-similarities for single frame super-resolution. In: Asian Conference on Computer Vision, pp. 497–510. Springer (2010) Yang, C.-Y., Huang, J.-B., Yang, M.-H.: Exploiting self-similarities for single frame super-resolution. In: Asian Conference on Computer Vision, pp. 497–510. Springer (2010)
158.
Zurück zum Zitat Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Comput. Graph. Appl. 22(2), 56–65 (2002) Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Comput. Graph. Appl. 22(2), 56–65 (2002)
159.
Zurück zum Zitat Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015) Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015)
160.
Wang, Z., Bovik, A.C.: Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag. 26(1), 98–117 (2009)
161.
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
162.
Rad, M.S., Bozorgtabar, B., Marti, U.-V., Basler, M., Ekenel, H.K., Thiran, J.-P.: SROBB: targeted perceptual loss for single image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2710–2719 (2019)
163.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)
164.
Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Change Loy, C.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)
165.
Razavi, A., van den Oord, A., Vinyals, O.: Generating diverse high-fidelity images with VQ-VAE-2. In: Advances in Neural Information Processing Systems, pp. 14866–14876 (2019)
167.
Atwood, J., Towsley, D.: Diffusion-convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1993–2001 (2016)
168.
Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. In: Advances in Neural Information Processing Systems 34 (2021)
169.
Ho, J., Saharia, C., Chan, W., Fleet, D.J., Norouzi, M., Salimans, T.: Cascaded diffusion models for high fidelity image generation (2021). arXiv:2106.15282
170.
Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J., Norouzi, M.: Image super-resolution via iterative refinement (2021). arXiv:2104.07636
171.
Chadha, A., Britto, J., Roja, M.M.: iSeeBetter: spatio-temporal video super-resolution using recurrent generative back-projection networks. Comput. Vis. Media 6(3), 307–317 (2020)
172.
Isobe, T., Zhu, F., Jia, X., Wang, S.: Revisiting temporal modeling for video super-resolution. In: Proceedings of the 31st British Machine Vision Conference (BMVC) (2020)
173.
Haris, M., Shakhnarovich, G., Ukita, N.: Recurrent back-projection network for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3897–3906 (2019)
174.
Rozumnyi, D., Oswald, M.R., Ferrari, V., Matas, J., Pollefeys, M.: DeFMO: deblurring and shape recovery of fast moving objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3456–3465 (2021)
175.
Liu, H., Ruan, Z., Zhao, P., Dong, C., Shang, F., Liu, Y., Yang, L.: Video super resolution based on deep learning: a comprehensive survey (2020). arXiv:2007.12928
177.
Lee, H.-S., Bae, G., Cho, S.-I., Kim, Y.-H., Kang, S.: SmartGrid: video retargeting with spatiotemporal grid optimization. IEEE Access 7, 127564–127579 (2019)
178.
Rachavarapu, K.-K., Kumar, M., Gandhi, V., Subramanian, R.: Watch to edit: video retargeting using gaze. In: Computer Graphics Forum, vol. 37, pp. 205–215. Wiley Online Library (2018)
179.
Jain, E., Sheikh, Y., Shamir, A., Hodgins, J.: Gaze-driven video re-editing. ACM Trans. Graph. (TOG) 34(2), 1–12 (2015)
181.
Liu, F., Gleicher, M.: Video retargeting: automating pan and scan. In: Proceedings of the 14th ACM International Conference on Multimedia, pp. 241–250 (2006)
182.
Kaur, H., Kour, S., Sen, D.: Video retargeting through spatio-temporal seam carving using Kalman filter. IET Image Proc. 13(11), 1862–1871 (2019)
184.
Wang, Y.-S., Lin, H.-C., Sorkine, O., Lee, T.-Y.: Motion-based video retargeting with optimized crop-and-warp. In: ACM SIGGRAPH 2010 Papers, pp. 1–9 (2010)
186.
Kiess, J., Guthier, B., Kopf, S., Effelsberg, W.: SeamCrop for image retargeting. In: Multimedia on Mobile Devices 2012; and Multimedia Content Access: Algorithms and Systems VI, vol. 8304, p. 83040. International Society for Optics and Photonics (2012)
187.
Nam, S.-H., Ahn, W., Yu, I.-J., Kwon, M.-J., Son, M., Lee, H.-K.: Deep convolutional neural network for identifying seam-carving forgery. IEEE Trans. Circuits Syst. Video Technol. (2020)
188.
Apostolidis, K., Mezaris, V.: A fast smart-cropping method and dataset for video retargeting. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 2618–2622. IEEE (2021)
189.
Chou, Y.-C., Fang, C.-Y., Su, P.-C., Chien, Y.-C.: Content-based cropping using visual saliency and blur detection. In: 2017 10th International Conference on Ubi-media Computing and Workshops (Ubi-Media), pp. 1–6. IEEE (2017)
190.
Zhu, T., Zhang, D., Hu, Y., Wang, T., Jiang, X., Zhu, J., Li, J.: Horizontal-to-vertical video conversion. IEEE Trans. Multimed. (2021)
192.
Kim, E., Pyo, S., Park, E., Kim, M.: An automatic recommendation scheme of TV program contents for (IP)TV personalization. IEEE Trans. Broadcast. 57(3), 674–684 (2011)
193.
Soares, M., Viana, P.: TV recommendation and personalization systems: integrating broadcast and video on-demand services. Adv. Electr. Comput. Eng. 14(1), 115–120 (2014)
194.
Hsu, S.H., Wen, M.-H., Lin, H.-C., Lee, C.-C., Lee, C.-H.: AIMED – a personalized TV recommendation system. In: European Conference on Interactive Television, pp. 166–174. Springer (2007)
195.
Aharon, M., Hillel, E., Kagian, A., Lempel, R., Makabee, H., Nissim, R.: Watch-It-Next: a contextual TV recommendation system. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 180–195. Springer (2015)
196.
Aroyo, L., Nixon, L., Miller, L.: NoTube: the television experience enhanced by online social and semantic data. In: 2011 IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin), pp. 269–273. IEEE (2011)
200.
Armstrong, M., Brooks, M., Churnside, A., Evans, M., Melchior, F., Shotton, M.: Object-based broadcasting – curation, responsiveness and user experience (2014)
201.
Cox, J., Jones, R., Northwood, C., Tutcher, J., Robinson, B.: Object-based production: a personalised interactive cooking application. In: Adjunct Publication of the 2017 ACM International Conference on Interactive Experiences for TV and Online Video, pp. 79–80 (2017)
202.
Ursu, M., Smith, D., Hook, J., Concannon, S., Gray, J.: Authoring interactive fictional stories in object-based media (OBM). In: ACM International Conference on Interactive Media Experiences, pp. 127–137 (2020)
203.
Silzle, A., Weitnauer, M., Warusfel, O., Bleisteiner, W., Herberger, T., Epain, N., Duval, B., Bogaards, N., Baume, C., Herzog, U., et al.: Orpheus audio project: piloting an end-to-end object-based audio broadcasting chain. In: IBC Conference, Amsterdam, 14–18 September (2017)
204.
Chen, X., Nguyen, T.V., Shen, Z., Kankanhalli, M.: LiveSense: contextual advertising in live streaming videos. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 392–400 (2019)
205.
Akgul, T., Ozcan, S., Iplik, A.: A cloud-based end-to-end server-side dynamic ad insertion platform for live content. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 361–364 (2020)
206.
Carvalho, P., Pereira, A., Viana, P.: Automatic TV logo identification for advertisement detection without prior data. Appl. Sci. 11(16), 7494 (2021)
207.
Park, S., Cho, K.: Framework for personalized broadcast notice based on contents metadata. In: Proceedings of the Korea Contents Association Conference, pp. 445–446. The Korea Contents Association (2014)
208.
Hunter, J.: Adding multimedia to the semantic web: building an MPEG-7 ontology. In: Proceedings of the First International Semantic Web Working Symposium (SWWS'01), pp. 261–283. CEUR-WS.org, Aachen, DEU (2001)
209.
EBU-MIM: EBU-MIM semantic web activity report. Technical report, EBU-MIM (2015). Accessed 30 Sept 2021
210.
Hoffart, J., Yosef, M.A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., Weikum, G.: Robust disambiguation of named entities in text. In: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, pp. 782–792 (2011)
211.
Brasoveanu, A.M., Weichselbraun, A., Nixon, L.: In media res: a corpus for evaluating named entity linking with creative works. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 355–364 (2020)
212.
Nixon, L., Troncy, R.: Survey of semantic media annotation tools for the web: towards new media applications with linked media. In: European Semantic Web Conference, pp. 100–114. Springer (2014)
213.
Collyda, C., Apostolidis, K., Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V.: A web service for video summarization. In: ACM International Conference on Interactive Media Experiences, pp. 148–153 (2020)
217.
Armstrong, M.: Object-based media: a toolkit for building responsive content. In: Proceedings of the 32nd International BCS Human Computer Interaction Conference, pp. 1–2 (2018)
218.
Cox, J., Brooks, M., Forrester, I., Armstrong, M.: Moving object-based media production from one-off examples to scalable workflows. SMPTE Motion Imaging J. 127(4), 32–37 (2018)
219.
Carter, J., Ramdhany, R., Lomas, M., Pearce, T., Shephard, J., Sparks, M.: Universal access for object-based media experiences. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 382–385 (2020)
220.
Zwicklbauer, M., Lamm, W., Gordon, M., Apostolidis, K., Philipp, B., Mezaris, V.: Video analysis for interactive story creation: the Sandmännchen showcase. In: Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–24 (2020)
221.
Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J., Gama, J.: Personalised dynamic viewer profiling for streamed data. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds.) Trends and Advances in Information Systems and Technologies, pp. 501–510. Springer, Cham (2018)
222.
Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J.: Product placement platform for personalised advertising. In: New European Media (NEM) Summit 2016 (2016)
223.
Malheiro, B., Foss, J., Burguillo, J.: B2B platform for media content personalisation (2013)
225.
Stewart, S.: Video game industry silently taking over entertainment world. Available at ejinsight.com/eji/article/id/2280405/20191022 (2019)
226.
Witkowski, W.: Videogames are a bigger industry than movies and North American sports combined, thanks to the pandemic. MarketWatch (2020)
227.
Ward, L., Paradis, M., Shirley, B., Russon, L., Moore, R., Davies, R.: Casualty accessible and enhanced (A&E) audio: trialling object-based accessible TV audio. In: Audio Engineering Society Convention 147. Audio Engineering Society (2019)
228.
Montagud, M., Núñez, J.A., Karavellas, T., Jurado, I., Fernández, S.: Convergence between TV and VR: enabling truly immersive and social experiences. In: Workshop on Virtual Reality, Co-located with ACM TVX 2018 (2018)
229.
Kudumakis, P., Wilmering, T., Sandler, M., Foss, J.: MPEG IPR ontologies for media trading and personalization. In: International Workshop on Data-Driven Personalization of Television (DataTV2019), ACM International Conference on Interactive Experiences for Television and Online Video (TVX2019) (2019)
231.
ISO/IEC: Information technology – multimedia framework (MPEG-21) – part 19: media value chain ontology / Amd 1: extensions on time-segments and multi-track audio. Standard, International Organization for Standardization (2018). Accessed 30 Sept 2021
232.
ISO/IEC: Information technology – multimedia framework (MPEG-21) – media contract ontology. Standard, International Organization for Standardization (2017). Accessed 30 Sept 2021
237.
ISO/IEC: MPEG-7, part 1 et seq. Standard, International Organization for Standardization. Accessed 30 Sept 2021
238.
Chang, S.-F., Sikora, T., Puri, A.: Overview of the MPEG-7 standard. IEEE Trans. Circuits Syst. Video Technol. 11(6), 688–695 (2001)
239.
ISO/IEC: Introduction to MPEG-7, coding of moving pictures and audio. Standard, International Organization for Standardization (March 2001). Accessed 30 Sept 2021
240.
ISO/IEC: MPEG-I: scene description for MPEG media (MPEG-I part 14). Standard, International Organization for Standardization. Accessed 30 Sept 2021
241.
ISO/IEC: Coded representation of immersive media – part 14: scene description for MPEG media. Standard, International Organization for Standardization. Accessed 30 Sept 2021
242.
MPEG Group: Coded representation of immersive media. Standard, MPEG standards (2020). Accessed 30 Sept 2021
243.
MPEG Group: MPEG-I: versatile video coding (MPEG-I part 3). Standard, MPEG standards. Accessed 30 Sept 2021
244.
Wieckowski, A., Ma, J., Schwarz, H., Marpe, D., Wiegand, T.: Fast partitioning decision strategies for the upcoming versatile video coding (VVC) standard. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 4130–4134. IEEE (2019)
254.
EBU: EBU Tech 3351 – CCDM. Technical report, EBU (August 2020). Accessed 30 Sept 2021
258.
ISO/IEC: Information technology – multimedia framework (MPEG-21) – contract expression language. Standard, International Organization for Standardization (2016). Accessed 30 Sept 2021
259.
Rodríguez-Doncel, V.: Overview of the MPEG-21 media contract ontology (2016)
263.
Shou, M.Z., Ghadiyaram, D., Wang, W., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation (2021). arXiv:2101.10511
264.
Krishna, M.V., Bodesheim, P., Körner, M., Denzler, J.: Temporal video segmentation by event detection: a novelty detection approach. Pattern Recogn. Image Anal. 24(2), 243–255 (2014)
265.
Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masia, B.: Movie editing and cognitive event segmentation in virtual reality video. ACM Trans. Graph. (TOG) 36(4), 1–12 (2017)
266.
Shou, M.Z., Lei, S.W., Wang, W., Ghadiyaram, D., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8075–8084 (2021)
267.
Deliege, A., Cioppa, A., Giancola, S., Seikavandi, M.J., Dueholm, J.V., Nasrollahi, K., Ghanem, B., Moeslund, T.B., Van Droogenbroeck, M.: SoccerNet-v2: a dataset and benchmarks for holistic understanding of broadcast soccer videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4508–4519 (2021)
268.
Verschae, R., Ruiz-del-Solar, J.: Object detection: current and future directions. Front. Robot. AI 2, 29 (2015)
271.
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)
272.
Kaur, P., Pannu, H.S., Malhi, A.K.: Comparative analysis on cross-modal information retrieval: a review. Comput. Sci. Rev. 39, 100336 (2021)
273.
Caba Heilbron, F., Escorcia, V., Ghanem, B., Carlos Niebles, J.: ActivityNet: a large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–970 (2015)
274.
Wang, X., Wu, J., Chen, J., Li, L., Wang, Y.-F., Wang, W.Y.: VaTeX: a large-scale, high-quality multilingual dataset for video-and-language research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4581–4591 (2019)
275.
Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, A.P., Toderici, G., Varadarajan, B., Vijayanarasimhan, S.: YouTube-8M: a large-scale video classification benchmark (2016). arXiv:1609.08675
276.
Rehman, S.U., Waqas, M., Tu, S., Koubaa, A., ur Rehman, O., Ahmad, J., Hanif, M., Han, Z.: Deep learning techniques for future intelligent cross-media retrieval. Technical report, CISTER – Research Centre in Real-Time and Embedded Computing Systems (2020)
277.
Tu, S., ur Rehman, S., Waqas, M., ur Rehman, O., Yang, Z., Ahmad, B., Halim, Z., Zhao, W.: Optimisation-based training of evolutionary convolution neural network for visual classification applications. IET Comput. Vis. 14(5), 259–267 (2020)
278.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: transformers for image recognition at scale (2020). arXiv:2010.11929
279.
Dai, Z., Liu, H., Le, Q., Tan, M.: CoAtNet: marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 34 (2021)
280.
Borkman, S., Crespi, A., Dhakad, S., Ganguly, S., Hogins, J., Jhang, Y.-C., Kamalzadeh, M., Li, B., Leal, S., Parisi, P., et al.: Unity Perception: generate synthetic data for computer vision (2021). arXiv:2107.04259
281.
Tan, C., Xu, X., Shen, F.: A survey of zero shot detection: methods and applications. Cogn. Robot. 1, 159–167 (2021)
282.
Wang, W., Zheng, V.W., Yu, H., Miao, C.: A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–37 (2019)
283.
Hu, Y., Nie, L., Liu, M., Wang, K., Wang, Y., Hua, X.-S.: Coarse-to-fine semantic alignment for cross-modal moment localization. IEEE Trans. Image Process. 30, 5933–5943 (2021)
285.
Li, Y., Yao, T., Pan, Y., Chao, H., Mei, T.: Jointly localizing and describing events for dense video captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7492–7500 (2018)
286.
Chen, S., Jiang, Y.-G.: Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8425–8435 (2021)
287.
Dong, C., Chen, X., Chen, A., Hu, F., Wang, Z., Li, X.: Multi-level visual representation with semantic-reinforced learning for video captioning. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4750–4754 (2021)
288.
Francis, D., Anh Nguyen, P., Huet, B., Ngo, C.-W.: Fusion of multimodal embeddings for ad-hoc video search. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (2019)
289.
Yaliniz, G., Ikizler-Cinbis, N.: Using independently recurrent networks for reinforcement learning based unsupervised video summarization. Multimed. Tools Appl. 80(12), 17827–17847 (2021)
293.
Xiao, Z., Fu, X., Huang, J., Cheng, Z., Xiong, Z.: Space-time distillation for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2113–2122 (2021)
295.
Ignatov, A., Timofte, R., Denna, M., Younes, A.: Real-time quantized image super-resolution on mobile NPUs, Mobile AI 2021 challenge: report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2525–2534 (2021)
296.
Ignatov, A., Romero, A., Kim, H., Timofte, R.: Real-time video super-resolution on smartphones with deep learning, Mobile AI 2021 challenge: report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2535–2544 (2021)
297.
Zang, T., Zhu, Y., Liu, H., Zhang, R., Yu, J.: A survey on cross-domain recommendation: taxonomies, methods, and future directions (2021). arXiv:2108.03357
298.
Nixon, L., Ciesielski, K., Philipp, B.: AI for audience prediction and profiling to power innovative TV content recommendation services, pp. 42–48 (2019)
299.
Taluğ, D.Y.: User expectations on smart TV; an empiric study on user emotions towards smart TV. Turk. Online J. Design Art Commun. 11(2), 424–442 (2021)
300.
Borgotallo, R., Pero, R.D., Messina, A., Negro, F., Vignaroli, L., Aroyo, L., Aart, C., Conconi, A.: Personalized semantic news: combining semantics and television. In: International Conference on User Centric Media, pp. 137–140. Springer (2009)
Metadata
Title
Data-driven personalisation of television content: a survey
Authors
Lyndon Nixon
Jeremy Foss
Konstantinos Apostolidis
Vasileios Mezaris
Publication date
23.04.2022
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 6/2022
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-022-00926-6
