Published in: Multimedia Systems 6/2022

23 April 2022 | Special Issue Article

Data-driven personalisation of television content: a survey

Authors: Lyndon Nixon, Jeremy Foss, Konstantinos Apostolidis, Vasileios Mezaris

Abstract

This survey considers the vision of TV broadcasting in which content is personalised and personalisation is data-driven; it examines the AI and data technologies that make this possible and surveys their current uptake and usage. We review the state of the art in standards and best practices for data-driven technologies and identify remaining limitations and gaps for research and innovation. Our hope is that this survey provides an overview of the current state of AI and data-driven technologies for use by broadcasters and media organisations, as well as a pathway to the research and innovation activities needed to fulfil the vision of data-driven personalisation of TV content.


Fußnoten
Literatur
1.
Zurück zum Zitat Apostolidis, E., Mezaris, V.: Fast shot segmentation combining global and local visual descriptors. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6583–6587. IEEE (2014) Apostolidis, E., Mezaris, V.: Fast shot segmentation combining global and local visual descriptors. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6583–6587. IEEE (2014)
2.
Zurück zum Zitat Tsamoura, E., Mezaris, V., Kompatsiaris, I.: Gradual transition detection using color coherence and other criteria in a video shot meta-segmentation framework. In: 2008 15th IEEE International Conference on Image Processing, pp. 45–48. IEEE (2008) Tsamoura, E., Mezaris, V., Kompatsiaris, I.: Gradual transition detection using color coherence and other criteria in a video shot meta-segmentation framework. In: 2008 15th IEEE International Conference on Image Processing, pp. 45–48. IEEE (2008)
3.
Zurück zum Zitat Xiao, Z.-M., Lin, K.-H., Zhou, C.-l., Lin, Q.: Shot segmentation based on HSV color model. J. Xiamen Univ. (Natural Science) 5 (2008) Xiao, Z.-M., Lin, K.-H., Zhou, C.-l., Lin, Q.: Shot segmentation based on HSV color model. J. Xiamen Univ. (Natural Science) 5 (2008)
4.
Zurück zum Zitat Küçüktunç, O., Güdükbay, U., Ulusoy, Ö.: Fuzzy color histogram-based video segmentation. Comput. Vis. Image Underst. 114(1), 125–134 (2010) Küçüktunç, O., Güdükbay, U., Ulusoy, Ö.: Fuzzy color histogram-based video segmentation. Comput. Vis. Image Underst. 114(1), 125–134 (2010)
5.
Zurück zum Zitat Baber, J., Afzulpurkar, N., Dailey, M.N., Bakhtyar, M.: Shot boundary detection from videos using entropy and local descriptor. In: 2011 17th International Conference on Digital Signal Processing (DSP), pp. 1–6. IEEE (2011) Baber, J., Afzulpurkar, N., Dailey, M.N., Bakhtyar, M.: Shot boundary detection from videos using entropy and local descriptor. In: 2011 17th International Conference on Digital Signal Processing (DSP), pp. 1–6. IEEE (2011)
6.
Zurück zum Zitat e Santos, A.C.S., Pedrini, H.: Shot boundary detection for video temporal segmentation based on the weber local descriptor. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1310–1315. IEEE (2017) e Santos, A.C.S., Pedrini, H.: Shot boundary detection for video temporal segmentation based on the weber local descriptor. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1310–1315. IEEE (2017)
7.
Zurück zum Zitat Hassanien, A., Elgharib, M., Selim, A., Bae, S.-H., Hefeeda, M., Matusik, W.: Large-scale, fast and accurate shot boundary detection through spatio-temporal convolutional neural networks (2017). arXiv:1705.03281 Hassanien, A., Elgharib, M., Selim, A., Bae, S.-H., Hefeeda, M., Matusik, W.: Large-scale, fast and accurate shot boundary detection through spatio-temporal convolutional neural networks (2017). arXiv:​1705.​03281
8.
Zurück zum Zitat Mikołajczyk, A., Grochowski, M.: Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW), pp. 117–122. IEEE (2018) Mikołajczyk, A., Grochowski, M.: Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW), pp. 117–122. IEEE (2018)
10.
Zurück zum Zitat Souček, T., Lokoč, J.: Transnet v2: an effective deep network architecture for fast shot transition detection (2020). arXiv:2008.04838 Souček, T., Lokoč, J.: Transnet v2: an effective deep network architecture for fast shot transition detection (2020). arXiv:​2008.​04838
11.
Zurück zum Zitat Lokoč, J., Kovalčík, G., Souček, T., Moravec, J., Čech, P.: A framework for effective known-item search in video. In: In Proceedings of the 27th ACM International Conference on Multimedia (MM’19), October 21–25, 2019, Nice, France, pp. 1–9 (2019). https://doi.org/10.1145/3343031.3351046 Lokoč, J., Kovalčík, G., Souček, T., Moravec, J., Čech, P.: A framework for effective known-item search in video. In: In Proceedings of the 27th ACM International Conference on Multimedia (MM’19), October 21–25, 2019, Nice, France, pp. 1–9 (2019). https://​doi.​org/​10.​1145/​3343031.​3351046
12.
Zurück zum Zitat Lei, X., Pan, H., Huang, X.: A dilated CNN model for image classification. IEEE Access 7, 124087–124095 (2019) Lei, X., Pan, H., Huang, X.: A dilated CNN model for image classification. IEEE Access 7, 124087–124095 (2019)
13.
Zurück zum Zitat Tang, S., Feng, L., Kuang, Z., Chen, Y., Zhang, W.: Fast video shot transition localization with deep structured models. In: Asian Conference on Computer Vision, pp. 577–592 (2018). Springer Tang, S., Feng, L., Kuang, Z., Chen, Y., Zhang, W.: Fast video shot transition localization with deep structured models. In: Asian Conference on Computer Vision, pp. 577–592 (2018). Springer
14.
Zurück zum Zitat Gushchin, A., Antsiferova, A., Vatolin, D.: Shot boundary detection method based on a new extensive dataset and mixed features (2021). arXiv:2109.01057 Gushchin, A., Antsiferova, A., Vatolin, D.: Shot boundary detection method based on a new extensive dataset and mixed features (2021). arXiv:​2109.​01057
15.
Zurück zum Zitat Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Bugalho, M., Trancoso, I.: Temporal video segmentation to scenes using high-level audiovisual features. IEEE Trans. Circuits Syst. Video Technol. 21(8), 1163–1177 (2011) Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Bugalho, M., Trancoso, I.: Temporal video segmentation to scenes using high-level audiovisual features. IEEE Trans. Circuits Syst. Video Technol. 21(8), 1163–1177 (2011)
16.
Zurück zum Zitat Kishi, R.M., Trojahn, T.H., Goularte, R.: Correlation based feature fusion for the temporal video scene segmentation task. Multimed. Tools Appl. 78(11), 15623–15646 (2019) Kishi, R.M., Trojahn, T.H., Goularte, R.: Correlation based feature fusion for the temporal video scene segmentation task. Multimed. Tools Appl. 78(11), 15623–15646 (2019)
17.
Zurück zum Zitat Baraldi, L., Grana, C., Cucchiara, R.: A deep siamese network for scene detection in broadcast videos. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1199–1202 (2015) Baraldi, L., Grana, C., Cucchiara, R.: A deep siamese network for scene detection in broadcast videos. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1199–1202 (2015)
18.
Zurück zum Zitat Rotman, D., Porat, D., Ashour, G., Barzelay, U.: Optimally grouped deep features using normalized cost for video scene detection. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pp. 187–195 (2018) Rotman, D., Porat, D., Ashour, G., Barzelay, U.: Optimally grouped deep features using normalized cost for video scene detection. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pp. 187–195 (2018)
19.
Zurück zum Zitat Apostolidis, K., Apostolidis, E., Mezaris, V.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: International Conference on Multimedia Modeling, pp. 29–41 (2018). Springer Apostolidis, K., Apostolidis, E., Mezaris, V.: A motion-driven approach for fine-grained temporal segmentation of user-generated videos. In: International Conference on Multimedia Modeling, pp. 29–41 (2018). Springer
20.
Zurück zum Zitat Peleshko, D., Soroka, K.: Research of usage of haar-like features and AdaBoost algorithm in viola-jones method of object detection. In: 2013 12th International Conference on the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), pp. 284–286. IEEE (2013) Peleshko, D., Soroka, K.: Research of usage of haar-like features and AdaBoost algorithm in viola-jones method of object detection. In: 2013 12th International Conference on the Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), pp. 284–286. IEEE (2013)
21.
Zurück zum Zitat Nguyen, T., Park, E.-A., Han, J., Park, D.-C., Min, S.-Y.: Object detection using scale invariant feature transform. In: Pan, J.-S., Krömer, P., Snášel, V. (eds.) Genetic and Evolutionary Computing, pp. 65–72. Springer, Cham (2014) Nguyen, T., Park, E.-A., Han, J., Park, D.-C., Min, S.-Y.: Object detection using scale invariant feature transform. In: Pan, J.-S., Krömer, P., Snášel, V. (eds.) Genetic and Evolutionary Computing, pp. 65–72. Springer, Cham (2014)
22.
Zurück zum Zitat Bouguila, N., Ziou, D.: A dirichlet process mixture of dirichlet distributions for classification and prediction. In: 2008 IEEE Workshop on Machine Learning for Signal Processing, pp. 297–302. IEEE (2008) Bouguila, N., Ziou, D.: A dirichlet process mixture of dirichlet distributions for classification and prediction. In: 2008 IEEE Workshop on Machine Learning for Signal Processing, pp. 297–302. IEEE (2008)
23.
Zurück zum Zitat Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
24.
Zurück zum Zitat Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
25.
Zurück zum Zitat Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016) Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016)
26.
Zurück zum Zitat Pramanik, A., Pal, S.K., Maiti, J., Mitra, P.: Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Trans. Emerg. Top. Comput. Intell. (2021) Pramanik, A., Pal, S.K., Maiti, J., Mitra, P.: Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Trans. Emerg. Top. Comput. Intell. (2021)
27.
Zurück zum Zitat Yao, Y.: Granular computing: basic issues and possible solutions. In: Proceedings of the 5th Joint Conference on Information Sciences, vol. 1, pp. 186–189. Citeseer (2000) Yao, Y.: Granular computing: basic issues and possible solutions. In: Proceedings of the 5th Joint Conference on Information Sciences, vol. 1, pp. 186–189. Citeseer (2000)
28.
Zurück zum Zitat Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
29.
Zurück zum Zitat Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017) Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
31.
Zurück zum Zitat Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 21–37. Springer, Cham (2016) Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 21–37. Springer, Cham (2016)
32.
Zurück zum Zitat Sanchez, S., Romero, H., Morales, A.: A review: comparison of performance metrics of pretrained models for object detection using the tensorflow framework. In: IOP Conference Series: Materials Science and Engineering, vol. 844, p. 012024. IOP Publishing (2020) Sanchez, S., Romero, H., Morales, A.: A review: comparison of performance metrics of pretrained models for object detection using the tensorflow framework. In: IOP Conference Series: Materials Science and Engineering, vol. 844, p. 012024. IOP Publishing (2020)
33.
Zurück zum Zitat Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017) Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
34.
Zurück zum Zitat Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019) Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
35.
Zurück zum Zitat Tan, M., Pang, R., Le, Q.V.: Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020) Tan, M., Pang, R., Le, Q.V.: Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
36.
Zurück zum Zitat Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)
37.
38.
Zurück zum Zitat Wang, C.-Y., Yeh, I.-H., Liao, H.-Y.M.: You only learn one representation: unified network for multiple tasks (2021). arXiv:2105.04206 Wang, C.-Y., Yeh, I.-H., Liao, H.-Y.M.: You only learn one representation: unified network for multiple tasks (2021). arXiv:​2105.​04206
39.
Zurück zum Zitat Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
40.
Zurück zum Zitat Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015) Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)
41.
Zurück zum Zitat Lin, G., Shen, C., Van Den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3194–3203 (2016) Lin, G., Shen, C., Van Den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3194–3203 (2016)
42.
Zurück zum Zitat He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
43.
Zurück zum Zitat Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. In: European Conference on Computer Vision (ECCV), pp. 173–190. Springer (2020) Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. In: European Conference on Computer Vision (ECCV), pp. 173–190. Springer (2020)
44.
Zurück zum Zitat Jain, J., Singh, A., Orlov, N., Huang, Z., Li, J., Walton, S., Shi, H.: SeMask: semantically masked transformers for semantic segmentation (2021). arXiv:2112.12782 Jain, J., Singh, A., Orlov, N., Huang, Z., Li, J., Walton, S., Shi, H.: SeMask: semantically masked transformers for semantic segmentation (2021). arXiv:​2112.​12782
45.
Zurück zum Zitat Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: scaling up capacity and resolution (2021). arXiv:2111.09883 Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: scaling up capacity and resolution (2021). arXiv:​2111.​09883
46.
Zurück zum Zitat Hao, S., Zhou, Y., Guo, Y.: A brief survey on semantic segmentation with deep learning. Neurocomputing 406, 302–321 (2020) Hao, S., Zhou, Y., Guo, Y.: A brief survey on semantic segmentation with deep learning. Neurocomputing 406, 302–321 (2020)
47.
Zurück zum Zitat Lan, Z.-Z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Multimedia classification and event detection using double fusion. Multimed. Tools Appl. 71(1), 333–347 (2014) Lan, Z.-Z., Bao, L., Yu, S.-I., Liu, W., Hauptmann, A.G.: Multimedia classification and event detection using double fusion. Multimed. Tools Appl. 71(1), 333–347 (2014)
48.
Zurück zum Zitat Daudpota, S.M., Muhammad, A., Baber, J.: Video genre identification using clustering-based shot detection algorithm. SIViP 13(7), 1413–1420 (2019) Daudpota, S.M., Muhammad, A., Baber, J.: Video genre identification using clustering-based shot detection algorithm. SIViP 13(7), 1413–1420 (2019)
49.
Zurück zum Zitat Gkalelis, N., Mezaris, V.: Subclass deep neural networks: re-enabling neglected classes in deep network training for multimedia classification. In: International Conference on Multimedia Modeling, pp. 227–238. Springer (2020) Gkalelis, N., Mezaris, V.: Subclass deep neural networks: re-enabling neglected classes in deep network training for multimedia classification. In: International Conference on Multimedia Modeling, pp. 227–238. Springer (2020)
50.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
51.
Zurück zum Zitat Pouyanfar, S., Chen, S.-C., Shyu, M.-L.: An efficient deep residual-inception network for multimedia classification. In: 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 373–378. IEEE (2017) Pouyanfar, S., Chen, S.-C., Shyu, M.-L.: An efficient deep residual-inception network for multimedia classification. In: 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 373–378. IEEE (2017)
52.
Zurück zum Zitat Shamsolmoali, P., Jain, D.K., Zareapoor, M., Yang, J., Alam, M.A.: High-dimensional multimedia classification using deep cnn and extended residual units. Multimed. Tools Appl. 78(17), 23867–23882 (2019) Shamsolmoali, P., Jain, D.K., Zareapoor, M., Yang, J., Alam, M.A.: High-dimensional multimedia classification using deep cnn and extended residual units. Multimed. Tools Appl. 78(17), 23867–23882 (2019)
53.
Zurück zum Zitat Dai, X., Yin, H., Jha, N.K.: Incremental learning using a grow-and-prune paradigm with efficient neural networks. IEEE Transactions on Emerging Topics in Computing (2020) Dai, X., Yin, H., Jha, N.K.: Incremental learning using a grow-and-prune paradigm with efficient neural networks. IEEE Transactions on Emerging Topics in Computing (2020)
54.
Zurück zum Zitat Gkalelis, N., Mezaris, V.: Structured pruning of lstms via Eigen analysis and geometric median for mobile multimedia and deep learning applications. In: 2020 IEEE International Symposium on Multimedia (ISM), pp. 122–126. IEEE (2020) Gkalelis, N., Mezaris, V.: Structured pruning of lstms via Eigen analysis and geometric median for mobile multimedia and deep learning applications. In: 2020 IEEE International Symposium on Multimedia (ISM), pp. 122–126. IEEE (2020)
55.
Zurück zum Zitat Chiodino, E., Di Luccio, D., Lieto, A., Messina, A., Pozzato, G.L., Rubinetti, D.: A knowledge-based system for the dynamic generation and classification of novel contents in multimedia broadcasting. In: ECAI 2020, pp. 680–687 (2020) Chiodino, E., Di Luccio, D., Lieto, A., Messina, A., Pozzato, G.L., Rubinetti, D.: A knowledge-based system for the dynamic generation and classification of novel contents in multimedia broadcasting. In: ECAI 2020, pp. 680–687 (2020)
56.
Zurück zum Zitat Doulaty, M., Saz-Torralba, O., Ng, R.W.M., Hain, T.: Automatic genre and show identification of broadcast media. In: INTERSPEECH (2016) Doulaty, M., Saz-Torralba, O., Ng, R.W.M., Hain, T.: Automatic genre and show identification of broadcast media. In: INTERSPEECH (2016)
57.
Zurück zum Zitat Yadav, A., Vishwakarma, D.K.: A unified framework of deep networks for genre classification using movie trailer. Appl. Soft Comput. 96, 106624 (2020) Yadav, A., Vishwakarma, D.K.: A unified framework of deep networks for genre classification using movie trailer. Appl. Soft Comput. 96, 106624 (2020)
58.
Zurück zum Zitat Mills, T.J., Pye, D., Hollinghurst, N.J., Wood, K.R.: AT_TV: broadcast television and radio retrieval. In: RIAO, pp. 1135–1144 (2000) Mills, T.J., Pye, D., Hollinghurst, N.J., Wood, K.R.: AT_TV: broadcast television and radio retrieval. In: RIAO, pp. 1135–1144 (2000)
59.
Zurück zum Zitat Smeaton, A.F., Over, P., Kraaij, W.: High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. In: Multimedia Content Analysis, pp. 1–24 (2009) Smeaton, A.F., Over, P., Kraaij, W.: High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. In: Multimedia Content Analysis, pp. 1–24 (2009)
60.
Zurück zum Zitat Rossetto, L., Amiri Parian, M., Gasser, R., Giangreco, I., Heller, S., Schuldt, H.: Deep learning-based concept detection in vitrivr. In: International Conference on Multimedia Modeling, pp. 616–621. Springer (2019) Rossetto, L., Amiri Parian, M., Gasser, R., Giangreco, I., Heller, S., Schuldt, H.: Deep learning-based concept detection in vitrivr. In: International Conference on Multimedia Modeling, pp. 616–621. Springer (2019)
63.
Zurück zum Zitat Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 25, 1097–1105 (2012) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 25, 1097–1105 (2012)
64.
Zurück zum Zitat Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E.: Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018 (2018) Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E.: Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018 (2018)
65.
Zurück zum Zitat Touvron, H., Vedaldi, A., Douze, M., Jégou, H.: Fixing the train-test resolution discrepancy: Fixefficientnet (2020). arXiv:2003.08237 Touvron, H., Vedaldi, A., Douze, M., Jégou, H.: Fixing the train-test resolution discrepancy: Fixefficientnet (2020). arXiv:​2003.​08237
66.
Zurück zum Zitat Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
67.
Zurück zum Zitat Gkalelis, N., Goulas, A., Galanopoulos, D., Mezaris, V.: Objectgraphs: using objects and a graph convolutional network for the bottom-up recognition and explanation of events in video. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3370–3378 (2021). https://doi.org/10.1109/CVPRW53098.2021.00376 Gkalelis, N., Goulas, A., Galanopoulos, D., Mezaris, V.: Objectgraphs: using objects and a graph convolutional network for the bottom-up recognition and explanation of events in video. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3370–3378 (2021). https://​doi.​org/​10.​1109/​CVPRW53098.​2021.​00376
68.
Zurück zum Zitat Pouyanfar, S., Chen, S.-C.: Semantic event detection using ensemble deep learning. In: 2016 IEEE International Symposium on Multimedia (ISM), pp. 203–208. IEEE (2016) Pouyanfar, S., Chen, S.-C.: Semantic event detection using ensemble deep learning. In: 2016 IEEE International Symposium on Multimedia (ISM), pp. 203–208. IEEE (2016)
69.
Zurück zum Zitat Marechal, C., Mikolajewski, D., Tyburek, K., Prokopowicz, P., Bougueroua, L., Ancourt, C., Wegrzyn-Wolska, K.: Survey on AI-based multimodal methods for emotion detection (2019) Marechal, C., Mikolajewski, D., Tyburek, K., Prokopowicz, P., Bougueroua, L., Ancourt, C., Wegrzyn-Wolska, K.: Survey on AI-based multimodal methods for emotion detection (2019)
70.
Zurück zum Zitat Kwak, C.-U., Son, J.-W., Lee, A., Kim, S.-J.: Scene emotion detection using closed caption based on hierarchical attention network. In: 2017 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1206–1208. IEEE (2017) Kwak, C.-U., Son, J.-W., Lee, A., Kim, S.-J.: Scene emotion detection using closed caption based on hierarchical attention network. In: 2017 International Conference on Information and Communication Technology Convergence (ICTC), pp. 1206–1208. IEEE (2017)
71.
Zurück zum Zitat Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., Pal, C.: Recurrent neural networks for emotion recognition in video. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 467–474 (2015) Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., Pal, C.: Recurrent neural networks for emotion recognition in video. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 467–474 (2015)
72.
Zurück zum Zitat Noroozi, F., Marjanovic, M., Njegus, A., Escalera, S., Anbarjafari, G.: Audio-visual emotion recognition in video clips. IEEE Trans. Affect. Comput. 10(1), 60–75 (2017) Noroozi, F., Marjanovic, M., Njegus, A., Escalera, S., Anbarjafari, G.: Audio-visual emotion recognition in video clips. IEEE Trans. Affect. Comput. 10(1), 60–75 (2017)
73.
Zurück zum Zitat Vandersmissen, B., Sterckx, L., Demeester, T., Jalalvand, A., De Neve, W., Van de Walle, R.: An automated end-to-end pipeline for fine-grained video annotation using deep neural networks. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp. 409–412 (2016) Vandersmissen, B., Sterckx, L., Demeester, T., Jalalvand, A., De Neve, W., Van de Walle, R.: An automated end-to-end pipeline for fine-grained video annotation using deep neural networks. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp. 409–412 (2016)
75.
Zurück zum Zitat Sharma, D.P., Atkins, J.: Automatic speech recognition systems: challenges and recent implementation trends. Int. J. Signal Imaging Syst. Eng. 7(4), 220–234 (2014) Sharma, D.P., Atkins, J.: Automatic speech recognition systems: challenges and recent implementation trends. Int. J. Signal Imaging Syst. Eng. 7(4), 220–234 (2014)
76.
Zurück zum Zitat Radzikowski, K., Wang, L., Yoshie, O., Nowak, R.: Accent modification for speech recognition of non-native speakers using neural style transfer. EURASIP J. Audio Speech Process. 2021(1), 1–10 (2021) Radzikowski, K., Wang, L., Yoshie, O., Nowak, R.: Accent modification for speech recognition of non-native speakers using neural style transfer. EURASIP J. Audio Speech Process. 2021(1), 1–10 (2021)
77.
Zurück zum Zitat Nixon, L., Mezaris, V., Thomsen, J.: Seamlessly interlinking tv and web content to enable linked television. In: ACM Int. Conf. on Interactive Experiences for Television and Online Video (TVX 2014), Adjunct Proceedings, Newcastle Upon Tyne, p. 21 (2014) Nixon, L., Mezaris, V., Thomsen, J.: Seamlessly interlinking tv and web content to enable linked television. In: ACM Int. Conf. on Interactive Experiences for Television and Online Video (TVX 2014), Adjunct Proceedings, Newcastle Upon Tyne, p. 21 (2014)
78.
Zurück zum Zitat Liu, A.H., Jin, S., Lai, C.-I.J., Rouditchenko, A., Oliva, A., Glass, J.: Cross-modal discrete representation learning (2021). arXiv:2106.05438 Liu, A.H., Jin, S., Lai, C.-I.J., Rouditchenko, A., Oliva, A., Glass, J.: Cross-modal discrete representation learning (2021). arXiv:​2106.​05438
80.
Zurück zum Zitat Wang, Y.: Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 17(1s), 1–25 (2021) Wang, Y.: Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 17(1s), 1–25 (2021)
81.
Zurück zum Zitat Jin, W., Zhao, Z., Zhang, P., Zhu, J., He, X., Zhuang, Y.: Hierarchical cross-modal graph consistency learning for video-text retrieval. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1114–1124 (2021) Jin, W., Zhao, Z., Zhang, P., Zhu, J., He, X., Zhuang, Y.: Hierarchical cross-modal graph consistency learning for video-text retrieval. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1114–1124 (2021)
83.
Zurück zum Zitat Li, X., Xu, C., Yang, G., Chen, Z., Dong, J.: W2VV++: fully deep learning for ad-hoc video search. In: Proceedings of the 27th ACM International Conference on Multimedia (2019) Li, X., Xu, C., Yang, G., Chen, Z., Dong, J.: W2VV++: fully deep learning for ad-hoc video search. In: Proceedings of the 27th ACM International Conference on Multimedia (2019)
84.
85.
Zurück zum Zitat Galanopoulos, D., Mezaris, V.: Attention mechanisms, signal encodings and fusion strategies for improved ad-hoc video search with dual encoding networks. In: Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 336–340 (2020) Galanopoulos, D., Mezaris, V.: Attention mechanisms, signal encodings and fusion strategies for improved ad-hoc video search with dual encoding networks. In: Proceedings of the 2020 International Conference on Multimedia Retrieval, pp. 336–340 (2020)
86.
Zurück zum Zitat Dong, J., Li, X., Xu, C., Ji, S., He, Y., Yang, G., Wang, X.: Dual encoding for zero-example video retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9346–9355 (2019) Dong, J., Li, X., Xu, C., Ji, S., He, Y., Yang, G., Wang, X.: Dual encoding for zero-example video retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9346–9355 (2019)
87.
Zurück zum Zitat Sun, C., Myers, A., Vondrick, C., Murphy, K., Schmid, C.: Videobert: a joint model for video and language representation learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7464–7473 (2019) Sun, C., Myers, A., Vondrick, C., Murphy, K., Schmid, C.: Videobert: a joint model for video and language representation learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7464–7473 (2019)
89.
Zurück zum Zitat Li, L., Chen, Y.-C., Cheng, Y., Gan, Z., Yu, L., Liu, J.: HERO: hierarchical encoder for video+ language omni-representation pre-training. In: EMNLP (2020) Li, L., Chen, Y.-C., Cheng, Y., Gan, Z., Yu, L., Liu, J.: HERO: hierarchical encoder for video+ language omni-representation pre-training. In: EMNLP (2020)
90.
Zurück zum Zitat Lei, J., Li, L., Zhou, L., Gan, Z., Berg, T.L., Bansal, M., Liu, J.: Less is more: clipbert for video-and-language learning via sparse sampling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7331–7341 (2021) Lei, J., Li, L., Zhou, L., Gan, Z., Berg, T.L., Bansal, M., Liu, J.: Less is more: clipbert for video-and-language learning via sparse sampling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7331–7341 (2021)
91.
Zurück zum Zitat Sun, C., Baradel, F., Murphy, K., Schmid, C.: Learning video representations using contrastive bidirectional transformer (2019). arXiv:1906.05743 Sun, C., Baradel, F., Murphy, K., Schmid, C.: Learning video representations using contrastive bidirectional transformer (2019). arXiv:​1906.​05743
92.
Zurück zum Zitat Zhu, L., Yang, Y.: Actbert: learning global-local video-text representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8746–8755 (2020) Zhu, L., Yang, Y.: Actbert: learning global-local video-text representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8746–8755 (2020)
93.
Zurück zum Zitat Luo, H., Ji, L., Shi, B., Huang, H., Duan, N., Li, T., Li, J., Bharti, T., Zhou, M.: UniVL: a unified video and language pre-training model for multimodal understanding and generation (2020). arXiv:2002.06353 Luo, H., Ji, L., Shi, B., Huang, H., Duan, N., Li, T., Li, J., Bharti, T., Zhou, M.: UniVL: a unified video and language pre-training model for multimodal understanding and generation (2020). arXiv:​2002.​06353
94.
Zurück zum Zitat Gao, Z., Liu, J., Chen, S., Chang, D., Zhang, H., Yuan, J.: CLIP2TV: an empirical study on transformer-based methods for video-text retrieval (2021). arXiv:2111.05610 Gao, Z., Liu, J., Chen, S., Chang, D., Zhang, H., Yuan, J.: CLIP2TV: an empirical study on transformer-based methods for video-text retrieval (2021). arXiv:​2111.​05610
95.
Zurück zum Zitat Xu, J., Mei, T., Yao, T., Rui, Y.: Msr-vtt: a large video description dataset for bridging video and language. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5288–5296 (2016) Xu, J., Mei, T., Yao, T., Rui, Y.: Msr-vtt: a large video description dataset for bridging video and language. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5288–5296 (2016)
112.
Zurück zum Zitat Rochan, M., Ye, L., Wang, Y.: Video summarization using fully convolutional sequence networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 358–374. Springer, Cham (2018) Rochan, M., Ye, L., Wang, Y.: Video summarization using fully convolutional sequence networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 358–374. Springer, Cham (2018)
113.
Zurück zum Zitat Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Carneiro, G., You, S. (eds.) Computer Vision—ACCV 2018 Workshops, pp. 39–54. Springer, Cham (2019) Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Carneiro, G., You, S. (eds.) Computer Vision—ACCV 2018 Workshops, pp. 39–54. Springer, Cham (2019)
114.
Zurück zum Zitat Otani, M., Nakashima, Y., Rahtu, E., Heikkilä, J., Yokoya, N.: Video summarization using deep semantic features. In: The 13th Asian Conference on Computer Vision (ACCV’16) (2016) Otani, M., Nakashima, Y., Rahtu, E., Heikkilä, J., Yokoya, N.: Video summarization using deep semantic features. In: The 13th Asian Conference on Computer Vision (ACCV’16) (2016)
115.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
117.
Zurück zum Zitat Zhang, K., Chao, W.-L., Sha, F., Grauman, K.: Video summarization with long short-term memory. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 766–782. Springer, Cham (2016) Zhang, K., Chao, W.-L., Sha, F., Grauman, K.: Video summarization with long short-term memory. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, pp. 766–782. Springer, Cham (2016)
119.
120.
122.
Zurück zum Zitat Zhao, B., Li, X., Lu, X.: HSA-RNN: Hierarchical structure-adaptive rnn for video summarization. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition. CVPR ’18 (2018) Zhao, B., Li, X., Lu, X.: HSA-RNN: Hierarchical structure-adaptive rnn for video summarization. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition. CVPR ’18 (2018)
123.
Zurück zum Zitat Zhang, Y., Kampffmeyer, M., Liang, X., Zhang, D., Tan, M., Xing, E.P.: Dtr-gan: Dilated temporal relational adversarial network for video summarization (2018). arXiv:1804.11228 [CoRR/abs] Zhang, Y., Kampffmeyer, M., Liang, X., Zhang, D., Tan, M., Xing, E.P.: Dtr-gan: Dilated temporal relational adversarial network for video summarization (2018). arXiv:​1804.​11228 [CoRR/abs]
124.
Zurück zum Zitat Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Ac-sum-gan: connecting actor-critic and generative adversarial networks for unsupervised video summarization. IEEE Trans. Circuits Syst. Video Technol. (2020) Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Ac-sum-gan: connecting actor-critic and generative adversarial networks for unsupervised video summarization. IEEE Trans. Circuits Syst. Video Technol. (2020)
125.
Zurück zum Zitat Jung, Y., Cho, D., Kim, D., Woo, S., Kweon, I.S.: Discriminative feature learning for unsupervised video summarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8537–8544 (2019) Jung, Y., Cho, D., Kim, D., Woo, S., Kweon, I.S.: Discriminative feature learning for unsupervised video summarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8537–8544 (2019)
126.
Zurück zum Zitat Jung, Y., Cho, D., Woo, S., Kweon, I.S.: Global-and-local relative position embedding for unsupervised video summarization. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, August 23–28, 2020, Proceedings, Part XXV 16, pp. 167–183 (2020). Springer Jung, Y., Cho, D., Woo, S., Kweon, I.S.: Global-and-local relative position embedding for unsupervised video summarization. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, August 23–28, 2020, Proceedings, Part XXV 16, pp. 167–183 (2020). Springer
127.
Zurück zum Zitat Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Unsupervised video summarization via attention-driven adversarial learning. In: International Conference on Multimedia Modeling, pp. 492–504 (2020). Springer Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V., Patras, I.: Unsupervised video summarization via attention-driven adversarial learning. In: International Conference on Multimedia Modeling, pp. 492–504 (2020). Springer
128.
Zurück zum Zitat Apostolidis, E., Metsai, A.I., Adamantidou, E., Mezaris, V., Patras, I.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In: Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–25 (2019) Apostolidis, E., Metsai, A.I., Adamantidou, E., Mezaris, V., Patras, I.: A stepwise, label-based approach for improving the adversarial training in unsupervised video summarization. In: Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–25 (2019)
129.
Zurück zum Zitat Wang, J., Wang, W., Wang, Z., Wang, L., Feng, D., Tan, T.: Stacked memory network for video summarization. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 836–844 (2019) Wang, J., Wang, W., Wang, Z., Wang, L., Feng, D., Tan, T.: Stacked memory network for video summarization. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 836–844 (2019)
130.
Zurück zum Zitat Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Asian Conference on Computer Vision, pp. 39–54 (2018). Springer Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., Remagnino, P.: Summarizing videos with attention. In: Asian Conference on Computer Vision, pp. 39–54 (2018). Springer
131.
Zurück zum Zitat Liu, Y.-T., Li, Y.-J., Yang, F.-E., Chen, S.-F., Wang, Y.-C.F.: Learning hierarchical self-attention for video summarization. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 3377–3381 (2019). IEEE Liu, Y.-T., Li, Y.-J., Yang, F.-E., Chen, S.-F., Wang, Y.-C.F.: Learning hierarchical self-attention for video summarization. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 3377–3381 (2019). IEEE
132.
Zurück zum Zitat Li, P., Ye, Q., Zhang, L., Yuan, L., Xu, X., Shao, L.: Exploring global diverse attention via pairwise temporal relation for video summarization. Pattern Recogn. 111, 107677 (2021) Li, P., Ye, Q., Zhang, L., Yuan, L., Xu, X., Shao, L.: Exploring global diverse attention via pairwise temporal relation for video summarization. Pattern Recogn. 111, 107677 (2021)
133.
Zurück zum Zitat Ji, Z., Jiao, F., Pang, Y., Shao, L.: Deep attentive and semantic preserving video summarization. Neurocomputing 405, 200–207 (2020) Ji, Z., Jiao, F., Pang, Y., Shao, L.: Deep attentive and semantic preserving video summarization. Neurocomputing 405, 200–207 (2020)
134.
Zurück zum Zitat Apostolidis, E., Balaouras, G., Mezaris, V., Patras, I.: Combining global and local attention with positional encoding for video summarization. In: 2021 IEEE International Symposium on Multimedia (ISM), pp. 226–234. IEEE (2021) Apostolidis, E., Balaouras, G., Mezaris, V., Patras, I.: Combining global and local attention with positional encoding for video summarization. In: 2021 IEEE International Symposium on Multimedia (ISM), pp. 226–234. IEEE (2021)
135.
Zurück zum Zitat Xu, M., Jin, J.S., Luo, S., Duan, L.: Hierarchical movie affective content analysis based on arousal and valence features. In: Proceedings of the 16th ACM International Conference on Multimedia, pp. 677–680 (2008) Xu, M., Jin, J.S., Luo, S., Duan, L.: Hierarchical movie affective content analysis based on arousal and valence features. In: Proceedings of the 16th ACM International Conference on Multimedia, pp. 677–680 (2008)
136.
Zurück zum Zitat Xiong, B., Kalantidis, Y., Ghadiyaram, D., Grauman, K.: Less is more: Learning highlight detection from video duration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1258–1267 (2019) Xiong, B., Kalantidis, Y., Ghadiyaram, D., Grauman, K.: Less is more: Learning highlight detection from video duration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1258–1267 (2019)
137.
Zurück zum Zitat Xiong, Z., Radhakrishnan, R., Divakaran, A., Huang, T.S.: Highlights extraction from sports video based on an audio-visual marker detection framework. In: 2005 IEEE International Conference on Multimedia and Expo, p. 4. IEEE (2005) Xiong, Z., Radhakrishnan, R., Divakaran, A., Huang, T.S.: Highlights extraction from sports video based on an audio-visual marker detection framework. In: 2005 IEEE International Conference on Multimedia and Expo, p. 4. IEEE (2005)
138.
Zurück zum Zitat Tang, H., Kwatra, V., Sargin, M.E., Gargi, U.: Detecting highlights in sports videos: cricket as a test case. In: 2011 IEEE International Conference on Multimedia and Expo, pp. 1–6. IEEE (2011) Tang, H., Kwatra, V., Sargin, M.E., Gargi, U.: Detecting highlights in sports videos: cricket as a test case. In: 2011 IEEE International Conference on Multimedia and Expo, pp. 1–6. IEEE (2011)
139.
Zurück zum Zitat Wang, J., Xu, C., Chng, E., Tian, Q.: Sports highlight detection from keyword sequences using HMM. In: 2004 IEEE International Conference on Multimedia and Expo (ICME)(IEEE Cat. No. 04TH8763), vol. 1, pp. 599–602. IEEE (2004) Wang, J., Xu, C., Chng, E., Tian, Q.: Sports highlight detection from keyword sequences using HMM. In: 2004 IEEE International Conference on Multimedia and Expo (ICME)(IEEE Cat. No. 04TH8763), vol. 1, pp. 599–602. IEEE (2004)
140.
Zurück zum Zitat Rui, Y., Gupta, A., Acero, A.: Automatically extracting highlights for tv baseball programs. In: Proceedings of the Eighth ACM International Conference on Multimedia, pp. 105–115 (2000) Rui, Y., Gupta, A., Acero, A.: Automatically extracting highlights for tv baseball programs. In: Proceedings of the Eighth ACM International Conference on Multimedia, pp. 105–115 (2000)
141.
Zurück zum Zitat Sun, M., Farhadi, A., Seitz, S.: Ranking domain-specific highlights by analyzing edited videos. In: European Conference on Computer Vision, pp. 787–802. Springer (2014) Sun, M., Farhadi, A., Seitz, S.: Ranking domain-specific highlights by analyzing edited videos. In: European Conference on Computer Vision, pp. 787–802. Springer (2014)
142.
Zurück zum Zitat Petkovic, M., Mihajlovic, V., Jonker, W., Djordjevic-Kajan, S.: Multi-modal extraction of highlights from tv formula 1 programs. In: Proceedings of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 817–820. IEEE (2002) Petkovic, M., Mihajlovic, V., Jonker, W., Djordjevic-Kajan, S.: Multi-modal extraction of highlights from tv formula 1 programs. In: Proceedings of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 817–820. IEEE (2002)
143.
Zurück zum Zitat Yao, T., Mei, T., Rui, Y.: Highlight detection with pairwise deep ranking for first-person video summarization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 982–990 (2016) Yao, T., Mei, T., Rui, Y.: Highlight detection with pairwise deep ranking for first-person video summarization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 982–990 (2016)
144.
Zurück zum Zitat Gygli, M., Song, Y., Cao, L.: Video2gif: automatic generation of animated gifs from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1001–1009 (2016) Gygli, M., Song, Y., Cao, L.: Video2gif: automatic generation of animated gifs from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1001–1009 (2016)
145.
Zurück zum Zitat Jiao, Y., Li, Z., Huang, S., Yang, X., Liu, B., Zhang, T.: Three-dimensional attention-based deep ranking model for video highlight detection. IEEE Trans. Multimed. 20(10), 2693–2705 (2018) Jiao, Y., Li, Z., Huang, S., Yang, X., Liu, B., Zhang, T.: Three-dimensional attention-based deep ranking model for video highlight detection. IEEE Trans. Multimed. 20(10), 2693–2705 (2018)
146.
Zurück zum Zitat Potapov, D., Douze, M., Harchaoui, Z., Schmid, C.: Category-specific video summarization. In: European Conference on Computer Vision, pp. 540–555. Springer (2014) Potapov, D., Douze, M., Harchaoui, Z., Schmid, C.: Category-specific video summarization. In: European Conference on Computer Vision, pp. 540–555. Springer (2014)
147.
Zurück zum Zitat Yang, H., Wang, B., Lin, S., Wipf, D., Guo, M., Guo, B.: Unsupervised extraction of video highlights via robust recurrent auto-encoders. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4633–4641 (2015) Yang, H., Wang, B., Lin, S., Wipf, D., Guo, M., Guo, B.: Unsupervised extraction of video highlights via robust recurrent auto-encoders. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4633–4641 (2015)
148.
Zurück zum Zitat Panda, R., Das, A., Wu, Z., Ernst, J., Roy-Chowdhury, A.K.: Weakly supervised summarization of web videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3657–3666 (2017) Panda, R., Das, A., Wu, Z., Ernst, J., Roy-Chowdhury, A.K.: Weakly supervised summarization of web videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3657–3666 (2017)
149.
Zurück zum Zitat Hong, F.-T., Huang, X., Li, W.-H., Zheng, W.-S.: Mini-net: multiple instance ranking network for video highlight detection. In: European Conference on Computer Vision, pp. 345–360. Springer (2020) Hong, F.-T., Huang, X., Li, W.-H., Zheng, W.-S.: Mini-net: multiple instance ranking network for video highlight detection. In: European Conference on Computer Vision, pp. 345–360. Springer (2020)
150.
Zurück zum Zitat Rochan, M., Reddy, M.K.K., Ye, L., Wang, Y.: Adaptive video highlight detection by learning from user history. In: European Conference on Computer Vision, pp. 261–278. Springer (2020) Rochan, M., Reddy, M.K.K., Ye, L., Wang, Y.: Adaptive video highlight detection by learning from user history. In: European Conference on Computer Vision, pp. 261–278. Springer (2020)
151.
Zurück zum Zitat Wu, L., Yang, Y., Chen, L., Lian, D., Hong, R., Wang, M.: Learning to transfer graph embeddings for inductive graph based recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1211–1220 (2020) Wu, L., Yang, Y., Chen, L., Lian, D., Hong, R., Wang, M.: Learning to transfer graph embeddings for inductive graph based recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1211–1220 (2020)
152.
Zurück zum Zitat Xu, M., Wang, H., Ni, B., Zhu, R., Sun, Z., Wang, C.: Cross-category video highlight detection via set-based learning (2021). arXiv:2108.11770 Xu, M., Wang, H., Ni, B., Zhu, R., Sun, Z., Wang, C.: Cross-category video highlight detection via set-based learning (2021). arXiv:​2108.​11770
153.
Zurück zum Zitat Mundnich, K., Fenster, A., Khare, A., Sundaram, S.: Audiovisual highlight detection in videos. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4155–4159. IEEE (2021) Mundnich, K., Fenster, A., Khare, A., Sundaram, S.: Audiovisual highlight detection in videos. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4155–4159. IEEE (2021)
154.
Zurück zum Zitat Farsiu, S., Robinson, M.D., Elad, M., Milanfar, P.: Fast and robust multiframe super resolution. IEEE Trans. Image Process. 13(10), 1327–1344 (2004) Farsiu, S., Robinson, M.D., Elad, M., Milanfar, P.: Fast and robust multiframe super resolution. IEEE Trans. Image Process. 13(10), 1327–1344 (2004)
155.
Zurück zum Zitat Farsiu, S., Elad, M., Milanfar, P.: Multiframe demosaicing and super-resolution from undersampled color images. In: Computational Imaging II, vol. 5299, pp. 222–233. International Society for Optics and Photonics (2004) Farsiu, S., Elad, M., Milanfar, P.: Multiframe demosaicing and super-resolution from undersampled color images. In: Computational Imaging II, vol. 5299, pp. 222–233. International Society for Optics and Photonics (2004)
156.
Zurück zum Zitat Farsiu, S., Robinson, D.M., Elad, M., Milanfar, P.: Dynamic demosaicing and color superresolution of video sequences. In: Image Reconstruction from Incomplete Data III, vol. 5562, pp. 169–178. International Society for Optics and Photonics (2004) Farsiu, S., Robinson, D.M., Elad, M., Milanfar, P.: Dynamic demosaicing and color superresolution of video sequences. In: Image Reconstruction from Incomplete Data III, vol. 5562, pp. 169–178. International Society for Optics and Photonics (2004)
157.
Zurück zum Zitat Yang, C.-Y., Huang, J.-B., Yang, M.-H.: Exploiting self-similarities for single frame super-resolution. In: Asian Conference on Computer Vision, pp. 497–510. Springer (2010) Yang, C.-Y., Huang, J.-B., Yang, M.-H.: Exploiting self-similarities for single frame super-resolution. In: Asian Conference on Computer Vision, pp. 497–510. Springer (2010)
158.
Zurück zum Zitat Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Comput. Graph. Appl. 22(2), 56–65 (2002) Freeman, W.T., Jones, T.R., Pasztor, E.C.: Example-based super-resolution. IEEE Comput. Graph. Appl. 22(2), 56–65 (2002)
159.
Zurück zum Zitat Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015) Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015)
160.
Wang, Z., Bovik, A.C.: Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag. 26(1), 98–117 (2009)
161.
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
162.
Rad, M.S., Bozorgtabar, B., Marti, U.-V., Basler, M., Ekenel, H.K., Thiran, J.-P.: SROBB: targeted perceptual loss for single image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2710–2719 (2019)
163.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)
164.
Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Change Loy, C.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018)
165.
Razavi, A., van den Oord, A., Vinyals, O.: Generating diverse high-fidelity images with VQ-VAE-2. In: Advances in Neural Information Processing Systems, pp. 14866–14876 (2019)
167.
Atwood, J., Towsley, D.: Diffusion-convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1993–2001 (2016)
168.
Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. In: Advances in Neural Information Processing Systems 34 (2021)
169.
Ho, J., Saharia, C., Chan, W., Fleet, D.J., Norouzi, M., Salimans, T.: Cascaded diffusion models for high fidelity image generation (2021). arXiv:2106.15282
170.
Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D.J., Norouzi, M.: Image super-resolution via iterative refinement (2021). arXiv:2104.07636
171.
Chadha, A., Britto, J., Roja, M.M.: iSeeBetter: spatio-temporal video super-resolution using recurrent generative back-projection networks. Comput. Vis. Media 6(3), 307–317 (2020)
172.
Isobe, T., Zhu, F., Jia, X., Wang, S.: Revisiting temporal modeling for video super-resolution. In: Proceedings of the 31st British Machine Vision Conference (BMVC) (2020)
173.
Haris, M., Shakhnarovich, G., Ukita, N.: Recurrent back-projection network for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3897–3906 (2019)
174.
Rozumnyi, D., Oswald, M.R., Ferrari, V., Matas, J., Pollefeys, M.: DeFMO: deblurring and shape recovery of fast moving objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3456–3465 (2021)
175.
Liu, H., Ruan, Z., Zhao, P., Dong, C., Shang, F., Liu, Y., Yang, L.: Video super resolution based on deep learning: a comprehensive survey (2020). arXiv:2007.12928
177.
Lee, H.-S., Bae, G., Cho, S.-I., Kim, Y.-H., Kang, S.: SmartGrid: video retargeting with spatiotemporal grid optimization. IEEE Access 7, 127564–127579 (2019)
178.
Rachavarapu, K.-K., Kumar, M., Gandhi, V., Subramanian, R.: Watch to edit: video retargeting using gaze. In: Computer Graphics Forum, vol. 37, pp. 205–215. Wiley Online Library (2018)
179.
Jain, E., Sheikh, Y., Shamir, A., Hodgins, J.: Gaze-driven video re-editing. ACM Trans. Graph. (TOG) 34(2), 1–12 (2015)
181.
Liu, F., Gleicher, M.: Video retargeting: automating pan and scan. In: Proceedings of the 14th ACM International Conference on Multimedia, pp. 241–250 (2006)
182.
Kaur, H., Kour, S., Sen, D.: Video retargeting through spatio-temporal seam carving using Kalman filter. IET Image Proc. 13(11), 1862–1871 (2019)
184.
Wang, Y.-S., Lin, H.-C., Sorkine, O., Lee, T.-Y.: Motion-based video retargeting with optimized crop-and-warp. In: ACM SIGGRAPH 2010 Papers, pp. 1–9 (2010)
186.
Kiess, J., Guthier, B., Kopf, S., Effelsberg, W.: SeamCrop for image retargeting. In: Multimedia on Mobile Devices 2012; and Multimedia Content Access: Algorithms and Systems VI, vol. 8304, p. 83040. International Society for Optics and Photonics (2012)
187.
Nam, S.-H., Ahn, W., Yu, I.-J., Kwon, M.-J., Son, M., Lee, H.-K.: Deep convolutional neural network for identifying seam-carving forgery. IEEE Trans. Circuits Syst. Video Technol. (2020)
188.
Apostolidis, K., Mezaris, V.: A fast smart-cropping method and dataset for video retargeting. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 2618–2622. IEEE (2021)
189.
Chou, Y.-C., Fang, C.-Y., Su, P.-C., Chien, Y.-C.: Content-based cropping using visual saliency and blur detection. In: 2017 10th International Conference on Ubi-media Computing and Workshops (Ubi-Media), pp. 1–6. IEEE (2017)
190.
Zhu, T., Zhang, D., Hu, Y., Wang, T., Jiang, X., Zhu, J., Li, J.: Horizontal-to-vertical video conversion. IEEE Trans. Multimed. (2021)
192.
Kim, E., Pyo, S., Park, E., Kim, M.: An automatic recommendation scheme of TV program contents for (IP)TV personalization. IEEE Trans. Broadcast. 57(3), 674–684 (2011)
193.
Soares, M., Viana, P.: TV recommendation and personalization systems: integrating broadcast and video on-demand services. Adv. Electr. Comput. Eng. 14(1), 115–120 (2014)
194.
Hsu, S.H., Wen, M.-H., Lin, H.-C., Lee, C.-C., Lee, C.-H.: AIMED – a personalized TV recommendation system. In: European Conference on Interactive Television, pp. 166–174. Springer (2007)
195.
Aharon, M., Hillel, E., Kagian, A., Lempel, R., Makabee, H., Nissim, R.: Watch-It-Next: a contextual TV recommendation system. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 180–195. Springer (2015)
196.
Aroyo, L., Nixon, L., Miller, L.: NoTube: the television experience enhanced by online social and semantic data. In: 2011 IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin), pp. 269–273. IEEE (2011)
200.
Armstrong, M., Brooks, M., Churnside, A., Evans, M., Melchior, F., Shotton, M.: Object-based broadcasting – curation, responsiveness and user experience (2014)
201.
Cox, J., Jones, R., Northwood, C., Tutcher, J., Robinson, B.: Object-based production: a personalised interactive cooking application. In: Adjunct Publication of the 2017 ACM International Conference on Interactive Experiences for TV and Online Video, pp. 79–80 (2017)
202.
Ursu, M., Smith, D., Hook, J., Concannon, S., Gray, J.: Authoring interactive fictional stories in object-based media (OBM). In: ACM International Conference on Interactive Media Experiences, pp. 127–137 (2020)
203.
Silzle, A., Weitnauer, M., Warusfel, O., Bleisteiner, W., Herberger, T., Epain, N., Duval, B., Bogaards, N., Baume, C., Herzog, U., et al.: Orpheus audio project: piloting an end-to-end object-based audio broadcasting chain. In: IBC Conference, Amsterdam, 14–18 September (2017)
204.
Chen, X., Nguyen, T.V., Shen, Z., Kankanhalli, M.: LiveSense: contextual advertising in live streaming videos. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 392–400 (2019)
205.
Akgul, T., Ozcan, S., Iplik, A.: A cloud-based end-to-end server-side dynamic ad insertion platform for live content. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 361–364 (2020)
206.
Carvalho, P., Pereira, A., Viana, P.: Automatic TV logo identification for advertisement detection without prior data. Appl. Sci. 11(16), 7494 (2021)
207.
Park, S., Cho, K.: Framework for personalized broadcast notice based on contents metadata. In: Proceedings of the Korea Contents Association Conference, pp. 445–446. The Korea Contents Association (2014)
208.
Hunter, J.: Adding multimedia to the semantic web: building an MPEG-7 ontology. In: Proceedings of the First International Semantic Web Working Symposium (SWWS'01), pp. 261–283. CEUR-WS.org, Aachen, DEU (2001)
209.
EBU-MIM: EBU-MIM semantic web activity report. Technical report, EBU-MIM (2015). Accessed 30 Sept 2021
210.
Hoffart, J., Yosef, M.A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., Weikum, G.: Robust disambiguation of named entities in text. In: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, pp. 782–792 (2011)
211.
Brasoveanu, A.M., Weichselbraun, A., Nixon, L.: In media res: a corpus for evaluating named entity linking with creative works. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 355–364 (2020)
212.
Nixon, L., Troncy, R.: Survey of semantic media annotation tools for the web: towards new media applications with linked media. In: European Semantic Web Conference, pp. 100–114. Springer (2014)
213.
Collyda, C., Apostolidis, K., Apostolidis, E., Adamantidou, E., Metsai, A.I., Mezaris, V.: A web service for video summarization. In: ACM International Conference on Interactive Media Experiences, pp. 148–153 (2020)
217.
Armstrong, M.: Object-based media: a toolkit for building responsive content. In: Proceedings of the 32nd International BCS Human Computer Interaction Conference, pp. 1–2 (2018)
218.
Cox, J., Brooks, M., Forrester, I., Armstrong, M.: Moving object-based media production from one-off examples to scalable workflows. SMPTE Motion Imaging J. 127(4), 32–37 (2018)
219.
Carter, J., Ramdhany, R., Lomas, M., Pearce, T., Shephard, J., Sparks, M.: Universal access for object-based media experiences. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 382–385 (2020)
220.
Zwicklbauer, M., Lamm, W., Gordon, M., Apostolidis, K., Philipp, B., Mezaris, V.: Video analysis for interactive story creation: the Sandmännchen showcase. In: Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery, pp. 17–24 (2020)
221.
Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J., Gama, J.: Personalised dynamic viewer profiling for streamed data. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds.) Trends and Advances in Information Systems and Technologies, pp. 501–510. Springer, Cham (2018)
222.
Veloso, B., Malheiro, B., Burguillo, J.C., Foss, J.: Product placement platform for personalised advertising. In: New European Media (NEM) Summit 2016 (2016)
223.
Malheiro, B., Foss, J., Burguillo, J.: B2B platform for media content personalisation (2013)
225.
Stewart, S.: Video game industry silently taking over entertainment world. Available at ejinsight.com/eji/article/id/2280405/20191022 (2019)
226.
Witkowski, W.: Videogames are a bigger industry than movies and North American sports combined, thanks to the pandemic. MarketWatch (2020)
227.
Ward, L., Paradis, M., Shirley, B., Russon, L., Moore, R., Davies, R.: Casualty accessible and enhanced (A&E) audio: trialling object-based accessible TV audio. In: Audio Engineering Society Convention 147. Audio Engineering Society (2019)
228.
Montagud, M., Núñez, J.A., Karavellas, T., Jurado, I., Fernández, S.: Convergence between TV and VR: enabling truly immersive and social experiences. In: Workshop on Virtual Reality, Co-located with ACM TVX 2018 (2018)
229.
Kudumakis, P., Wilmering, T., Sandler, M., Foss, J.: MPEG IPR ontologies for media trading and personalization. In: International Workshop on Data-Driven Personalization of Television (DataTV2019), ACM International Conference on Interactive Experiences for Television and Online Video (TVX2019) (2019)
231.
ISO/IEC: Information technology – multimedia framework (MPEG-21) – part 19: media value chain ontology / Amd 1: extensions on time-segments and multi-track audio. Standard, International Organization for Standardization (2018). Accessed 30 Sept 2021
232.
ISO/IEC: Information technology – multimedia framework (MPEG-21) – media contract ontology. Standard, International Organization for Standardization (2017). Accessed 30 Sept 2021
237.
ISO/IEC: MPEG-7, part 1 et seq. Standard, International Organization for Standardization. Accessed 30 Sept 2021
238.
Chang, S.-F., Sikora, T., Puri, A.: Overview of the MPEG-7 standard. IEEE Trans. Circuits Syst. Video Technol. 11(6), 688–695 (2001)
239.
ISO/IEC: Introduction to MPEG-7, coding of moving pictures and audio. Standard, International Organization for Standardization (March 2001). Accessed 30 Sept 2021
240.
ISO/IEC: MPEG-I: scene description for MPEG media (MPEG-I part 14). Standard, International Organization for Standardization. Accessed 30 Sept 2021
241.
ISO/IEC: Coded representation of immersive media – part 14: scene description for MPEG media. Standard, International Organization for Standardization. Accessed 30 Sept 2021
242.
MPEG Group: Coded representation of immersive media. Standard, MPEG standards (2020). Accessed 30 Sept 2021
243.
MPEG Group: MPEG-I: versatile video coding (MPEG-I part 3). Standard, MPEG standards. Accessed 30 Sept 2021
244.
Wieckowski, A., Ma, J., Schwarz, H., Marpe, D., Wiegand, T.: Fast partitioning decision strategies for the upcoming versatile video coding (VVC) standard. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 4130–4134. IEEE (2019)
254.
EBU: EBU Tech 3351 – CCDM. Technical report, EBU (August 2020). Accessed 30 Sept 2021
258.
ISO/IEC: Information technology – multimedia framework (MPEG-21) – contract expression language. Standard, International Organization for Standardization (2016). Accessed 30 Sept 2021
259.
Rodríguez-Doncel, V.: Overview of the MPEG-21 media contract ontology (2016)
263.
Shou, M.Z., Ghadiyaram, D., Wang, W., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation (2021). arXiv:2101.10511
264.
Krishna, M.V., Bodesheim, P., Körner, M., Denzler, J.: Temporal video segmentation by event detection: a novelty detection approach. Pattern Recogn. Image Anal. 24(2), 243–255 (2014)
265.
Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masia, B.: Movie editing and cognitive event segmentation in virtual reality video. ACM Trans. Graph. (TOG) 36(4), 1–12 (2017)
266.
Shou, M.Z., Lei, S.W., Wang, W., Ghadiyaram, D., Feiszli, M.: Generic event boundary detection: a benchmark for event segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8075–8084 (2021)
267.
Deliege, A., Cioppa, A., Giancola, S., Seikavandi, M.J., Dueholm, J.V., Nasrollahi, K., Ghanem, B., Moeslund, T.B., Van Droogenbroeck, M.: SoccerNet-v2: a dataset and benchmarks for holistic understanding of broadcast soccer videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4508–4519 (2021)
268.
Verschae, R., Ruiz-del-Solar, J.: Object detection: current and future directions. Front. Robot. AI 2, 29 (2015)
271.
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)
272.
Kaur, P., Pannu, H.S., Malhi, A.K.: Comparative analysis on cross-modal information retrieval: a review. Comput. Sci. Rev. 39, 100336 (2021)
273.
Caba Heilbron, F., Escorcia, V., Ghanem, B., Carlos Niebles, J.: ActivityNet: a large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–970 (2015)
274.
Wang, X., Wu, J., Chen, J., Li, L., Wang, Y.-F., Wang, W.Y.: VaTeX: a large-scale, high-quality multilingual dataset for video-and-language research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4581–4591 (2019)
275.
Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, A.P., Toderici, G., Varadarajan, B., Vijayanarasimhan, S.: YouTube-8M: a large-scale video classification benchmark (2016). arXiv:1609.08675
276.
Rehman, S.U., Waqas, M., Tu, S., Koubaa, A., ur Rehman, O., Ahmad, J., Hanif, M., Han, Z.: Deep learning techniques for future intelligent cross-media retrieval. Technical report, CISTER – Research Centre in Real-Time and Embedded Computing Systems (2020)
277.
Tu, S., ur Rehman, S., Waqas, M., ur Rehman, O., Yang, Z., Ahmad, B., Halim, Z., Zhao, W.: Optimisation-based training of evolutionary convolution neural network for visual classification applications. IET Comput. Vis. 14(5), 259–267 (2020)
278.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: transformers for image recognition at scale (2020). arXiv:2010.11929
279.
Dai, Z., Liu, H., Le, Q., Tan, M.: CoAtNet: marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 34 (2021)
280.
Borkman, S., Crespi, A., Dhakad, S., Ganguly, S., Hogins, J., Jhang, Y.-C., Kamalzadeh, M., Li, B., Leal, S., Parisi, P., et al.: Unity Perception: generate synthetic data for computer vision (2021). arXiv:2107.04259
281.
Tan, C., Xu, X., Shen, F.: A survey of zero shot detection: methods and applications. Cogn. Robot. 1, 159–167 (2021)
282.
Wang, W., Zheng, V.W., Yu, H., Miao, C.: A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–37 (2019)
283.
Hu, Y., Nie, L., Liu, M., Wang, K., Wang, Y., Hua, X.-S.: Coarse-to-fine semantic alignment for cross-modal moment localization. IEEE Trans. Image Process. 30, 5933–5943 (2021)
285.
Li, Y., Yao, T., Pan, Y., Chao, H., Mei, T.: Jointly localizing and describing events for dense video captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7492–7500 (2018)
286.
Chen, S., Jiang, Y.-G.: Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8425–8435 (2021)
287.
Dong, C., Chen, X., Chen, A., Hu, F., Wang, Z., Li, X.: Multi-level visual representation with semantic-reinforced learning for video captioning. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 4750–4754 (2021)
288.
Francis, D., Anh Nguyen, P., Huet, B., Ngo, C.-W.: Fusion of multimodal embeddings for ad-hoc video search. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops (2019)
289.
Yaliniz, G., Ikizler-Cinbis, N.: Using independently recurrent networks for reinforcement learning based unsupervised video summarization. Multimed. Tools Appl. 80(12), 17827–17847 (2021)
293.
Xiao, Z., Fu, X., Huang, J., Cheng, Z., Xiong, Z.: Space-time distillation for video super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2113–2122 (2021)
295.
Ignatov, A., Timofte, R., Denna, M., Younes, A.: Real-time quantized image super-resolution on mobile NPUs, Mobile AI 2021 challenge: report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2525–2534 (2021)
296.
Ignatov, A., Romero, A., Kim, H., Timofte, R.: Real-time video super-resolution on smartphones with deep learning, Mobile AI 2021 challenge: report. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2535–2544 (2021)
297.
Zang, T., Zhu, Y., Liu, H., Zhang, R., Yu, J.: A survey on cross-domain recommendation: taxonomies, methods, and future directions (2021). arXiv:2108.03357
298.
Nixon, L., Ciesielski, K., Philipp, B.: AI for audience prediction and profiling to power innovative TV content recommendation services, pp. 42–48 (2019)
299.
Taluğ, D.Y.: User expectations on smart TV; an empiric study on user emotions towards smart TV. Turk. Online J. Design Art Commun. 11(2), 424–442 (2021)
300.
Borgotallo, R., Pero, R.D., Messina, A., Negro, F., Vignaroli, L., Aroyo, L., Aart, C., Conconi, A.: Personalized semantic news: combining semantics and television. In: International Conference on User Centric Media, pp. 137–140. Springer (2009)
Metadata
Title
Data-driven personalisation of television content: a survey
Authors
Lyndon Nixon
Jeremy Foss
Konstantinos Apostolidis
Vasileios Mezaris
Publication date
23.04.2022
Publisher
Springer Berlin Heidelberg
Published in
Multimedia Systems / Issue 6/2022
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-022-00926-6
