
2018 | Original Paper | Book Chapter

6. Automated Video Mashups: Research and Challenges

Authors: Mukesh Kumar Saini, Wei Tsang Ooi

Published in: MediaSync

Publisher: Springer International Publishing


Abstract

The proliferation of video cameras, such as those embedded in smartphones and wearable devices, has made it increasingly easy for users to film interesting events (such as public performances, family events, and vacation highlights) in their daily lives. Moreover, multiple cameras often capture the same event at the same time from different views. Concatenating segments of the videos produced by these cameras along the event timeline forms a video mashup, which can depict the event in a less monotonous and more informative manner. It is, however, inefficient and costly to create a video mashup manually. This chapter aims to introduce the problem of automated video mashup, survey the state-of-the-art research in this area, and outline the set of open challenges that remain to be solved. It provides a comprehensive introduction for practitioners, researchers, and graduate students who are interested in the research and challenges of automated video mashup.
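
To make the concatenation step concrete, the following minimal Python sketch (an illustration, not the chapter's or any surveyed system's algorithm) divides the event timeline into fixed-length slots and assigns each slot to one of the time-aligned recordings. The camera identifiers, per-slot quality scores, and greedy selection rule are hypothetical placeholders.

    # Minimal sketch: build a mashup as an "edit decision list" by picking,
    # for each fixed-length time slot, the recording with the highest
    # (hypothetical) per-slot quality score.

    def build_mashup(recordings, num_slots, slot_len):
        """Return a list of cuts: which camera covers which event-time interval."""
        edit_list = []
        for slot in range(num_slots):
            # Greedy choice: the camera whose segment scores best in this slot.
            best = max(recordings, key=lambda r: r["scores"][slot])
            edit_list.append({
                "camera": best["id"],
                "start": slot * slot_len,        # seconds of event time
                "end": (slot + 1) * slot_len,
            })
        return edit_list

    # Three synchronized recordings of the same event, four 5-second slots.
    cameras = [
        {"id": "A", "scores": [0.9, 0.2, 0.4, 0.8]},
        {"id": "B", "scores": [0.3, 0.7, 0.9, 0.1]},
        {"id": "C", "scores": [0.5, 0.6, 0.2, 0.6]},
    ]
    for cut in build_mashup(cameras, num_slots=4, slot_len=5.0):
        print(cut)   # e.g. {'camera': 'A', 'start': 0.0, 'end': 5.0}

In practice, such scores would come from audio and visual content analysis, and the selection would also respect editing constraints such as minimum shot length and view diversity; this is where much of the research surveyed in the chapter lies.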


Footnotes
1
Note that audio and video samples captured at the same instant were emitted at different times at the source, because sound travels far more slowly than light. Humans, however, have learned to compensate for this difference in everyday settings.
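
As a rough sense of scale (illustrative figures, not taken from the chapter), the short Python calculation below estimates how much the recorded audio lags the corresponding video for a camera placed at a few distances from the sound source, assuming sound travels at about 343 m/s in air.

    # Illustrative numbers: audio lag relative to video at a given distance,
    # assuming ~343 m/s for sound; light arrives effectively instantly.
    SPEED_OF_SOUND = 343.0    # m/s
    SPEED_OF_LIGHT = 3.0e8    # m/s

    for distance in (10, 50, 100):          # metres from the performer
        lag = distance / SPEED_OF_SOUND - distance / SPEED_OF_LIGHT
        print(f"{distance:>3} m: audio lags video by about {lag * 1000:.0f} ms")
    # Output: ~29 ms at 10 m, ~146 ms at 50 m, ~292 ms at 100 m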
 
2
See [42] for a discussion on narration, story, and events.
 
References
1.
Nakano, T., Murofushi, S., Goto, M., Morishima, S.: Dancereproducer: an automatic mashup music video generation system by reusing dance video clips on the web. In: Sound and Music Computing Conference (SMC), pp. 183–189 (2011)
2.
Fu, Y., Guo, Y., Zhu, Y., Liu, F., Song, C., Zhou, Z.H.: Multi-view video summarization. IEEE Trans. Multimedia (TOMM) 12(7), 717–729 (2010)
3.
Pritch, Y., Ratovitch, S., Hende, A., Peleg, S.: Clustered synopsis of surveillance video. In: IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 195–200. IEEE (2009)
4.
Wang, X., Hirayama, T., Mase, K.: Viewpoint sequence recommendation based on contextual information for multiview video. IEEE Multimedia 22(4), 40–50 (2015)
5.
Saini, M.K., Gadde, R., Yan, S., Ooi, W.T.: Movimash: online mobile video mashup. In: ACM International Conference on Multimedia (MM), pp. 139–148. ACM (2012)
6.
Nguyen, D.T.D., Saini, M., Nguyen, V.T., Ooi, W.T.: Jiku director: a mobile video mashup system. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 477–478. ACM, Barcelona, Spain (2013)
7.
Shrestha, P., Weda, H., Barbieri, M., Aarts, E.H., et al.: Automatic mashup generation from multiple-camera concert recordings. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 541–550. ACM, Firenze, Italy (2010)
8.
Arev, I., Park, H.S., Sheikh, Y., Hodgins, J., Shamir, A.: Automatic editing of footage from multiple social cameras. ACM Trans. Graph. (TOG) 33(4), 81:1–81:11 (2014)
9.
Su, K., Naaman, M., Gurjar, A., Patel, M., Ellis, D.P.: Making a scene: alignment of complete sets of clips based on pairwise audio match. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, p. 26. ACM, Hong Kong (2012)
10.
Sinha, S.N., Pollefeys, M.: Synchronization and calibration of camera networks from silhouettes. In: International Conference on Pattern Recognition (ICPR), pp. 116–119. IEEE (2004)
11.
Meyer, B., Stich, T., Magnor, M.A., Pollefeys, M.: Subframe temporal alignment of non-stationary cameras. In: British Machine Vision Conference (BMVC), pp. 1–10 (2008)
12.
Caspi, Y., Simakov, D., Irani, M.: Feature-based sequence-to-sequence matching. Int. J. Comput. Vis. (IJCV) 68(1), 53–64 (2006)
13.
Elhayek, A., Stoll, C., Kim, K., Seidel, H., Theobalt, C.: Feature-based multi-video synchronization with subframe accuracy. Pattern Recogn. 266–275 (2012)
14.
Hasler, N., Rosenhahn, B., Thormahlen, T., Wand, M., Gall, J., Seidel, H.P.: Markerless motion capture with unsynchronized moving cameras. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 224–231. IEEE (2009)
15.
Kammerl, J., Birkbeck, N., Inguva, S., Kelly, D., Crawford, A.J., Denman, H., Kokaram, A., Pantofaru, C.: Temporal synchronization of multiple audio signals. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4603–4607. IEEE, Firenze, Italy (2014)
16.
Shrestha, P., Barbieri, M., Weda, H.: Synchronization of multi-camera video recordings based on audio. In: Proceedings of the 15th ACM International Conference on Multimedia, pp. 545–548. ACM, Augsburg, Germany (2007)
17.
Haitsma, J., Kalker, T.: A highly robust audio fingerprinting system with an efficient search strategy. J. New Music Res. 32(2), 211–221 (2003)
18.
Cremer, M., Cook, R.: Machine-assisted editing of user-generated content. In: IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pp. 725,404–725,404–410 (2009)
19.
Laiola Guimaraes, R., Cesar, P., Bulterman, D.C., Zsombori, V., Kegel, I.: Creating personalized memories from social events: community-based support for multi-camera recordings of school concerts. In: Proceedings of the 19th ACM International Conference on Multimedia, pp. 303–312. ACM, Scottsdale, AZ, USA (2011)
20.
Korchagin, D., Garner, P.N., Dines, J.: Automatic temporal alignment of AV data with confidence estimation. In: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 269–272. IEEE (2010)
21.
Bano, S., Cavallaro, A.: Discovery and organization of multi-camera user-generated videos of the same event. Inf. Sci. 302, 108–121 (2015)
22.
Bano, S., Cavallaro, A.: Vicomp: composition of user-generated videos. Multimedia Tools Appl. (MTAP) 75(12), 1–24 (2015)
23.
Wu, Y., Mei, T., Xu, Y.Q., Yu, N., Li, S.: Movieup: automatic mobile video mashup. IEEE Trans. Circ. Syst. Video Technol. 25(12), 1941–1954 (2015)
24.
Wilk, S., Kopf, S., Effelsberg, W.: Video composition by the crowd: a system to compose user-generated videos in near real-time. In: Proceedings of the 6th ACM Multimedia Systems Conference, pp. 13–24. ACM, Portland, USA (2015)
25.
Mei, T., Hua, X.S., Zhu, C.Z., Zhou, H.Q., Li, S.: Home video visual quality assessment with spatiotemporal factors. IEEE Trans. Circ. Syst. Video Technol. (CSVT) 17(6), 699–706 (2007)
26.
Wilk, S., Effelsberg, W.: The influence of camera shakes, harmful occlusions and camera misalignment on the perceived quality in user generated video. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE, Chengdu, China (2014)
27.
Daniyal, F., Cavallaro, A.: Multi-camera scheduling for video production. In: Conference for Visual Media Production (CVMP), pp. 11–20. IEEE (2011)
28.
Daniyal, F., Taj, M., Cavallaro, A.: Content and task-based view selection from multiple video streams. Multimedia Tools Appl. (MTAP) 46(2–3), 235–258 (2010)
29.
Goshorn, R., Goshorn, J., Goshorn, D., Aghajan, H.: Architecture for cluster-based automated surveillance network for detecting and tracking multiple persons. In: ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pp. 219–226. IEEE (2007)
30.
Jiang, H., Fels, S., Little, J.J.: Optimizing multiple object tracking and best view video synthesis. IEEE Trans. Multimedia (TOMM) 10(6), 997–1012 (2008)
31.
Vihavainen, S., Mate, S., Seppälä, L., Cricri, F., Curcio, I.D.: We want more: human-computer collaboration in mobile social video remixing of music concerts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 287–296. ACM (2011)
32.
Lerch, A.: An introduction to audio content analysis: applications in signal processing and music informatics. Wiley (2012)
33.
Dmytryk, E.: On film editing: an introduction to the art of film construction (1984)
34.
Canini, L., Benini, S., Leonardi, R.: Classifying cinematographic shot types. Multimedia Tools Appl. (MTAP) 62(1), 51–73 (2013)
35.
Carlier, A., Calvet, L., Nguyen, D.T.D., Ooi, W.T., Gurdjos, P., Charvillat, V.: 3D interest maps from simultaneous video recordings. In: ACM International Conference on Multimedia, pp. 577–586. ACM (2014)
36.
Zsombori, V., Frantzis, M., Guimaraes, R.L., Ursu, M.F., Cesar, P., Kegel, I., Craigie, R., Bulterman, D.C.: Automatic generation of video narratives from shared UGC. In: Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, pp. 325–334. ACM, Eindhoven, Netherlands (2011)
37.
Nguyen, D.T.D., Carlier, A., Ooi, W.T., Charvillat, V.: Jiku director 2.0: a mobile video mashup system with zoom and pan using motion maps. In: Proceedings of the ACM International Conference on Multimedia, pp. 765–766. ACM, Orlando, FL, USA (2014)
38.
Beerends, J.G., De Caluwe, F.E.: The influence of video quality on perceived audio quality and vice versa. J. Audio Eng. Soc. (AES) 47(5), 355–362 (1999)
39.
Saini, M., Venkatagiri, S.P., Ooi, W.T., Chan, M.C.: The Jiku mobile video dataset. In: ACM Multimedia Systems Conference (MMSys), pp. 108–113. ACM (2013)
40.
Ballan, L., Brostow, G.J., Puwein, J., Pollefeys, M.: Unstructured video-based rendering: interactive exploration of casually captured videos. ACM Trans. Graph. (TOG) 29(4), 87 (2010)
41.
Park, H.S., Jain, E., Sheikh, Y.: 3D social saliency from head-mounted cameras. In: Advances in Neural Information Processing Systems (NIPS), pp. 431–439 (2012)
43.
Frantzis, M., Zsombori, V., Ursu, M., Guimaraes, R.L., Kegel, I., Craigie, R.: Interactive video stories from user generated content: a school concert use case. In: International Conference on Interactive Digital Storytelling, pp. 183–195. Springer, Berlin (2012)
Metadata
Title
Automated Video Mashups: Research and Challenges
Authors
Mukesh Kumar Saini
Wei Tsang Ooi
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-65840-7_6
