
2016 | Original Paper | Book Chapter

ActionSnapping: Motion-Based Video Synchronization

Authors: Jean-Charles Bazin, Alexander Sorkine-Hornung

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

Video synchronization is a fundamental step for many applications in computer vision, ranging from video morphing to motion analysis. We present a novel method for synchronizing action videos in which a similar action is performed by different people at different times and locations, with different local speed changes, e.g., in sports such as weightlifting, baseball pitching, or dance. Our approach extends the popular "snapping" tool of video editing software and allows users to automatically snap action videos together in a timeline based on their content. Since the action can take place at different locations, existing appearance-based methods are not appropriate. Our approach instead leverages motion information and computes a nonlinear synchronization of the input videos to establish frame-to-frame temporal correspondences. We demonstrate that our approach can be applied to video synchronization, video annotation, and action snapshots. It has been successfully evaluated against ground truth data and through a user study.
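The abstract's key idea, nonlinear synchronization via frame-to-frame temporal correspondences, can be illustrated with a classic building block: dynamic time warping (DTW) over per-frame motion features. The sketch below is not the paper's actual algorithm (which is not reproduced here); it is a minimal, self-contained illustration assuming each video has already been reduced to a 1-D motion-magnitude signal (e.g., mean optical-flow magnitude per frame).

```python
import numpy as np

def dtw_sync(a, b):
    """Nonlinearly align two 1-D per-frame feature sequences with
    dynamic time warping. Returns a list of (i, j) frame
    correspondences and the total alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow diagonal, vertical, and horizontal steps, so one
            # frame may map to several frames (local speed changes).
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack to recover the warping path (frame-to-frame mapping).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, float(D[n, m])

# Toy example: the same "action" performed at different local speeds.
fast = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
slow = np.array([0.0, 0.0, 1.0, 1.0, 3.0, 3.0, 1.0, 0.0])
path, cost = dtw_sync(fast, slow)
```

Because `slow` is a time-stretched copy of `fast`, the alignment cost is zero and the recovered path maps each fast frame to the corresponding run of slow frames, which is exactly the kind of nonlinear timeline the abstract describes snapping together.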


Metadata
Title
ActionSnapping: Motion-Based Video Synchronization
Authors
Jean-Charles Bazin
Alexander Sorkine-Hornung
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46454-1_10