
2016 | OriginalPaper | Chapter

ActionSnapping: Motion-Based Video Synchronization

Authors: Jean-Charles Bazin, Alexander Sorkine-Hornung

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

Video synchronization is a fundamental step for many applications in computer vision, ranging from video morphing to motion analysis. We present a novel method for synchronizing action videos in which a similar action is performed by different people at different times and locations, with different local speed changes, e.g., in sports such as weightlifting, baseball pitching, or dance. Our approach extends the popular “snapping” tool of video editing software and allows users to automatically snap action videos together in a timeline based on their content. Since the action can take place at different locations, existing appearance-based methods are not appropriate. Our approach instead leverages motion information and computes a nonlinear synchronization of the input videos to establish frame-to-frame temporal correspondences. We demonstrate that our approach can be applied to video synchronization, video annotation, and action snapshots. It has been successfully evaluated with ground-truth data and a user study.
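The abstract describes computing a nonlinear synchronization that maps frames of one action video to frames of another despite local speed changes. The paper's actual formulation is not reproduced here; as a rough illustration of the underlying idea, the sketch below uses standard dynamic time warping over hypothetical per-frame motion descriptors (e.g., pooled optical-flow histograms) to recover such a frame-to-frame warping path. All names and the descriptor representation are assumptions for illustration only.

```python
import numpy as np

def dtw_align(desc_a: np.ndarray, desc_b: np.ndarray) -> list:
    """Nonlinear temporal alignment of two videos, each represented as an
    array of per-frame motion descriptors (one row per frame).
    Returns a monotonic list of (i, j) frame-to-frame correspondences."""
    n, m = len(desc_a), len(desc_b)
    # Pairwise frame cost: Euclidean distance between motion descriptors.
    cost = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    # Accumulated cost with the standard DTW recurrence.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both videos
                acc[i - 1, j],      # video B locally slower
                acc[i, j - 1],      # video A locally slower
            )
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

For example, aligning a sequence against a half-speed copy of itself (each frame repeated twice) yields a path that maps every original frame to its two duplicates, mimicking a local speed change. Real descriptors would of course come from dense motion estimation rather than raw values.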


Metadata
DOI
https://doi.org/10.1007/978-3-319-46454-1_10
