1 Introduction
- We rethink the theory of periodic motion and derive a classification of periodic motion types. Starting from the 3D motion field induced by an object moving periodically through space, we decompose the motion into three elementary components: divergence, curl and shear. From this decomposition and the field’s temporal dynamics, we identify 9 fundamental cases of periodic motion in 3D. For the 2D perception of 3D periodic motion, we consider the observer’s viewpoint relative to the motion. We identify two viewpoint extremes, from which 18 cases of 2D repetitive appearance emerge.
- Our spatiotemporal filtering method addresses the wide variety of repetitive appearances and effectively handles non-stationary motion. Specifically, diversity in motion appearance is handled by representing the video as six differential motion maps that emerge from the theory. To identify the repetitive dynamics in possibly non-stationary video, we apply the continuous wavelet transform to obtain a dense time-frequency distribution over the video. Directly from the wavelet responses, we localize the repetitive motion and count the repetitions.
- Extending beyond the video dataset of Levy and Wolf (2015), we propose a new dataset for repetition estimation that is more realistic and more challenging in terms of non-static and non-stationary videos. To encourage further research on video repetition, we will make the dataset and source code available for download.
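The split into divergence, curl and shear named in the first contribution can be made concrete for a 2D image-plane flow field \(\mathbf{F} = (F_x, F_y)\), using the notation of the motion-map table in Section 5.5. Its spatial Jacobian decomposes uniquely into an isotropic expansion, a rotation and two shear terms. This is the standard first-order 2D decomposition, given as an illustration rather than as the 3D derivation of Section 3.1; the shear symbols \(\sigma_1, \sigma_2\) are our own notation.

\[
\begin{pmatrix} \nabla_x F_x & \nabla_y F_x \\ \nabla_x F_y & \nabla_y F_y \end{pmatrix}
= \frac{\nabla \cdot \mathbf{F}}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
+ \frac{\nabla \times \mathbf{F}}{2} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
+ \frac{\sigma_1}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
+ \frac{\sigma_2}{2} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\]
\[
\nabla \cdot \mathbf{F} = \nabla_x F_x + \nabla_y F_y, \quad
\nabla \times \mathbf{F} = \nabla_x F_y - \nabla_y F_x, \quad
\sigma_1 = \nabla_x F_x - \nabla_y F_y, \quad
\sigma_2 = \nabla_y F_x + \nabla_x F_y.
\]

Divergence and curl are unchanged when the image axes are rotated, whereas \(\sigma_1\) and \(\sigma_2\) mix into each other, so shear is often summarised by its magnitude \(\sqrt{\sigma_1^2 + \sigma_2^2}\).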
2 Related Work
2.1 Repetition Estimation
2.2 Categorization of Motion Types
3 Repetitive Motion
3.1 Motion Field Decomposition
3.2 Intrinsic Periodic Motion in 3D
3.2.1 Motion Types
3.2.2 Motion Continuities
3.2.3 Categorization of Periodic Motion
3.3 Visual Recurrence in 2D
3.4 Non-static Repetition
3.5 Non-stationary Repetition
4 Method
4.1 Differential Geometric Motion Maps
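The six maps evaluated in Section 5.5 (\(F_x\), \(F_y\), \(\nabla \cdot \mathbf{F}\), \(\nabla \times \mathbf{F}\), \(\nabla_x F_x\), \(\nabla_y F_y\)) can be derived per frame from a dense optical flow field, for example one computed with TV-L\(^1\), EpicFlow or FlowNet 2.0. The following is a minimal NumPy sketch using plain finite differences. The function name `motion_maps`, the dictionary keys and the `(H, W, 2)` flow layout are assumptions made for illustration, and any smoothing or normalisation used in the actual method is left out.

```python
import numpy as np


def motion_maps(flow):
    """Six per-pixel motion maps from one dense flow field (illustrative sketch).

    flow: array of shape (H, W, 2) holding F_x in [..., 0] and F_y in [..., 1].
    """
    fx, fy = flow[..., 0], flow[..., 1]
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    dfx_dy, dfx_dx = np.gradient(fx)
    dfy_dy, dfy_dx = np.gradient(fy)
    return {
        "F_x": fx,
        "F_y": fy,
        "div": dfx_dx + dfy_dy,   # divergence, nabla . F
        "curl": dfy_dx - dfx_dy,  # scalar 2D curl, nabla x F
        "dFx_dx": dfx_dx,         # nabla_x F_x
        "dFy_dy": dfy_dy,         # nabla_y F_y
    }


# Usage: a rigid rotation about the image centre has curl 2 and zero divergence.
y, x = np.mgrid[-8:9, -8:9].astype(float)
maps = motion_maps(np.stack([-y, x], axis=-1))
```

Each map is then a scalar video (one value per pixel per frame) that can be filtered over time independently, which is what lets different motion types surface in different maps.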
4.2 Dense Temporal Filtering
4.3 Continuous Wavelet Transform
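The wavelet step described in the contributions can be illustrated on the time series of a single pixel in one motion map. The sketch below is a minimal NumPy implementation of a complex Morlet continuous wavelet transform. It assumes that a per-frame dominant period is read off as the scale of maximum power and that the count is obtained by integrating the corresponding frequency over time. The function names, the Morlet parameter `w0 = 6`, the L1 normalisation and this counting rule are illustrative assumptions, not necessarily the choices made in Sections 4.3 to 4.6.

```python
import numpy as np


def morlet_power(x, fps, periods, w0=6.0):
    """Time-frequency power of a 1D signal under a complex Morlet wavelet.

    x: one pixel's value over time, sampled once per frame.
    periods: candidate cycle lengths in seconds.
    Returns an array of shape (len(periods), len(x)).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    n = len(x)
    power = np.empty((len(periods), n))
    for k, period in enumerate(periods):
        s = w0 * period / (2 * np.pi)     # scale whose peak frequency is 1 / period
        half = int(np.ceil(4 * s * fps))  # truncate the Gaussian envelope at 4 * s
        t = np.arange(-half, half + 1) / fps
        # L1-normalised Morlet (1/s factor): a unit sinusoid gives the same peak
        # response at every matched scale, so short and long periods compete fairly
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t / s - 0.5 * (t / s) ** 2) / s
        # psi(-t) == conj(psi(t)), so convolving with psi equals the CWT correlation
        w = np.convolve(x, psi, mode="full")[half:half + n] / fps
        power[k] = np.abs(w) ** 2
    return power


def count_repetitions(x, fps, periods):
    """Count cycles by integrating the per-frame dominant frequency over time."""
    power = morlet_power(x, fps, periods)
    dominant = np.asarray(periods)[power.argmax(axis=0)]  # period per frame (s)
    return float(np.sum(1.0 / dominant) / fps), dominant


# Usage: 10 s of a 0.5 s cycle at 30 fps, i.e. 20 repetitions.
fps = 30
t = np.arange(300) / fps
x = np.sin(2 * np.pi * t / 0.5)
count, dominant = count_repetitions(x, fps, np.linspace(0.2, 2.0, 91))
```

Because the dominant period is re-estimated at every frame in this sketch, a cycle length that drifts during the video still contributes roughly the right number of cycles, which a single global Fourier peak cannot capture.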
4.4 Combining Spectral Power Maps
4.5 Spatial Segmentation
4.6 Repetition Counting
5 Experiments
5.1 Datasets and Evaluation
| | YTSegments | QUVA repetition |
|---|---|---|
| Number of videos | 100 | 100 |
| Duration min/max (s) | 2.1/68.9 | 2.5/64.2 |
| Duration avg. ± SD (s) | \(14.9 \pm 9.8\) | \(17.6 \pm 13.3\) |
| Count avg. ± SD | \(10.8 \pm 6.5\) | \(12.5 \pm 10.4\) |
| Count min/max | 4/51 | 4/63 |
| Cycle length variation | 0.22 | 0.36 |
| Camera motion (# videos) | 21 | 53 |
| Superposed translation (# videos) | 7 | 27 |
5.2 Implementation Details
5.3 Temporal Filtering: Fourier Versus Wavelets
5.4 Viewpoint Invariance
5.5 Diversity in Motion Maps
| Motion map | MAE \(\downarrow\) | OBOA \(\uparrow\) | # Selected |
|---|---|---|---|
| \(\nabla \cdot \mathbf{F}\) | \(77.8 \pm 90.8\) | 0.21 | 10 |
| \(\nabla \times \mathbf{F}\) | \(53.0 \pm 65.5\) | 0.32 | 11 |
| \(\nabla_x F_x\) | \(58.1 \pm 63.5\) | 0.29 | 15 |
| \(\nabla_y F_y\) | \(59.5 \pm 68.4\) | 0.31 | 9 |
| \(F_x\) | \(49.6 \pm 48.0\) | 0.35 | 25 |
| \(F_y\) | \(42.0 \pm 45.3\) | 0.43 | 30 |
| Oracle best | \(24.1 \pm 33.5\) | 0.63 | 100 |
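The MAE and OBOA columns, and the "Oracle best" row, can be reproduced from per-video counts with a few lines of NumPy. This sketch assumes that MAE is the absolute count error divided by the ground-truth count, averaged over videos and reported in percent with its standard deviation, and that OBOA is the fraction of videos whose predicted count is within one of the ground truth. It also assumes that the oracle picks, per video, the motion map whose count is closest to the ground truth. The function names are hypothetical, and the metric definitions should be checked against Section 5.1.

```python
import numpy as np


def count_metrics(pred, gt):
    """MAE (mean and SD, in %) and off-by-one accuracy of repetition counts."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    abs_err = np.abs(pred - gt)
    rel_err = 100.0 * abs_err / gt       # per-video error relative to the true count
    oboa = float(np.mean(abs_err <= 1))  # fraction of videos off by at most one
    return float(rel_err.mean()), float(rel_err.std()), oboa


def oracle_counts(candidates, gt):
    """Per video, keep the candidate count (one row per motion map) closest to gt."""
    candidates = np.asarray(candidates, dtype=float)
    gt = np.asarray(gt, dtype=float)
    best = np.abs(candidates - gt).argmin(axis=0)
    return candidates[best, np.arange(candidates.shape[1])]


# Usage with made-up counts for three videos (not results from this paper).
gt = [10, 10, 8]
mae, mae_sd, oboa = count_metrics([10, 12, 4], gt)      # mae = 23.33..., oboa = 1/3
oracle = oracle_counts([[10, 14, 3], [11, 10, 9]], gt)  # -> [10., 10., 9.]
```

Normalising by the ground-truth count keeps an error of two cycles on a 4-cycle video from weighing the same as on a 50-cycle video, while OBOA tolerates the one-cycle ambiguity that arises at the start and end of a repetition.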
5.6 Video Acceleration Sensitivity
5.7 Motion Segmentation
5.8 Comparison to the State-of-the-Art
| Motion segmentation method | YTSegments MAE \(\downarrow\) | YTSegments OBOA \(\uparrow\) | QUVA repetition MAE \(\downarrow\) | QUVA repetition OBOA \(\uparrow\) |
|---|---|---|---|---|
| Full-frame | \(46.0 \pm 67.2\) | 0.28 | \(60.8 \pm 49.4\) | 0.22 |
| Papazoglou and Ferrari (2013) | \(13.1 \pm 20.3\) | 0.78 | \(42.6 \pm 49.2\) | 0.44 |
| Tokmakov et al. (2017) | \(21.6 \pm 57.2\) | 0.76 | \(38.9 \pm 39.2\) | 0.42 |
| Differential geometry (this paper) | \(\mathbf{9.4 \pm 17.4}\) | 0.89 | \(\mathbf{26.1 \pm 39.6}\) | 0.62 |

| Method | YTSegments MAE \(\downarrow\) | YTSegments OBOA \(\uparrow\) | QUVA repetition MAE \(\downarrow\) | QUVA repetition OBOA \(\uparrow\) |
|---|---|---|---|---|
| Pogalin et al. (2008) | \(21.9 \pm 30.1\) | 0.68 | \(38.5 \pm 37.6\) | 0.49 |
| Levy and Wolf (2015) | \(\mathbf{6.5 \pm 9.2}\) | 0.90 | \(48.2 \pm 61.5\) | 0.45 |
| This paper | \(9.4 \pm 17.4\) | 0.89 | \(\mathbf{26.1 \pm 39.6}\) | 0.62 |

| Optical flow | YTSegments MAE \(\downarrow\) | YTSegments OBOA \(\uparrow\) | QUVA repetition MAE \(\downarrow\) | QUVA repetition OBOA \(\uparrow\) |
|---|---|---|---|---|
| TV-L\(^1\) | \(9.8 \pm 17.9\) | 0.89 | \(26.5 \pm 67.5\) | 0.67 |
| EpicFlow | \(9.7 \pm 17.9\) | 0.88 | \(30.8 \pm 38.2\) | 0.55 |
| FlowNet 2.0 | \(\mathbf{9.4 \pm 17.4}\) | 0.89 | \(\mathbf{26.1 \pm 39.6}\) | 0.62 |