We begin the description of our taxonomy with the underlying structure deformation model category, which is divided into statistical and physics-based models.
2.1.1 Statistical
This set of algorithms applies a statistical deformation model with no direct connection to the physical process of structure deformation. Such models are in general heuristically defined a priori to enforce constraints that reduce the ill-posedness of the NRSfM problem. The most used low-rank model in the NRSfM literature falls into this category, relying on the assumption that 3D deformations are well described by linear subspaces (also called basis shapes). The low-rank model was first introduced almost 20 years ago by Bregler et al. (2000), who solved NRSfM through the formalisation of a factorization problem, analogously to what Tomasi and Kanade proposed for the rigid case (Tomasi and Kanade 1992). However, strong nonlinear deformations, such as those appearing in articulated shapes, may drastically reduce the effectiveness of such models. Moreover, the first low-rank model presented in Bregler et al. (2000) acted mainly as a constraint over the spatial distribution of the deforming point cloud and did not restrict the temporal variations of the deforming object.
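As a minimal numerical sketch of the low-rank model (our toy construction, not code from any cited paper): with K basis shapes and orthographic cameras, every 3D shape is a linear combination of the bases, so the stacked 2F-by-P measurement matrix has rank at most 3K. This is exactly the constraint that factorization methods exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 50, 40, 3                              # frames, points, basis shapes

B = rng.standard_normal((K, 3, P))               # basis shapes B_k (3 x P)
C = rng.standard_normal((F, K))                  # per-frame mixing coefficients c_{fk}

W_rows = []
for f in range(F):
    S_f = np.tensordot(C[f], B, axes=1)          # shape at frame f: sum_k c_{fk} B_k
    R_f = np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]  # 2x3 orthographic camera
    W_rows.append(R_f @ S_f)                     # 2 x P projection
W = np.vstack(W_rows)                            # 2F x P measurement matrix

# The low-rank prior: rank(W) is at most 3K, regardless of F and P
print(np.linalg.matrix_rank(W))
```

With generic random data the rank is exactly 3K = 9, far below min(2F, P) = 40.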
Differently, Gotardo and Martinez (2011a) had the intuition to use DCT bases to model the camera and deformation motion instead, assuming these factors are smooth in a video sequence. This approach was later expanded by explicitly modeling a set of complementary rank-3 spaces and by constraining the magnitude of deformations in the basis shapes (Gotardo and Martinez 2011c). An extension of this framework increased the generalization of the model to non-linear deformations by applying a kernel transformation to the 3D shape space using radial basis functions (Gotardo and Martinez 2011b). This switch of perspective addressed the main issue of increasing the number of available DCT bases, allowing more diverse motions while not restricting the complexity of deformations. Later, further extensions and optimizations of low-rank and DCT-based approaches have been proposed. Valmadre and Lucey (2012) noticed that the trajectory should be a low-frequency signal, thus laying the ground for an automatic selection of the DCT basis rank by penalizing the trajectory's response to one or more high-pass filters. Moreover, spatio-temporal constraints have been imposed on both temporal and spatial deformations (Akhter et al. 2012).
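The smoothness prior behind these DCT-based methods can be illustrated with a hedged one-dimensional sketch (our toy signals, not data from the cited works): a slowly varying trajectory is well reconstructed from a handful of low-frequency DCT bases, while a fast oscillation is not.

```python
import numpy as np

F, d = 100, 8                                    # frames, retained DCT bases
t = np.arange(F)

# Orthonormal DCT-II basis: column k is a cosine with k/2 cycles over the window
Theta = np.sqrt(2.0 / F) * np.cos(np.pi / F * (t[:, None] + 0.5) * np.arange(d))
Theta[:, 0] /= np.sqrt(2.0)

def rel_err(x):
    coeff = Theta.T @ x                          # project onto the truncated basis
    return np.linalg.norm(Theta @ coeff - x) / np.linalg.norm(x)

smooth = np.sin(2 * np.pi * t / F)               # slowly varying trajectory (1 cycle)
jagged = np.sin(2 * np.pi * 20 * t / F)          # fast oscillation (20 cycles)

err_smooth, err_jagged = rel_err(smooth), rel_err(jagged)
```

The smooth trajectory incurs only a few percent reconstruction error with d = 8 bases, whereas the jagged one is almost entirely lost; penalizing high-frequency content therefore mainly constrains non-smooth motions.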
A related idea, proposed by Li et al. (2018), groups recurrent deformations in order to describe them better. At its core, the method has an additional clustering step that links together similar deformations. More recently, a new prior model based on the Kronecker–Markov structure of the covariance of the time-varying 3D points was shown to generalize several previously introduced priors (Simon et al. 2017). Another recent improvement is the use by Dawud Ansari et al. (2017) of the DCT basis in conjunction with singular value thresholding for camera pose estimation.
Similar spatial and temporal priors have been introduced as regularization terms in cost functions solving the NRSfM problem, mainly using a low-rank model only. Torresani et al. (2008) proposed a probabilistic PCA model for deformations, marginalizing some of the variables and assuming Gaussian distributions for both noise and deformations. Moreover, in the same framework, a linear dynamical model was used to represent the deformation at the current frame as a linear function of the previous one. Brand and Bhotika (2001) penalize deformations over the mean shape of the object by introducing sensible parameters over the degree of flexibility of the shape. Del Bue et al. (2005a) instead compute a more robust non-rigid factorization, using a 3D mean shape as a prior for NRSfM (Del Bue 2013). In a non-linear optimization framework, Olsen and Bartoli (2008) include \(l_2\) penalties both on the frame-by-frame deformations and on the closeness of the reconstructed 3D points to their 2D projections. Of course, penalty costs introduce a new set of hyper-parameters weighting the terms, implying further tuning that can be impracticable when cross-validation is not an option. Regularization has also been introduced in Bundle Adjustment formulations for NRSfM (Aanæs and Kahl 2002), mainly by including smoothness of deformations via \(l_2\) penalties (Del Bue et al. 2007) or constraints over the rigidity of pre-segmented points in the measurements (Del Bue et al. 2006).
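Schematically, these regularized formulations optimize an objective of the following generic form (our notation, with hypothetical weights \(\lambda_1, \lambda_2\); the cited works differ in their exact terms):

\[
E = \sum_{f} \left\| \mathbf{W}_f - \mathbf{R}_f \mathbf{S}_f \right\|_F^2 \;+\; \lambda_1 \sum_{f} \left\| \mathbf{S}_f - \mathbf{S}_{f-1} \right\|_F^2 \;+\; \lambda_2 \sum_{f} \left\| \mathbf{S}_f - \bar{\mathbf{S}} \right\|_F^2 ,
\]

where \(\mathbf{W}_f\) are the 2D measurements, \(\mathbf{R}_f\) the camera and \(\mathbf{S}_f\) the 3D shape at frame \(f\), and \(\bar{\mathbf{S}}\) a mean shape. The second term penalizes frame-by-frame deformations and the third deviations from the mean shape; \(\lambda_1\) and \(\lambda_2\) are exactly the hyper-parameters whose tuning is discussed above.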
Another important statistical principle is enforcing the independence of the low-rank bases. In the coarse-to-fine approach of Bartoli et al. (2008), basis shapes are computed sequentially by adding, at each step, the basis that explains most of the variance with respect to the previous ones. They also impose a stopping criterion, thus achieving the automatic computation of the overall number of bases. The concept of basis independence clearly calls for a statistical model close to independent component analysis (ICA). To this end, Brandt et al. (2011) proposed a prior term that minimizes the mutual information of each basis in the NRSfM model. Low-rank models are indeed compact but limited in the expressiveness of complex deformations, as noted in Zhu et al. (2014). To solve this problem, Zhu et al. (2014) use a temporal union of subspaces that associates a specific subspace with each cluster of frames in time. Such an association is solved by adopting a cost function promoting self-expressiveness (Elhamifar and Vidal 2013). Similarly, both spatial and temporal unions of subspaces were also used to account for multiple independently deforming shapes (Agudo and Moreno-Noguer 2017a; Kumar et al. 2017). Interestingly, such a union-of-subspaces strategy was previously adopted to solve the multi-body 3D reconstruction of independently moving objects (Zappella et al. 2013). Another option is to use an over-complete representation of subspaces, made usable by imposing sparsity over the selected bases (Kong and Lucey 2016). In this way, 3D shapes in time can have a compact representation, and the problem can be theoretically characterized as block sparse dictionary learning. In a similar spirit, Hamsici et al. (2012) propose to use the input data to learn spatially smooth shape weights using rotation invariant kernels.
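To give a flavour of the self-expressiveness idea behind these union-of-subspaces methods, here is a hedged toy sketch using ridge-regularized least-squares coefficients (an LSR-style surrogate, not the exact formulations cited): each sample is written as a combination of the others, and for independent subspaces the coefficient mass concentrates on samples from the same subspace, which is what clustering then exploits.

```python
import numpy as np

rng = np.random.default_rng(3)
# Two independent 2-D subspaces in R^6, with 10 shape samples each
B1, B2 = rng.standard_normal((6, 2)), rng.standard_normal((6, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 10)),
               B2 @ rng.standard_normal((2, 10))])

# Self-expressiveness: each column of X as a combination of the other columns
lam = 1e-3
C = np.zeros((20, 20))
for j in range(20):
    others = np.delete(np.arange(20), j)
    A = X[:, others]
    c = np.linalg.solve(A.T @ A + lam * np.eye(19), A.T @ X[:, j])
    C[others, j] = c

# Coefficient mass within vs. across the two subspaces
within = np.abs(C[:10, :10]).sum() + np.abs(C[10:, 10:]).sum()
across = np.abs(C[:10, 10:]).sum() + np.abs(C[10:, :10]).sum()
```

For independent subspaces the cross-subspace coefficients are close to zero, so the matrix C is nearly block diagonal and directly reveals the subspace membership of each frame.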
All these approaches addressing NRSfM with a low-rank model have relied on several non-linear optimization procedures, mainly alternating least squares (ALS), Lagrange multipliers and the alternating direction method of multipliers (ADMM). Torresani et al. first proposed to alternate between the solution of camera matrices, deformation parameters and basis shapes. This initial solution was then extended by Wang et al. (2008) by constraining the camera matrices to be orthonormal at each iteration, while Paladini et al. (2012) strictly enforced the matrix manifold of the camera matrices to increase the chances of converging to the global optimum of the cost function. As these methods were not designed to be strictly convergent, the bilinear augmented Lagrange multiplier method (BALM) (Del Bue et al. 2012) was introduced to be convergent while ensuring that all the problem constraints are satisfied. Furthermore, robustness to outlying data was then included to improve results in a proximal method with theoretical guarantees of convergence to a stationary point (Wang et al. 2015).
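A minimal sketch of the alternating least squares idea underlying these optimizers (our toy rank-3 bilinear example, deliberately ignoring the camera manifold constraints discussed above): each factor is solved in closed form while the other is held fixed, and the two steps are repeated until the residual stops decreasing.

```python
import numpy as np

rng = np.random.default_rng(1)
F2, P = 60, 30                                   # stacked projection rows, points
# Synthetic rank-3 measurement matrix W = (motion) x (structure)
W = rng.standard_normal((F2, 3)) @ rng.standard_normal((3, P))

M = rng.standard_normal((F2, 3))                 # random motion initialization
for _ in range(50):
    S = np.linalg.lstsq(M, W, rcond=None)[0]         # fix M, solve structure
    M = np.linalg.lstsq(S.T, W.T, rcond=None)[0].T   # fix S, solve motion

residual = np.linalg.norm(W - M @ S) / np.linalg.norm(W)
```

On noiseless low-rank data this bilinear alternation drives the residual to numerical zero; the methods above differ chiefly in how each step additionally projects the camera factor onto its constraint manifold.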
Despite the non-linearity of the problem, it is possible to relax the rank constraint with the trace norm and solve the problem with convex programming. Following this strategy, Dai et al. (2014) provided one of the first effective closed-form solutions to the low-rank problem. Although the convex solution resulting from this relaxation did not provide the best performance, a subsequent iterative optimization scheme gave improved results. In this respect, Kumar et al. (2017) proposed a further improvement on their previous approach, where deformations are represented as a spatio-temporal union of subspaces rather than a single subspace. Thus, complex deformations can be represented as the union of several simple ones, as already described in the previous paragraphs. Note that the evaluation is performed with synthetically generated data only.
Later, Kumar (2020) proposed a set of improvements over the approach of Dai et al. (2014). Namely, in the earlier work metric rectification was performed using incomplete information, arbitrarily choosing one triplet of solutions among those available. The solution in Kumar (2020) instead selects the best among the available triplets using a rotation smoothness heuristic as a decision criterion. A further improvement is algorithmic: instead of Dai et al.'s matrix shrinkage operator, which penalizes all singular values equally, the method in Kumar (2020) introduces a weighted nuclear norm function during optimisation. More recently, Ornhag and Olsson (2020) proposed a unified optimization framework for low-rank inducing penalties that can be readily applied to solve NRSfM. The main advantage of the approach is the ability to combine bias reduction in the estimation with non-convex low-rank inducing objectives in the form of a weighted nuclear norm.
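The following sketch contrasts the two shrinkage strategies on a toy low-rank denoising problem (our construction; the inverse-magnitude weights are a common reweighting heuristic, not the exact scheme of the cited papers): uniform singular value shrinkage biases the large, signal-carrying values, while a weighted variant shrinks them far less.

```python
import numpy as np

def prox_nuclear(X, tau, weights=None):
    """Shrink singular values by tau (optionally scaled by per-value weights)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = np.ones_like(s) if weights is None else weights
    s_shrunk = np.maximum(s - tau * w, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(2)
L = rng.standard_normal((40, 8)) @ rng.standard_normal((8, 40))  # rank-8 signal
X = L + 0.05 * rng.standard_normal((40, 40))                     # noisy observation

# Plain nuclear norm prox: all singular values shrunk equally
X_plain = prox_nuclear(X, tau=1.0)
# Weighted variant: weights ~ 1/s leave the dominant (signal) values almost intact
s = np.linalg.svd(X, compute_uv=False)
X_wnnm = prox_nuclear(X, tau=1.0, weights=1.0 / (s + 1e-6))

err_plain = np.linalg.norm(X_plain - L) / np.linalg.norm(L)
err_wnnm = np.linalg.norm(X_wnnm - L) / np.linalg.norm(L)
```

Both operators suppress the small noise singular values, but the weighted one recovers the signal with a noticeably lower error, which is the bias-reduction effect exploited above.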
The procrustean normal distribution (PND) model was proposed as an effective way to implicitly separate rigid and non-rigid deformations (Lee et al. 2017; Park et al. 2018). This separation provides a relevant regularization, since the rigid motion can be used to obtain a more robust camera estimation, while deformations are still sampled from a normal distribution, similarly to previous work (Torresani et al. 2008). Such a separation is obtained by enforcing an alignment between the reconstructed 3D shapes at every frame, which in practice factors out the rigid transformations from the statistical distribution of deformations. The PND model has since been extended to deal with more complex deformations and longer sequences (Cho et al. 2016).
2.1.2 Physical
Physical models represent a less studied class in NRSfM, although they should ideally be the most accurate for modelling deformations. Of course, applying the right physical model requires knowledge of the deformation type and object material, information that is not readily available a priori.
A first class of physical models assumes that the non-rigid object can be partitioned piecewise into parts, i.e. a collection of pre-defined or estimated patches that are mostly rigid or slightly deformable. This assumption certainly holds for articulated deformations, as it naturally models natural and mechanical shapes connected into parts. One of the first approaches to use this strategy is given by Varol et al. (2009). By preselecting a set of overlapping patches from the 2D image points, and assuming each patch is rigid, homography constraints can be imposed on each patch, with global 3D consistency then enforced using the overlapping points. However, the rigidity of a patch, even a small one, is a very hard constraint to impose and does not generalise well to every non-rigid shape. Moreover, dense point matches over the image sequence are required to ensure a set of overlapping points among all the patches. A relaxation of the piecewise-rigid constraint was given by Fayad et al. (2010), who assume each patch deforms according to a quadratic physical model, thus accounting for linear and bending deformations. These methods all require an initial patch segmentation and a choice of the number of overlapping points; to this end, Russell et al. (2011) optimize the number of patches and their overlap by defining an energy-based cost function. This approach was further extended and generalised to deal with general videos (Russell et al. 2014) and with an energy functional that includes temporal smoothing (Golyanik et al. 2019). The method of Lee et al. (2016) instead uses 3D reconstructions of multiple combinations of patches and defines a 3D consensus among a set of patches. This approach provides a fast way to bypass the segmentation problem and a robust mechanism to prune out wrong local 3D reconstructions. The method was further improved to account for higher degrees of missing data in the chosen patches, so as to better generalise the capabilities of the approach to challenging NRSfM sequences (Cha et al. 2019).
Differently from these approaches, Taylor et al. (2010) construct a triangular mesh connecting all the points and consider each triangle as locally rigid. Global consistency is here imposed to ensure that the vertices of each triangle coincide in 3D. Again, this approach is to a certain extent similar to Varol et al. (2009), as it requires a dense set of points in order to comply with the local rigidity constraint.
A strong prior, which helps dramatically to mitigate the ill-posedness of the problem, is obtained by considering the deformation isometric, i.e. the metric length of curves on the surface does not change when the shape deforms (as for paper and, to some extent, metallic materials). A first solution considering a regularly sampled surface mesh model was presented in Salzmann et al. (2007). Using the assumption that a surface can be approximated as infinitesimally planar, Chhatkuli et al. (2014) proposed a local method that frames NRSfM as the solution of a set of partial differential equations (PDEs) and is also able to deal with missing data. As a further development, Parashar et al. (2017) formalized the framework in the context of Riemannian geometry, leading to a practical method that solves the problem in linear time and scales to a large number of views and points. Furthermore, a convex formulation of NRSfM with inextensible deformation constraints was implemented using second-order cone programming (SOCP), leading to a closed-form solution of the problem (Chhatkuli et al. 2018). Vicente and Agapito (2012) implemented soft inextensibility constraints in an energy minimization framework, e.g. using recently introduced techniques for discrete optimization.
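A small numerical illustration of the isometry prior (our toy construction, a 1-D strip rather than a full surface): bending preserves the distances between neighbouring points along the shape, whereas stretching does not, which is precisely what inextensibility constraints penalize.

```python
import numpy as np

n = 20
u = np.linspace(0, np.pi, n)                     # parameter along the strip

flat = np.stack([u, np.zeros(n), np.zeros(n)], axis=1)            # flat strip
bent = np.stack([np.sin(u), 1 - np.cos(u), np.zeros(n)], axis=1)  # isometric bend (arc)
stretched = np.stack([1.5 * u, np.zeros(n), np.zeros(n)], axis=1) # non-isometric

def edge_lengths(X):
    """Distances between consecutive sampled points (approximate geodesics)."""
    return np.linalg.norm(np.diff(X, axis=0), axis=1)

iso_gap = np.abs(edge_lengths(bent) - edge_lengths(flat)).max()
stretch_gap = np.abs(edge_lengths(stretched) - edge_lengths(flat)).max()
```

The bent strip matches the flat one up to the chord-versus-arc discretization error, while the stretched strip violates the length constraint by a large margin; isometric methods can therefore reject the latter deformation while admitting the former.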
Another set of approaches tries to estimate the deformation function directly using higher-order models. Del Bue and Bartoli (2011) extended 3D warps, such as the thin-plate spline, and applied them to the NRSfM problem. Starting from an approximate mean 3D reconstruction, the warping function can be constructed and the deformation at each frame can be solved by iterating between camera and 3D warp field estimation. Finally, Agudo et al. (2016) introduced the use of finite element models (FEM) in NRSfM. As these models are highly parametrized, requiring knowledge of the material properties of the object (e.g. the Young modulus), FEM needs to be approximated in order to be efficiently estimated; however, under ideal conditions it may achieve remarkable results, since FEM is a consolidated technique for modelling structural deformations. Lately, Agudo and Moreno-Noguer (2017b) presented a duality between the standard statistical rank-constrained model and a newly proposed force model inspired by Hooke's law. Notably, in principle, their physical model can account for a wider range of deformations than rank-based statistical approaches.