Published in: International Journal of Computer Vision, 02-08-2019

Estimation of 3D Category-Specific Object Structure: Symmetry, Manhattan and/or Multiple Images

Authors: Yuan Gao, Alan L. Yuille

Published in: International Journal of Computer Vision | Issue 10/2019

Abstract

Many man-made objects have intrinsic symmetries and often Manhattan structure. By assuming an orthographic or a weak perspective projection model, this paper addresses the estimation of 3D structures and camera projection using symmetry and/or Manhattan structure cues, for the two cases when the input is a single image or multiple images from the same category, e.g. multiple different cars from various viewpoints. More specifically, analysis of the single-image case shows that Manhattan structure alone is sufficient to recover the camera projection, and then the 3D structure can be reconstructed uniquely by exploiting symmetry. But Manhattan structure can be hard to observe from a single image due to occlusion. Hence, we extend to the multiple-image case, which can also exploit symmetry but does not require Manhattan structure. We propose novel structure from motion methods for both rigid and non-rigid object deformations, which exploit symmetry and use multiple images from the same object category as input. We perform experiments on the Pascal3D+ dataset with either human-labeled 2D keypoints or with 2D keypoints localized by a convolutional neural network. The results show that our methods which exploit symmetry significantly outperform the baseline methods.

However, the general framework in Hong and Fitzgibbon (2015) cannot be applied to SfM directly, because it does not constrain all the keypoints within the same frame to share the same translation. Instead, Hong and Fitzgibbon (2015) focused on better optimization of rank-r matrix factorization and on better runtime.
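The shared-translation constraint can be illustrated with a minimal sketch. Under an orthographic model, every keypoint in a frame is offset by the same 2D translation, so subtracting the per-frame centroid removes it, provided all keypoints are visible. The function name `centre_frame` and the list-of-tuples layout are assumptions made for illustration, not the authors' implementation.

```python
def centre_frame(keypoints):
    """Subtract the centroid from one frame's 2D keypoints [(x, y), ...].

    Assumes every keypoint in the frame is visible (no missing data).
    """
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    return [(x - cx, y - cy) for x, y in keypoints]


# Hypothetical frame, and the same frame moved by one common translation.
frame = [(1.0, 2.0), (3.0, 2.0), (2.0, 5.0)]
shifted = [(x + 10.0, y - 4.0) for x, y in frame]
```

After centring, `frame` and `shifted` give the same coordinates, which is exactly what a per-frame (rather than per-keypoint) translation implies.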

Note that we impose hard constraints on \(\mathbb {{\bar{S}}}\) and \(\mathbb {{\bar{S}}}^{\dag }\), i.e. we replace \(\mathbb {{\bar{S}}}^{\dag }\) by \({\mathcal {A}}_P \mathbb {{\bar{S}}}\) in Eq. (57), because this relation is guaranteed by our Sym-RSfM initialization in Sect. 6. In contrast, the PCA initialization of \({\mathbf {V}}\) and \({\mathbf {V}}^{\dag }\) cannot guarantee such a property, so a Lagrange multiplier term is used to enforce the constraint on \({\mathbf {V}}\) and \({\mathbf {V}}^{\dag }\) in Eq. (61) below.

For the subtypes of more categories, please refer to the Pascal3D+ official website at http://cvgl.stanford.edu/projects/pascal3d.html.

For the rigid case, we use images from the same subtype as input (so that we can reasonably assume rigid deformation among them); we therefore also report the rotation error per subtype for the rigid experiments.

As there is no baseline method for comparison, we also calculate the average rotation error, measured by the averaged geodesic distance \(\frac{1}{N} \sum _{n=1}^{N} ||\log ({R_n^{\text {aligned}}}^\top R_n^*) ||_\text {F} / \sqrt{2}\), which represents the angle difference between two rotation matrices. The results show that the rotation error is 4.1766 degrees on average.
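The geodesic distance in this note can be sketched in a few lines. For a relative rotation \(R = R_a^\top R_b\) by angle \(\theta\), \(||\log (R)||_\text {F} / \sqrt{2} = \theta \) and \(\text {trace}(R) = 1 + 2\cos \theta \), so the angle can be read from the trace without computing a matrix logarithm. The function names and the nested-list matrix layout below are illustrative assumptions, not the authors' code.

```python
import math


def geodesic_angle_deg(R_a, R_b):
    """Angle in degrees of the relative rotation R_a^T R_b (3x3 nested lists)."""
    # trace(R_a^T R_b) equals the sum of elementwise products of the two matrices.
    tr = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    # trace = 1 + 2 cos(theta); clamp to [-1, 1] to guard against round-off.
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def mean_rotation_error_deg(aligned, ground_truth):
    """Average geodesic angle over paired lists of rotation matrices."""
    errors = [geodesic_angle_deg(Ra, Rg) for Ra, Rg in zip(aligned, ground_truth)]
    return sum(errors) / len(errors)


def rot_z(deg):
    """Helper for the example: rotation about the z-axis by `deg` degrees."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]
```

For example, `geodesic_angle_deg(rot_z(0), rot_z(30))` measures the 30-degree gap between the two rotations, and `mean_rotation_error_deg` averages such angles over all N images.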

As analyzed in Remark 10 and Eq. (38), the relationship between the number of allowed deformation bases K and the number of keypoint pairs P follows: \(K \le P/3\).
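As a quick arithmetic check of this bound, the largest admissible integer K is the floor of P/3. The helper name `max_deformation_bases` and the example values of P are hypothetical and chosen only for illustration.

```python
def max_deformation_bases(num_keypoint_pairs):
    """Largest integer K satisfying K <= P / 3 for P symmetric keypoint pairs."""
    if num_keypoint_pairs < 0:
        raise ValueError("number of keypoint pairs must be non-negative")
    return num_keypoint_pairs // 3


# Hypothetical annotation sizes: 6 pairs allow K <= 2, 7 pairs still K <= 2.
examples = {P: max_deformation_bases(P) for P in (2, 6, 7, 9)}
```

With fewer than three keypoint pairs the bound leaves no room for any deformation basis.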

This is because self-occluded information/features can be recovered from training images taken from a different viewpoint, whereas the training data cannot exhaustively cover the various occlusions introduced by other objects or the various types of truncation.

They are not directly comparable because (i) Tables 1 and 2 use 2D annotations from Bourdev et al. (2010) [the same as those used in Kar et al. (2015)], while the keypoint localization network for Tables 4 and 5 is trained on 2D annotations from Pascal3D+ (Xiang et al. 2014), and (ii) we exclude the occluded-by-others and truncated objects in Tables 4 and 5 [the same as those in Pavlakos et al. (2017)] because the stacked hourglass network (Newell et al. 2016) does not produce satisfactory results on those images.

Agudo, A., Agapito, L., Calvo, B., & Montiel, J. (2014). Good vibrations: A modal analysis approach for sequential non-rigid structure from motion. In CVPR (pp. 1558–1565).

Akhter, I., Sheikh, Y., & Khan, S. (2009). In defense of orthonormality constraints for nonrigid structure from motion. In CVPR.

Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2008). Nonrigid structure from motion in trajectory space. In NIPS.

Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2011). Trajectory space: A dual representation for nonrigid structure from motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7), 1442–1456.

Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.

Bourdev, L., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In ECCV.

Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In CVPR.

Ceylan, D., Mitra, N. J., Zheng, Y., & Pauly, M. (2014). Coupled structure-from-motion and 3D symmetry detection for urban facades. ACM Transactions on Graphics, 33, 2. https://doi.org/10.1145/2517348.

Chen, X., & Yuille, A. L. (2014). Articulated pose estimation by a graphical model with image dependent pairwise relations. In NIPS (pp. 1736–1744).

Coughlan, J. M., & Yuille, A. L. (1999). Manhattan world: Compass direction from a single image by bayesian inference. In ICCV.

Coughlan, J. M., & Yuille, A. L. (2003). Manhattan world: Orientation and outlier detection by bayesian inference. Neural Computation, 15(5), 1063–1088.

Dai, Y., Li, H., & He, M. (2012). A simple prior-free method for non-rigid structure-from-motion factorization. In CVPR.

Dai, Y., Li, H., & He, M. (2014). A simple prior-free method for non-rigid structure-from-motion factorization. International Journal of Computer Vision, 107, 101–122.

Furukawa, Y., Curless, B., Seitz, S. M., & Szeliski, R. (2009). Manhattan-world stereo. In CVPR.

Gao, Y., Ma, J., Zhao, M., Liu, W., & Yuille, A. L. (2019). NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction. In CVPR.

Gao, Y., & Yuille, A. L. (2016). Symmetry non-rigid structure from motion for category-specific object structure estimation. In ECCV.

Gao, Y., & Yuille, A. L. (2017). Exploiting symmetry and/or manhattan properties for 3D object structure estimation from single and multiple images. In IEEE international conference on computer vision and pattern recognition.

Gordon, G. G. (1990). Shape from symmetry. In Proceedings of SPIE.

Gotardo, P., & Martinez, A. (2011). Computing smooth time-trajectories for camera and deformable shape in structure from motion with occlusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 2051–2065.

Grossmann, E., Ortin, D., & Santos-Victor, J. (2002). Single and multi-view reconstruction of structured scenes. In ACCV.

Grossmann, E., & Santos-Victor, J. (2002). Maximum likehood 3D reconstruction from one or more images under geometric constraints. In BMVC.

Grossmann, E., & Santos-Victor, J. (2005). Least-squares 3D reconstruction from one or more views and geometric clues. Computer Vision and Image Understanding, 99(2), 151–174.

Hamsici, O. C., Gotardo, P. F., & Martinez, A. M. (2012). Learning spatially-smooth mappings in non-rigid structure from motion. In ECCV (pp. 260–273).

Hartley, R. I., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press.

Hong, J. H., & Fitzgibbon, A. (2015). Secrets of matrix factorization: Approximations, numerics, manifold optimization and random restarts. In ICCV.

Hong, W., Yang, A. Y., Huang, K., & Ma, Y. (2004). On symmetry and multiple-view geometry: Structure, pose, and calibration from a single image. International Journal of Computer Vision, 60, 241–265.

Kar, A., Tulsiani, S., Carreira, J., & Malik, J. (2015). Category-specific object reconstruction from a single image. In CVPR.

Kontsevich, L. L. (1993). Pairwise comparison technique: A simple solution for depth reconstruction. JOSA A, 10(6), 1129–1135.

Kontsevich, L. L., Kontsevich, M. L., & Shen, A. K. (1987). Two algorithms for reconstructing shapes. Optoelectronics, Instrumentation and Data Processing, 5, 76–81.

Li, Y., & Pizlo, Z. (2007). Reconstruction of shapes of 3D symmetric objects by using planarity and compactness constraints. In Proceedings of SPIE-IS&T electronic imaging.

Ma, J., Zhao, J., Tian, J., Tu, Z., & Yuille, A. L. (2013). Robust estimation of nonrigid transformation for point set registration. In CVPR (pp. 2147–2154).

Marques, M., & Costeira, J. (2009). Estimating 3D shape from degenerate sequences with missing data. Computer Vision and Image Understanding, 113(2), 261–272.

Ma, J., Zhao, J., Ma, Y., & Tian, J. (2015). Non-rigid visible and infrared face registration via regularized gaussian fields criterion. Pattern Recognition, 48(3), 772–784.

Ma, J., Zhao, J., Tian, J., Bai, X., & Tu, Z. (2013). Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognition, 46(12), 3519–3532.

Morris, D. D., Kanatani, K., & Kanade, T. (2001). Gauge fixing for accurate 3D estimation. In CVPR.

Mukherjee, D. P., Zisserman, A., & Brady, M. (1995). Shape from symmetry: Detecting and exploiting symmetry in affine images. Philosophical Transactions: Physical Sciences and Engineering, 351, 77–106.

Newell, A., Yang, K., & Deng, J. (2016). Stacked hourglass networks for human pose estimation. In European conference on computer vision (pp. 483–499). Springer.

Olsen, S. I., & Bartoli, A. (2008). Implicit non-rigid structure-from-motion with priors. Journal of Mathematical Imaging and Vision, 31(2–3), 233–244.

Pavlakos, G., Zhou, X., Chan, A., Derpanis, K. G., & Daniilidis, K. (2017). 6-DoF object pose from semantic keypoints. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 2011–2018). IEEE.

Rosen, J. (2011). Symmetry discovered: Concepts and applications in nature and science. Mineola: Dover Publications.

Schönemann, P. H. (1966). A generalized solution of the orthogonal procrustes problem. Psychometrika, 31, 1–10.

Thrun, S., & Wegbreit, B. (2005). Shape from symmetry. In ICCV.

Tomasi, C., & Kanade, T. (1992). Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2), 137–154.

Torresani, L., Hertzmann, A., & Bregler, C. (2003). Learning non-rigid 3D shape from 2D motion. In NIPS.

Torresani, L., Hertzmann, A., & Bregler, C. (2008). Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 878–892.

Vetter, T., & Poggio, T. (1994). Symmetric 3D objects are an easy case for 2D object recognition. Spatial Vision, 8, 443–453.

Vicente, S., Carreira, J., Agapito, L., & Batista, J. (2014). Reconstructing PASCAL VOC. In CVPR.

Xiang, Y., Mottaghi, R., & Savarese, S. (2014). Beyond pascal: A benchmark for 3D object detection in the wild. In WACV.

Xiao, J., Chai, J., & Kanade, T. (2004). A closed-form solution to nonrigid shape and motion recovery. In ECCV.

Title: Estimation of 3D Category-Specific Object Structure: Symmetry, Manhattan and/or Multiple Images
Authors: Yuan Gao, Alan L. Yuille
Publication date: 02-08-2019
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 10/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-019-01195-z
