Published in: International Journal of Computer Vision 10/2019

02-08-2019

Estimation of 3D Category-Specific Object Structure: Symmetry, Manhattan and/or Multiple Images

Authors: Yuan Gao, Alan L. Yuille



Abstract

Many man-made objects have intrinsic symmetries and often Manhattan structure. Assuming an orthographic or weak perspective projection model, this paper addresses the estimation of 3D structure and camera projection using symmetry and/or Manhattan structure cues, for two cases: when the input is a single image, and when it is multiple images from the same category, e.g. multiple different cars from various viewpoints. More specifically, analysis of the single-image case shows that Manhattan structure alone is sufficient to recover the camera projection, after which the 3D structure can be reconstructed uniquely by exploiting symmetry. However, Manhattan structure can be hard to observe in a single image because of occlusion. Hence, we extend to the multiple-image case, which can also exploit symmetry but does not require Manhattan structure. We propose novel structure from motion methods for both rigid and non-rigid object deformations, which exploit symmetry and use multiple images from the same object category as input. We perform experiments on the Pascal3D+ dataset with 2D keypoints that are either labeled by humans or localized by a convolutional neural network. The results show that our symmetry-exploiting methods significantly outperform the baseline methods.
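
To give a feel for why symmetry is a useful cue under orthographic projection, the following is a minimal sketch and not the authors' method. It assumes a symmetry plane at x = 0, an orthographic camera given by the first two rows of a rotation matrix, and hypothetical keypoint coordinates. All function names are illustrative. Under these assumptions, the 2D difference vectors between the projections of mirrored keypoint pairs are all parallel, which is one simple constraint that symmetric pairs place on the camera.

```python
import math

def rotation(yaw, pitch):
    """3x3 rotation: yaw about the y-axis, then pitch about the x-axis."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    r_yaw = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    r_pitch = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(r_pitch[i][k] * r_yaw[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

def project(r, point):
    """Orthographic projection: apply only the first two rows of the rotation."""
    return [sum(r[i][k] * point[k] for k in range(3)) for i in range(2)]

def mirror(point):
    """Reflect a 3D point across the plane x = 0 (assumed symmetry plane)."""
    return [-point[0], point[1], point[2]]

def pair_differences(r, points):
    """2D difference between each projected keypoint and its projected mirror."""
    diffs = []
    for p in points:
        a, b = project(r, p), project(r, mirror(p))
        diffs.append([a[0] - b[0], a[1] - b[1]])
    return diffs

def are_parallel(vectors, tol=1e-9):
    """True if every vector has a (near) zero 2D cross product with the first."""
    ref = vectors[0]
    return all(abs(ref[0] * v[1] - ref[1] * v[0]) < tol for v in vectors)

# Hypothetical keypoints on one side of the object and an arbitrary viewpoint.
r = rotation(0.7, 0.3)
points = [[1.0, 0.5, 2.0], [0.4, -1.0, 0.3], [2.0, 0.0, -1.5]]
print(are_parallel(pair_differences(r, points)))
```

Each difference vector equals twice the keypoint's x-coordinate times the projected x-axis, so the check holds for any viewpoint in this idealized, noise-free setting.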


Footnotes
1
However, the general framework in Hong and Fitzgibbon (2015) cannot be applied to SfM directly, because it does not constrain all the keypoints within the same frame to share the same translation. Instead, Hong and Fitzgibbon (2015) focused on better optimization of rank-r matrix factorization and better runtime.
 
2
Note that we set hard constraints on \(\mathbb {{\bar{S}}}\) and \(\mathbb {{\bar{S}}}^{\dag }\), i.e. we replace \(\mathbb {{\bar{S}}}^{\dag }\) by \({\mathcal {A}}_P \mathbb {{\bar{S}}}\) in Eq. (57), because this relation is guaranteed by our Sym-RSfM initialization in Sect. 6. The PCA initialization of \({\mathbf {V}}\) and \({\mathbf {V}}^{\dag }\), by contrast, cannot guarantee such a desirable property, so a Lagrange multiplier term is used for the constraint on \({\mathbf {V}}\) and \({\mathbf {V}}^{\dag }\) in the following Eq. (61).
 
3
For the subtypes of more categories, please refer to the Pascal3D+ official website at http://cvgl.stanford.edu/projects/pascal3d.html.
 
4
For the rigid case, we use images from the same subtype as input (so that we can reasonably assume rigid deformation among them); therefore, we also report the rotation error by subtype for the rigid experiments.
 
5
As there is no baseline method for comparison, we also calculate the average rotation error, measured by the averaged geodesic distance \(\frac{1}{N} \sum _{n=1}^{N} ||\log ({R_n^{\text {aligned}}}^\top R_n^*) ||_\text {F} / \sqrt{2}\), which represents the angle difference between two rotation matrices. The results show that the rotation error is 4.1766 degrees on average.
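
The geodesic distance in this footnote can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' evaluation script. It relies on the standard identity that, for a rotation matrix R with rotation angle θ, ||log(R)||_F / √2 = θ, with θ = arccos((trace(R) − 1) / 2). The function names and the test rotations are hypothetical, and rotations are plain nested lists.

```python
import math

def rotation_z(theta):
    """3x3 rotation by angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic_angle(r_est, r_gt):
    """Rotation angle (radians) of r_est^T r_gt, i.e. ||log(r_est^T r_gt)||_F / sqrt(2)."""
    # trace(A^T B) equals the sum of element-wise products of A and B.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against round-off before calling acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)

def mean_rotation_error_deg(estimated, ground_truth):
    """Average geodesic distance over N rotation pairs, in degrees."""
    angles = [geodesic_angle(a, b) for a, b in zip(estimated, ground_truth)]
    return math.degrees(sum(angles) / len(angles))

# Hypothetical example: two frames, one of them off by 0.1 rad.
estimated = [rotation_z(0.0), rotation_z(0.2)]
ground_truth = [rotation_z(0.1), rotation_z(0.2)]
print(mean_rotation_error_deg(estimated, ground_truth))
```

Computing the angle from the trace avoids an explicit matrix logarithm. Note that arccos is numerically sensitive for angles very close to zero, so tiny nonzero values for identical rotations are round-off rather than real error.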
 
6
As analyzed in Remark 10 and Eq. (38), the relationship between the number of allowed deformation bases K and the number of keypoint pairs P follows: \(K \le P/3\).
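
As a small worked illustration of this bound, the helper below returns the largest integer K that satisfies K ≤ P/3. The function name and the example value of P are hypothetical and are not taken from the paper.

```python
def max_deformation_bases(num_keypoint_pairs):
    """Largest integer K with K <= P / 3, where P is the number of keypoint pairs."""
    if num_keypoint_pairs < 0:
        raise ValueError("number of keypoint pairs must be non-negative")
    return num_keypoint_pairs // 3

# Hypothetical example: 6 symmetric keypoint pairs allow at most 2 bases.
print(max_deformation_bases(6))
```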
 
7
This is because self-occluded information/features can be recovered from training images taken from a different viewpoint, whereas the training data cannot exhaustively cover the various occlusions introduced by other objects or the various types of truncation.
 
8
They are not directly comparable because (i) Tables 1 and 2 use 2D annotations from Bourdev et al. (2010) [the same as those used in Kar et al. (2015)], while the keypoint localization network for Tables 4 and 5 is trained on 2D annotations from Pascal3D+ (Xiang et al. 2014); and (ii) we exclude objects that are occluded by others or truncated in Tables 4 and 5 [the same as in Pavlakos et al. (2017)], because the stacked hourglass network (Newell et al. 2016) does not produce satisfactory results on those images.
 
Literature
Agudo, A., Agapito, L., Calvo, B., & Montiel, J. (2014). Good vibrations: A modal analysis approach for sequential non-rigid structure from motion. In CVPR (pp. 1558–1565).
Akhter, I., Sheikh, Y., & Khan, S. (2009). In defense of orthonormality constraints for nonrigid structure from motion. In CVPR.
Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2008). Nonrigid structure from motion in trajectory space. In NIPS.
Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2011). Trajectory space: A dual representation for nonrigid structure from motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7), 1442–1456.
Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.
Bourdev, L., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In ECCV.
Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In CVPR.
Chen, X., & Yuille, A. L. (2014). Articulated pose estimation by a graphical model with image dependent pairwise relations. In NIPS (pp. 1736–1744).
Coughlan, J. M., & Yuille, A. L. (1999). Manhattan world: Compass direction from a single image by bayesian inference. In ICCV.
Coughlan, J. M., & Yuille, A. L. (2003). Manhattan world: Orientation and outlier detection by bayesian inference. Neural Computation, 15(5), 1063–1088.
Dai, Y., Li, H., & He, M. (2012). A simple prior-free method for non-rigid structure-from-motion factorization. In CVPR.
Dai, Y., Li, H., & He, M. (2014). A simple prior-free method for non-rigid structure-from-motion factorization. International Journal of Computer Vision, 107, 101–122.
Furukawa, Y., Curless, B., Seitz, S. M., & Szeliski, R. (2009). Manhattan-world stereo. In CVPR.
Gao, Y., Ma, J., Zhao, M., Liu, W., & Yuille, A. L. (2019). NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction. In CVPR.
Gao, Y., & Yuille, A. L. (2016). Symmetry non-rigid structure from motion for category-specific object structure estimation. In ECCV.
Gao, Y., & Yuille, A. L. (2017). Exploiting symmetry and/or manhattan properties for 3D object structure estimation from single and multiple images. In IEEE international conference on computer vision and pattern recognition.
Gordon, G. G. (1990). Shape from symmetry. In Proceedings of SPIE.
Gotardo, P., & Martinez, A. (2011). Computing smooth time-trajectories for camera and deformable shape in structure from motion with occlusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 2051–2065.
Grossmann, E., Ortin, D., & Santos-Victor, J. (2002). Single and multi-view reconstruction of structured scenes. In ACCV.
Grossmann, E., & Santos-Victor, J. (2002). Maximum likehood 3D reconstruction from one or more images under geometric constraints. In BMVC.
Grossmann, E., & Santos-Victor, J. (2005). Least-squares 3D reconstruction from one or more views and geometric clues. Computer Vision and Image Understanding, 99(2), 151–174.
Hamsici, O. C., Gotardo, P. F., & Martinez, A. M. (2012). Learning spatially-smooth mappings in non-rigid structure from motion. In ECCV (pp. 260–273).
Hartley, R. I., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press.
Hong, J. H., & Fitzgibbon, A. (2015). Secrets of matrix factorization: Approximations, numerics, manifold optimization and random restarts. In ICCV.
Hong, W., Yang, A. Y., Huang, K., & Ma, Y. (2004). On symmetry and multiple-view geometry: Structure, pose, and calibration from a single image. International Journal of Computer Vision, 60, 241–265.
Kar, A., Tulsiani, S., Carreira, J., & Malik, J. (2015). Category-specific object reconstruction from a single image. In CVPR.
Kontsevich, L. L. (1993). Pairwise comparison technique: A simple solution for depth reconstruction. JOSA A, 10(6), 1129–1135.
Kontsevich, L. L., Kontsevich, M. L., & Shen, A. K. (1987). Two algorithms for reconstructing shapes. Optoelectronics, Instrumentation and Data Processing, 5, 76–81.
Li, Y., & Pizlo, Z. (2007). Reconstruction of shapes of 3D symmetric objects by using planarity and compactness constraints. In Proceedings of SPIE-IS&T electronic imaging.
Ma, J., Zhao, J., Tian, J., Tu, Z., & Yuille, A. L. (2013). Robust estimation of nonrigid transformation for point set registration. In CVPR (pp. 2147–2154).
Marques, M., & Costeira, J. (2009). Estimating 3D shape from degenerate sequences with missing data. Computer Vision and Image Understanding, 113(2), 261–272.
Ma, J., Zhao, J., Ma, Y., & Tian, J. (2015). Non-rigid visible and infrared face registration via regularized gaussian fields criterion. Pattern Recognition, 48(3), 772–784.
Ma, J., Zhao, J., Tian, J., Bai, X., & Tu, Z. (2013). Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognition, 46(12), 3519–3532.
Morris, D. D., Kanatani, K., & Kanade, T. (2001). Gauge fixing for accurate 3D estimation. In CVPR.
Mukherjee, D. P., Zisserman, A., & Brady, M. (1995). Shape from symmetry: Detecting and exploiting symmetry in affine images. Philosophical Transactions: Physical Sciences and Engineering, 351, 77–106.
Newell, A., Yang, K., & Deng, J. (2016). Stacked hourglass networks for human pose estimation. In European conference on computer vision (pp. 483–499). Springer.
Olsen, S. I., & Bartoli, A. (2008). Implicit non-rigid structure-from-motion with priors. Journal of Mathematical Imaging and Vision, 31(2–3), 233–244.
Pavlakos, G., Zhou, X., Chan, A., Derpanis, K. G., & Daniilidis, K. (2017). 6-DoF object pose from semantic keypoints. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 2011–2018). IEEE.
Rosen, J. (2011). Symmetry discovered: Concepts and applications in nature and science. Mineola: Dover Publications.
Thrun, S., & Wegbreit, B. (2005). Shape from symmetry. In ICCV.
Tomasi, C., & Kanade, T. (1992). Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2), 137–154.
Torresani, L., Hertzmann, A., & Bregler, C. (2003). Learning non-rigid 3D shape from 2D motion. In NIPS.
Torresani, L., Hertzmann, A., & Bregler, C. (2008). Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 878–892.
Vetter, T., & Poggio, T. (1994). Symmetric 3D objects are an easy case for 2D object recognition. Spatial Vision, 8, 443–453.
Vicente, S., Carreira, J., Agapito, L., & Batista, J. (2014). Reconstructing PASCAL VOC. In CVPR.
Xiang, Y., Mottaghi, R., & Savarese, S. (2014). Beyond pascal: A benchmark for 3D object detection in the wild. In WACV.
Xiao, J., Chai, J., & Kanade, T. (2004). A closed-form solution to nonrigid shape and motion recovery. In ECCV.
Metadata
Title
Estimation of 3D Category-Specific Object Structure: Symmetry, Manhattan and/or Multiple Images
Authors
Yuan Gao
Alan L. Yuille
Publication date
02-08-2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 10/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01195-z
