Published in: International Journal of Computer Vision 3/2020

6 September 2019

Learning SO(3) Equivariant Representations with Spherical CNNs

Authors: Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, Kostas Daniilidis


Abstract

We address the problem of 3D rotation equivariance in convolutional neural networks. 3D rotations have been a challenging nuisance in 3D classification tasks, and handling them has required higher network capacity and extensive data augmentation. We model 3D data with multi-valued spherical functions and propose a novel spherical convolutional network that implements exact convolutions on the sphere by realizing them in the spherical harmonic domain. The resulting filters have local symmetry and are localized by enforcing smooth spectra. We apply a novel pooling in the spectral domain, and our operations are independent of the underlying spherical resolution throughout the network. We show that networks with much lower capacity, trained without data augmentation, can perform comparably to the state of the art on standard 3D shape retrieval and classification benchmarks.
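As a rough illustration of what realizing a convolution in the spherical harmonic domain can look like, the sketch below applies the standard convolution theorem for a zonal filter (Driscoll & Healy, 1994): each output coefficient of degree l and order m is the input coefficient scaled by 2π·sqrt(4π/(2l+1)) and by the filter's order-0 coefficient of degree l. The nested-list layout of the coefficients, the function names `spherical_conv_spectral` and `spectral_pool`, and the reading of spectral pooling as truncation of high degrees are assumptions made for illustration only, not the authors' implementation.

```python
import math


def spherical_conv_spectral(f_hat, h_hat_zonal):
    """Spectral-domain spherical convolution with a zonal filter (sketch).

    f_hat:        list indexed by degree l; f_hat[l] holds the 2l+1 complex
                  coefficients for orders m = -l..l (hypothetical layout).
    h_hat_zonal:  list indexed by degree l with the filter's order-0
                  coefficient; only these matter for a zonal filter.
    """
    out = []
    for l, coeffs in enumerate(f_hat):
        # Per-degree factor from the convolution theorem on the sphere.
        scale = 2.0 * math.pi * math.sqrt(4.0 * math.pi / (2 * l + 1))
        out.append([scale * c * h_hat_zonal[l] for c in coeffs])
    return out


def spectral_pool(f_hat, max_degree):
    """Assumed form of spectral pooling: keep degrees 0..max_degree only."""
    return f_hat[: max_degree + 1]


# Toy input with degrees 0 and 1, and a filter with two zonal coefficients.
f_hat = [[1 + 0j], [1j, 2 + 0j, -1j]]
g_hat = spherical_conv_spectral(f_hat, [1.0, 0.5])
pooled = spectral_pool(g_hat, 0)  # drops everything above degree 0
```

Because the filter enters only through one coefficient per degree, the number of filter parameters in this sketch depends on the bandwidth rather than on the spatial sampling of the sphere, which is consistent with the resolution independence the abstract describes.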


Footnotes
1
The first version of this work was submitted to CVPR on 11/15/2017, shortly after we became aware of the ICLR submission by Cohen et al. (2018) on 10/27/2017.
2
In a CNN setting, f represents inputs/feature maps, and h the learned filters.
3
For the experiments in Table 6, one epoch for the WAP model in the first row takes 234 s, versus 132 s for the SP model in the third row, both on an Nvidia 1080 Ti.
References
Arfken, G. (1966). Mathematical methods for physicists. No. v. 2 in Mathematical methods for physicists. New York: Academic Press.
Bai, S., Bai, X., Zhou, Z., Zhang, Z., & Jan Latecki, L. (2016). Gift: A real-time and scalable 3d shape search engine. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5023–5032).
Boscaini, D., Masci, J., Rodolà, E., & Bronstein, M. (2016). Learning shape correspondence with anisotropic convolutional neural networks. In Advances in neural information processing systems (pp. 3189–3197).
Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric deep learning: Going beyond euclidean data. IEEE Signal Processing Magazine, 34(4), 18–42.
Bruna, J., Szlam, A., & LeCun, Y. (2013a). Learning stable group invariant representations with convolutional networks. arXiv preprint arXiv:1301.3537.
Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013b). Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013c). Spectral networks and locally connected networks on graphs. CoRR arXiv:1312.6203v3.
Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., et al. (2015). Shapenet: An information-rich 3d model repository. CoRR arXiv:1512.03012v1.
Cohen, T. S., Geiger, M., Köhler, J., & Welling, M. (2018). Spherical CNNs. In International conference on learning representations.
Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems (pp. 3844–3852).
Dieleman, S., Willett, K. W., & Dambre, J. (2015). Rotation-invariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society, 450(2), 1441–1459.
Driscoll, J. R., & Healy, D. M. (1994). Computing fourier transforms and convolutions on the 2-sphere. Advances in Applied Mathematics, 15(2), 202–250.
Frome, A., Huber, D., Kolluri, R., Bülow, T., & Malik, J. (2004). Recognizing objects in range data using regional point descriptors. In T. Pajdla & J. Matas (Eds.), Computer Vision - ECCV 2004. ECCV 2004. Lecture notes in computer science (vol. 3023). Berlin, Heidelberg: Springer.
Furuya, T., & Ohbuchi, R. (2016). Deep aggregation of local 3d geometric features for 3d model retrieval. In BMVC (p. 121).
Gens, R., & Domingos, P. M. (2014). Deep symmetry networks. In Advances in neural information processing systems (pp. 2537–2545).
Górski, K. M., Hivon, E., Banday, A. J., Wandelt, B. D., Hansen, F. K., Reinecke, M., et al. (2005). HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622, 759–771. https://doi.org/10.1086/427976.
Healy, D. M., Rockmore, D. N., Kostelec, P. J., & Moore, S. (2003). Ffts for the 2-sphere-improvements and variations. Journal of Fourier Analysis and Applications, 9(4), 341–385.
Hel-Or, Y., & Teo, P. C. (1996). Canonical decomposition of steerable functions. In Computer vision and pattern recognition, 1996. Proceedings CVPR’96, 1996 IEEE computer society conference on (pp. 809–816). IEEE.
Jaderberg, M., Simonyan, K., & Zisserman, A., et al. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025).
Kanezaki, A., Matsushita, Y., & Nishida, Y. (2018). Rotationnet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In Proceedings of IEEE international conference on computer vision and pattern recognition (CVPR).
Kazhdan, M., & Funkhouser, T. (2002). Harmonic 3d shape matching. In ACM SIGGRAPH 2002 conference abstracts and applications (pp. 191–191). New York: ACM.
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
Klokov, R., & Lempitsky, V. (2017). Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. In International conference on computer vision (ICCV) (pp. 863–872).
Lebedev, N., & Silverman, R. (1972). Special functions and their applications. Dover Books on Mathematics, Dover Publications.
Lenc, K., & Vedaldi, A. (2015). Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 991–999).
Makadia, A., & Daniilidis, K. (2010). Spherical correlation of visual representations for 3d model retrieval. International Journal of Computer Vision, 89(2), 193–210.
Masci, J., Boscaini, D., Bronstein, M., & Vandergheynst, P. (2015). Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops (pp. 37–45).
Maturana, D., & Scherer, S. (2015). Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ international conference on intelligent robots and systems, IROS 2015, Hamburg, Germany, September 28–October 2, 2015 (pp. 922–928). https://doi.org/10.1109/IROS.2015.7353481.
Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., & Bronstein, M. M. (2016). Geometric deep learning on graphs and manifolds using mixture model CNNs. arXiv preprint arXiv:1611.08402.
Qi, C. R., Su, H., Nießner, M., Dai, A., Yan, M., & Guibas, L. J. (2016). Volumetric and multi-view CNNs for object classification on 3d data. In 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016 (pp. 5648–5656). https://doi.org/10.1109/CVPR.2016.609.
Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017a). Pointnet: Deep learning on point sets for 3d classification and segmentation. In Processing computer vision and pattern recognition (CVPR) (Vol. 1(2), p. 4). IEEE.
Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017b). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems (pp. 5105–5114).
Savva, M., Yu, F., Su, H., Kanezaki, A., Furuya, T., et al. (2017). Shrec’17 track: Large-scale 3d shape retrieval from shapenet core55. In 10th Eurographics workshop on 3D object retrieval (pp. 1–11).
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815–823).
Segman, J., Rubinstein, J., & Zeevi, Y. Y. (1992). The canonical coordinates method for pattern deformation: Theoretical and computational considerations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(12), 1171–1183.
Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision (pp. 945–953).
Tatsuma, A., & Aono, M. (2009). Multi-fourier spectra descriptor and augmentation with spectral clustering for 3d shape retrieval. The Visual Computer, 25(8), 785–804.
Thurston, W. P. (1997). Three-dimensional geometry and topology (Vol. 1). Princeton, NJ: Princeton University Press.
Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., & Solomon, J. M. (2018). Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829.
Worrall, D. E., Garbin, S. J., Turmukhambetov, D., & Brostow, G. J. (2016). Harmonic networks: Deep translation and rotation equivariance. arXiv preprint arXiv:1612.04642.
Worrall, D. E., Garbin, S. J., Turmukhambetov, D., & Brostow, G. J. (2017). Harmonic networks: deep translation and rotation equivariance. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. 5028–5037).
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015 (pp. 1912–1920). https://doi.org/10.1109/CVPR.2015.7298801.
Yi, L., Su, H., Guo, X., & Guibas, L. (2016). SyncSpecCNN: Synchronized spectral CNN for 3d shape segmentation. arXiv preprint arXiv:1612.00606.
Zhang, R. (2019). Making convolutional networks shift-invariant again. In International conference on machine learning (ICML).
Zhou, Y., Ye, Q., Qiu, Q., & Jiao, J. (2017). Oriented response networks. In The IEEE conference on computer vision and pattern recognition (CVPR).
Metadata
Title
Learning SO(3) Equivariant Representations with Spherical CNNs
Authors
Carlos Esteves
Christine Allen-Blanchette
Ameesh Makadia
Kostas Daniilidis
Publication date
6 September 2019
Publisher
Springer US
Published in
International Journal of Computer Vision, Issue 3/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01220-1
