
International Journal of Computer Vision

Publication date: 21 December 2019

Representation Learning on Unit Ball with 3D Roto-translational Equivariance

Authors: Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould

Published in: International Journal of Computer Vision, Issue 6/2020


Abstract

Convolution is an integral operation that defines how the shape of one function is modified by another function. This powerful concept forms the basis of hierarchical feature learning in deep neural networks. Although performing convolution in Euclidean geometries is fairly straightforward, its extension to other topological spaces, such as a sphere (\(\mathbb{S}^2\)) or a unit ball (\(\mathbb{B}^3\)), entails unique challenges. In this work, we propose a novel ‘volumetric convolution’ operation that can effectively model and convolve arbitrary functions in \(\mathbb{B}^3\). We develop a theoretical framework for volumetric convolution based on Zernike polynomials and efficiently implement it as a differentiable layer that can easily be plugged into deep networks. By construction, our formulation leads to a novel formula for measuring the symmetry of a function in \(\mathbb{B}^3\) around an arbitrary axis, which is useful in function analysis tasks. We demonstrate the efficacy of the proposed volumetric convolution operation on one viable use case, namely 3D object recognition.
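The framework above is built on Zernike polynomials. As a hedged illustration only (not the authors' implementation), the sketch below assumes the common 3D Zernike convention \(Z_{nl}^m(r,\theta,\phi) = R_{nl}(r)\,Y_l^m(\theta,\phi)\) with \(0 \le l \le n\) and \(n-l\) even, and computes the radial factor from the relation \(R_{nl}(r) \propto r^l\,{}_2F_1(-k,\,k+l+\tfrac{3}{2};\,l+\tfrac{3}{2};\,r^2)\), \(k=(n-l)/2\), which follows from writing the radial part as a Jacobi polynomial in \(2r^2-1\). It then checks orthogonality under the radial volume weight \(r^2\) on \([0,1]\) with exact rational arithmetic. The names `zernike_radial_coeffs` and `radial_inner` are hypothetical, and fixing the coefficient of \(r^l\) to 1 is an arbitrary normalization choice.

```python
from fractions import Fraction


def zernike_radial_coeffs(n: int, l: int) -> dict[int, Fraction]:
    """Unnormalized radial factor R_nl(r) of a 3D Zernike polynomial.

    Returns {exponent: coefficient}, so R_nl(r) = sum of c * r**e.
    Assumes R_nl(r) ∝ r**l * 2F1(-k, k + l + 3/2; l + 3/2; r**2), k = (n - l) // 2,
    with the coefficient of r**l fixed to 1 (hypothetical normalization).
    """
    if not (0 <= l <= n) or (n - l) % 2 != 0:
        raise ValueError("require 0 <= l <= n and n - l even")
    k = (n - l) // 2
    b = Fraction(2 * k + 2 * l + 3, 2)  # k + l + 3/2
    c = Fraction(2 * l + 3, 2)          # l + 3/2
    coeffs: dict[int, Fraction] = {}
    term = Fraction(1)
    for v in range(k + 1):
        coeffs[l + 2 * v] = term
        # Ratio of consecutive 2F1 series terms: (-k + v)(b + v) / ((c + v)(v + 1))
        term *= (-k + v) * (b + v) / ((c + v) * (v + 1))
    return coeffs


def radial_inner(p: dict[int, Fraction], q: dict[int, Fraction]) -> Fraction:
    """Exact integral of p(r) * q(r) * r**2 over [0, 1] (radial part of the ball's volume element)."""
    total = Fraction(0)
    for ep, cp in p.items():
        for eq, cq in q.items():
            # r**ep * r**eq * r**2 integrates to 1 / (ep + eq + 3) on [0, 1]
            total += cp * cq / (ep + eq + 3)
    return total


if __name__ == "__main__":
    r00 = zernike_radial_coeffs(0, 0)
    r20 = zernike_radial_coeffs(2, 0)
    print(r20)                     # {0: Fraction(1, 1), 2: Fraction(-5, 3)}
    print(radial_inner(r00, r20))  # 0, i.e. R_00 and R_20 are orthogonal
```

Using `Fraction` keeps the orthogonality check free of quadrature error; a network layer would more plausibly evaluate normalized polynomials in floating point on a grid, which this sketch does not attempt.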


We refer the reader to Cohen et al. (2018a) for an excellent review of group equivariant CNNs.

Agathos, A., Pratikakis, I., Papadakis, P., Perantonis, S. J., Azariadis, P. N., & Sapidis, N. S. (2009). Retrieval of 3D articulated objects using a graph-based representation. In 3DOR 2009 (pp. 29–36).

Ankerst, M., Kastenmüller, G., Kriegel, H. P., & Seidl, T. (1999). 3D shape histograms for similarity search and classification in spatial databases. In International symposium on spatial databases (pp. 207–226). Berlin: Springer.

Arbter, K., Snyder, W. E., Burkhardt, H., & Hirzinger, G. (1990). Application of affine-invariant Fourier descriptors to recognition of 3-D objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 640–647.

Bai, S., Bai, X., Zhou, Z., Zhang, Z., & Latecki, L. J. (2016). GIFT: A real-time and scalable 3D shape search engine. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5023–5032). IEEE.

Boomsma, W., & Frellsen, J. (2017). Spherical convolutions and their application in molecular modelling. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30, pp. 3433–3443). Curran Associates, Inc. http://papers.nips.cc/paper/6935-spherical-convolutions-and-their-application-in-molecular-modelling.pdf.

Boscaini, D., Masci, J., Melzi, S., Bronstein, M. M., Castellani, U., & Vandergheynst, P. (2015). Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. Computer Graphics Forum, 34, 13–23.

Boscaini, D., Masci, J., Rodolà, E., & Bronstein, M. (2016). Learning shape correspondence with anisotropic convolutional neural networks. In Advances in neural information processing systems (pp. 3189–3197).

Brock, A., Lim, T., Ritchie, J. M., & Weston, N. (2016). Generative and discriminative voxel modeling with convolutional neural networks. arXiv preprint arXiv:1608.04236.

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4), 18–42.

Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2013). Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.

Canterakis, N. (1996). Complete moment invariants and pose determination for orthogonal transformations of 3D objects. In Mustererkennung 1996 (pp. 339–350). Berlin: Springer.

Canterakis, N. (1999). 3D Zernike moments and Zernike affine invariants for 3D image analysis and recognition. In 11th Scandinavian conference on image analysis. Citeseer.

Carrière, M., Oudot, S. Y., & Ovsjanikov, M. (2015). Stable topological signatures for points on 3D shapes. Computer Graphics Forum, 34, 1–12.

Cohen, T., Geiger, M., & Weiler, M. (2018a). A general theory of equivariant CNNs on homogeneous spaces. arXiv preprint arXiv:1811.02017.

Cohen, T. S., Geiger, M., Koehler, J., & Welling, M. (2018b). Spherical CNNs. In International conference on learning representations (ICLR).

Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29, pp. 3844–3852). Curran Associates, Inc. http://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf.

El Mallahi, M., Zouhri, A., El Affar, A., Tahiri, A., & Qjidaa, H. (2017). Radial Hahn moment invariants for 2D and 3D image recognition. International Journal of Automation and Computing, 15(3), 277–289.

Ester, M., Kriegel, H. P., Sander, J., Xu, X., et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. KDD, 96, 226–231.

Esteves, C., Allen-Blanchette, C., Makadia, A., & Daniilidis, K. (2018). Learning SO(3) equivariant representations with spherical CNNs. In The European conference on computer vision (ECCV).

Flusser, J., Boldys, J., & Zitová, B. (2003). Moment forms invariant to rotation and blur in arbitrary number of dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2), 234–246.

Fotenos, A. F., Snyder, A. Z., Girton, L. E., Morris, J. C., & Buckner, R. L. (2005). Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD. Neurology, 64(6), 1032–1039.

Frome, A., Huber, D., Kolluri, R., Bülow, T., & Malik, J. (2004). Recognizing objects in range data using regional point descriptors. In European conference on computer vision (pp. 224–237). Berlin: Springer.

Furuya, T., & Ohbuchi, R. (2016). Deep aggregation of local 3D geometric features for 3D model retrieval. In BMVC.

Garcia-Garcia, A., Gomez-Donoso, F., Garcia-Rodriguez, J., Orts-Escolano, S., Cazorla, M., & Azorin-Lopez, J. (2016). PointNet: A 3D convolutional neural network for real-time object class recognition. In 2016 international joint conference on neural networks (IJCNN) (pp. 1578–1584). IEEE.

Guo, X. (1993). Three dimensional moment invariants under rigid transformation. In International conference on computer analysis of images and patterns (pp. 518–522). Berlin: Springer.

Guo, Y., Bennamoun, M., Sohel, F., Lu, M., Wan, J., & Kwok, N. M. (2016). A comprehensive performance evaluation of 3D local feature descriptors. International Journal of Computer Vision, 116(1), 66–89.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

Henaff, M., Bruna, J., & LeCun, Y. (2015). Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.

Hu, M. K. (1962). Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8(2), 179–187.

Ilse, M., Tomczak, J. M., & Welling, M. (2018). Attention-based deep multiple instance learning. arXiv preprint arXiv:1802.04712.

Janssen, M. H., Janssen, A. J., Bekkers, E. J., Bescós, J. O., & Duits, R. (2018). Design and processing of invertible orientation scores of 3D images. Journal of Mathematical Imaging and Vision, 60(9), 1427–1458.

Johns, E., Leutenegger, S., & Davison, A. J. (2016). Pairwise decomposition of image sequences for active multi-view recognition. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3813–3822). IEEE.

Kanezaki, A., Matsushita, Y., & Nishida, Y. (2016). RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. arXiv preprint arXiv:1603.06208.

Khalil, M. I., & Bayoumi, M. M. (2001). A dyadic wavelet affine invariant function for 2D shape recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), 1152–1164.

Khan, S. H., Hayat, M., & Barnes, N. (2018). Adversarial training of variational auto-encoders for high fidelity image generation. In 2018 IEEE winter conference on applications of computer vision (WACV) (pp. 1312–1320). IEEE.

Klokov, R., & Lempitsky, V. (2017). Escape from cells: Deep KD-networks for the recognition of 3D point cloud models. In 2017 IEEE international conference on computer vision (ICCV) (pp. 863–872). IEEE.

Kondor, R. (2018). N-body networks: A covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588.

Kondor, R., Lin, Z., & Trivedi, S. (2018). Clebsch-Gordan nets: A fully Fourier space spherical convolutional neural network. arXiv preprint arXiv:1806.09231.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 25, pp. 1097–1105). Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

Kurtek, S., Klassen, E., Ding, Z., & Srivastava, A. (2010). A novel Riemannian framework for shape analysis of 3D objects. In 2010 IEEE computer society conference on computer vision and pattern recognition (pp. 1625–1632). IEEE.

Lavoué, G. (2012). Combination of bag-of-words descriptors for robust partial shape retrieval. The Visual Computer, 28(9), 931–942.

Li, H. B., Huang, T. Z., Zhang, Y., Liu, X. P., & Gu, T. X. (2011). Chebyshev-type methods and preconditioning techniques. Applied Mathematics and Computation, 218(2), 260–270.

Li, J., Chen, B. M., & Lee, G. H. (2018). SO-Net: Self-organizing network for point cloud analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9397–9406).

Li, Y., Pirk, S., Su, H., Qi, C. R., & Guibas, L. J. (2016). FPNN: Field probing neural networks for 3D data. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29, pp. 307–315). Curran Associates, Inc. http://papers.nips.cc/paper/6416-fpnn-fieldprobing-neural-networks-for-3d-data.pdf.

Lin, C., & Chellappa, R. (1987). Classification of partial 2-D shapes using Fourier descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 686–690.

Liu, W., Zhang, Y.-M., Li, X., Yu, Z., Dai, B., Zhao, T., & Song, L. (2017). Deep hyperspherical learning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30, pp. 3950–3960). Curran Associates, Inc. http://papers.nips.cc/paper/6984-deep-hyperspherical-learning.pdf.

Maron, H., Ben-Hamu, H., Shamir, N., & Lipman, Y. (2018). Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902.

Masci, J., Boscaini, D., Bronstein, M., & Vandergheynst, P. (2015). Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops (pp. 37–45).

Maturana, D., & Scherer, S. (2015). VoxNet: A 3D convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 922–928). IEEE.

Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., & Bronstein, M. M. (2017). Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5115–5124).

Osada, R., Funkhouser, T., Chazelle, B., & Dobkin, D. (2002). Shape distributions. ACM Transactions on Graphics (TOG), 21(4), 807–832.

Papadakis, P., Pratikakis, I., Theoharis, T., Passalis, G., & Perantonis, S. (2008). 3D object retrieval using an efficient and compact hybrid shape descriptor. In Eurographics workshop on 3D object retrieval.

Qi, C. R., Su, H., Mo, K., & Guibas, L. J. (2017a). PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of computer vision and pattern recognition (CVPR) (Vol. 1(2), p. 4). IEEE.

Qi, C. R., Su, H., Nießner, M., Dai, A., Yan, M., & Guibas, L. J. (2016). Volumetric and multi-view CNNs for object classification on 3D data. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5648–5656).

Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017b). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30, pp. 5099–5108). Curran Associates, Inc. http://papers.nips.cc/paper/7095-pointnet-deep-hierarchical-feature-learning-on-point-sets-in-a-metric-space.pdf.

Ramasinghe, S., Khan, S., & Barnes, N. (2019a). Volumetric convolution: Automatic representation learning in unit ball. arXiv preprint arXiv:1901.00616.

Ramasinghe, S., Khan, S., Barnes, N., & Gould, S. (2019b). Blended convolution and synthesis for efficient discrimination of 3D shapes. arXiv preprint arXiv:1908.10209.

Reininghaus, J., Huber, S., Bauer, U., & Kwitt, R. (2015). A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4741–4748).

Reiss, T. (1992). Features invariant to linear transformations in 2D and 3D. In 11th IAPR international conference on pattern recognition. Vol. III. Conference C: Image, speech and signal analysis (pp. 493–496). IEEE.

Ronchi, C., Iacono, R., & Paolucci, P. S. (1996). The “cubed sphere”: A new method for the solution of partial differential equations in spherical geometry. Journal of Computational Physics, 124(1), 93–114.

Sedaghat, N., Zolfaghari, M., Amiri, E., & Brox, T. (2016). Orientation-boosted voxel nets for 3D object recognition. arXiv preprint arXiv:1604.03351.

Shi, B., Bai, S., Zhou, Z., & Bai, X. (2015). DeepPano: Deep panoramic representation for 3-D shape recognition. IEEE Signal Processing Letters, 22(12), 2339–2343.

Simonovsky, M., & Komodakis, N. (2017). Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of CVPR.

Su, H., Jampani, V., Sun, D., Maji, S., Kalogerakis, E., Yang, M. H., & Kautz, J. (2018). SPLATNet: Sparse lattice networks for point cloud processing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2530–2539).

Su, H., Maji, S., Kalogerakis, E., & Learned-Miller, E. (2015). Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE international conference on computer vision (pp. 945–953).

Suk, T., & Flusser, J. (1996). Vertex-based features for recognition of projectively deformed polygons. Pattern Recognition, 29(3), 361–367.

Tabia, H., Laga, H., Picard, D., & Gosselin, P. H. (2014). Covariance descriptors for 3D shape matching and retrieval. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4185–4192).

Tabia, H., Picard, D., Laga, H., & Gosselin, P. H. (2013). Compact vectors of locally aggregated tensors for 3D shape retrieval. In Eurographics workshop on 3D object retrieval.

Tatsuma, A., & Aono, M. (2009). Multi-Fourier spectra descriptor and augmentation with spectral clustering for 3D shape retrieval. The Visual Computer, 25(8), 785–804.

Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., & Riley, P. (2018). Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219.

Tieng, Q. M., & Boles, W. W. (1995). An application of wavelet-based affine-invariant representation. Pattern Recognition Letters, 16(12), 1287–1296.

Tombari, F., Salti, S., & Di Stefano, L. (2010). Unique signatures of histograms for local surface description. In European conference on computer vision (pp. 356–369). Berlin: Springer.

Vranic, D. V., & Saupe, D. (2002). Description of 3D-shape using a complex function on the sphere. In 2002 IEEE international conference on multimedia and expo, 2002. ICME’02. Proceedings (Vol. 1, pp. 177–180). IEEE.

Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M., & Solomon, J. M. (2018). Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829.

Weiler, M., Geiger, M., Welling, M., Boomsma, W., & Cohen, T. (2018). 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. arXiv preprint arXiv:1807.02547.

Worrall, D. E., & Brostow, G. J. (2018). CubeNet: Equivariance to 3D rotation and translation. In European conference on computer vision.

Worrall, D. E., Garbin, S. J., Turmukhambetov, D., & Brostow, G. J. (2017). Harmonic networks: Deep translation and rotation equivariance. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 7168–7177). IEEE.

Wu, J., Zhang, C., Xue, T., Freeman, B., & Tenenbaum, J. (2016). Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 29, pp. 82–90). Curran Associates, Inc. http://papers.nips.cc/paper/6096-learning-a-probabilistic-latent-space-of-object-shapes-via-3d-generative-adversarial-modeling.pdf.

Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1912–1920).

Xie, J., Fang, Y., Zhu, F., & Wong, E. (2015). DeepShape: Deep learned shape descriptor for 3D shape matching and retrieval. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1275–1283). IEEE.

Yang, B., Flusser, J., & Suk, T. (2015). 3D rotation invariants of Gaussian-Hermite moments. Pattern Recognition Letters, 54, 18–26.

Title: Representation Learning on Unit Ball with 3D Roto-translational Equivariance
Authors: Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould
Publication date: 21 December 2019
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 6/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-019-01278-x

