research-article

Exploring Deep Learning for View-Based 3D Model Retrieval

Published: 17 February 2020

Abstract

In recent years, view-based 3D model retrieval has become a research focus in computer vision and machine learning. A 3D model retrieval algorithm consists of feature extraction and similarity measurement, and robust features play a decisive role in the similarity measurement. Although deep learning has achieved broad success in computer vision, only a small number of works use deep learning features for 3D model retrieval. To the best of our knowledge, there is no benchmark for evaluating these deep learning features. To address this gap, we systematically evaluate the performance of deep learning features in view-based 3D model retrieval on four popular datasets (ETH, NTU60, PSB, and MVRED) using different similarity measures. Specifically, we compare the performance of hand-crafted features and deep learning features, and then assess the robustness of the deep learning features. Finally, we evaluate the difference between single-view and multi-view deep learning features. Quantitative analysis across the datasets shows that the deep learning features consistently outperform all of the hand-crafted features and are also more robust than the hand-crafted features when different degrees of noise are added to the images. Exploring the latent relationships among different views in multi-view deep learning network architectures shows that multi-view deep learning features outperform single-view deep learning features, with low computational complexity.
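The two-stage pipeline described in the abstract (feature extraction followed by similarity measurement) can be sketched as follows. This is only an illustration under stated assumptions: each 3D model is assumed to be represented by a set of per-view feature vectors, and the set-to-set distance is a modified-Hausdorff-style measure picked for the example, since the abstract does not say which similarity measures were evaluated. The function names and the toy 2-D features are hypothetical; real descriptors would come from a hand-crafted extractor or a deep network.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
ViewSet = List[Vector]  # one feature vector per rendered view of a model


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two view feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_mean_min(src: ViewSet, dst: ViewSet) -> float:
    """Average, over views in src, of the distance to the closest view in dst."""
    return sum(min(euclidean(s, d) for d in dst) for s in src) / len(src)


def model_distance(a: ViewSet, b: ViewSet) -> float:
    """Symmetric set-to-set distance (assumed measure, modified-Hausdorff style)."""
    return max(directed_mean_min(a, b), directed_mean_min(b, a))


def retrieve(query: ViewSet, gallery: Dict[str, ViewSet]) -> List[str]:
    """Rank gallery model names from most to least similar to the query."""
    return sorted(gallery, key=lambda name: model_distance(query, gallery[name]))


# Toy 2-D "features" standing in for real view descriptors (hypothetical data).
gallery = {
    "chair": [[0.0, 0.0], [0.1, 0.0]],
    "plane": [[5.0, 5.0], [5.0, 5.2]],
}
query = [[0.05, 0.0], [0.0, 0.1]]
ranking = retrieve(query, gallery)
print(ranking)
```

Swapping `model_distance` for another set-to-set measure, or replacing the toy vectors with features from a different extractor, leaves `retrieve` unchanged, which is the kind of feature-versus-measure comparison the abstract describes.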



        • Published in

          ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1
          February 2020
          363 pages
          ISSN: 1551-6857
          EISSN: 1551-6865
          DOI: 10.1145/3384216

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 February 2020
          • Accepted: 1 January 2020
          • Revised: 1 November 2019
          • Received: 1 August 2019


          Qualifiers

          • research-article
          • Research
          • Refereed
