research-article

Exploring Deep Learning for View-Based 3D Model Retrieval

Authors:
Zan Gao, Tianjin University of Technology and Qilu University of Technology, Jinan, P.R China
Yinming Li, Tianjin University of Technology and Qilu University of Technology, Jinan, P.R China
Shaohua Wan, Zhongnan University of Economics and Law, Wuhan, China (ORCID: 0000-0001-7013-9081)

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1, Article No. 18, pp. 1–21. https://doi.org/10.1145/3377876

Published: 17 February 2020


Abstract

In recent years, view-based 3D model retrieval has become a research focus in computer vision and machine learning. A 3D model retrieval algorithm consists of feature extraction and similarity measurement, and robust features play a decisive role in the quality of the similarity measurement. Although deep learning has achieved broad success in computer vision, only a small number of works use deep learning features for 3D model retrieval, and to the best of our knowledge there is no benchmark that evaluates these features. To address this gap, we systematically evaluate the performance of deep learning features for view-based 3D model retrieval on four popular datasets (ETH, NTU60, PSB, and MVRED) using several kinds of similarity measures. Specifically, we first compare hand-crafted features with deep learning features, then assess the robustness of the deep learning features, and finally evaluate the difference between single-view and multi-view deep learning features. Quantitative analysis across the datasets shows that deep learning features consistently outperform all of the hand-crafted features and are also more robust than them when different levels of noise are added to the images. Furthermore, by exploring the latent relationships among views, multi-view deep learning network architectures outperform single-view deep learning features while keeping computational complexity low.
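The abstract frames retrieval as feature extraction followed by similarity measurement between multi-view models. As a minimal sketch, assuming each 3D model is already represented by a list of per-view feature vectors (for example, CNN activations for each rendered view), the Python below contrasts two ways of comparing models: a set-to-set distance over individual view features (here the modified Hausdorff distance of Dubuisson and Jain) and a single descriptor per model built by element-wise max over views, in the spirit of MVCNN-style view pooling. The function names, the Euclidean base distance, and the toy gallery are illustrative assumptions, not the paper's exact evaluation protocol.

```python
import math
from typing import Dict, List, Sequence

# Assumed layout: one feature vector per rendered view of a 3D model.
Views = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def modified_hausdorff(views_a: Views, views_b: Views) -> float:
    """Modified Hausdorff distance: the larger of the two directed
    mean nearest-neighbour distances between the two view sets."""
    def directed(src: Views, dst: Views) -> float:
        return sum(min(euclidean(s, d) for d in dst) for s in src) / len(src)
    return max(directed(views_a, views_b), directed(views_b, views_a))


def max_view_pool(views: Views) -> List[float]:
    """Collapse a view set into one descriptor by element-wise max."""
    return [max(column) for column in zip(*views)]


def rank_models(query: Views, gallery: Dict[str, Views], mode: str = "set") -> List[str]:
    """Return gallery model names sorted from most to least similar.

    mode="set":    per-view features compared as sets (set matching)
    mode="pooled": one max-pooled descriptor per model (fused multi-view)
    """
    if mode == "set":
        scores = {name: modified_hausdorff(query, views)
                  for name, views in gallery.items()}
    elif mode == "pooled":
        pooled_query = max_view_pool(query)
        scores = {name: euclidean(pooled_query, max_view_pool(views))
                  for name, views in gallery.items()}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Smaller distance means more similar, so sort ascending.
    return sorted(scores, key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 2-D "features"; real deep features have hundreds of dimensions.
    gallery = {
        "chair_1": [[1.0, 0.0], [0.9, 0.1]],
        "table_1": [[0.0, 1.0], [0.1, 0.9]],
    }
    query = [[0.95, 0.05], [1.0, 0.0]]
    print(rank_models(query, gallery, mode="set"))     # ['chair_1', 'table_1']
    print(rank_models(query, gallery, mode="pooled"))  # ['chair_1', 'table_1']
```

In this sketch, the pooled mode needs one vector distance per gallery model, whereas the set mode needs one per pair of views, which illustrates why fused multi-view descriptors can be cheaper to match at query time.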


Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Presentation of retrieval results
    2. Retrieval tasks and goals
      1. Information extraction
2. Networks
  1. Network performance evaluation
    1. Network performance analysis


Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 16, Issue 1
February 2020
363 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3384216
Editor:
Alberto Del Bimbo
University of Firenze, Italy
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 February 2020
- Accepted: 1 January 2020
- Revised: 1 November 2019
- Received: 1 August 2019

Author Tags
3D model retrieval
benchmark
deep learning features
handcrafted features
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- 85 Total Citations
- 1,051 Total Downloads
- Downloads (Last 12 months): 158
- Downloads (Last 6 weeks): 16