2022 | Original Paper | Book Chapter

PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching

Authors: Zhelun Shen, Yuchao Dai, Xibin Song, Zhibo Rao, Dingfu Zhou, Liangjun Zhang

Published in: Computer Vision – ECCV 2022

Publisher: Springer Nature Switzerland

Abstract

Existing deep-learning-based stereo matching methods either focus on achieving optimal performance on the target dataset, at the cost of poor generalization to other datasets, or focus on cross-domain generalization by suppressing domain-sensitive features, which significantly sacrifices performance. To tackle these problems, we propose PCW-Net, a Pyramid Combination and Warping cost volume-based network that achieves good performance in both cross-domain generalization and stereo matching accuracy on various benchmarks. In particular, PCW-Net is designed for two purposes. First, we construct combination volumes on the upper levels of the pyramid and develop a cost volume fusion module to integrate them for initial disparity estimation. Fusing multi-scale combination volumes covers multi-scale receptive fields, so domain-invariant features can be extracted. Second, we construct the warping volume at the last level of the pyramid for disparity refinement. The proposed warping volume narrows the residue search range from the initial disparity search range to a fine-grained one, which dramatically reduces the difficulty of finding the correct residue in an unconstrained residue search space. When trained on synthetic datasets and generalized to unseen real datasets, our method shows strong cross-domain generalization and outperforms existing state-of-the-art methods by a large margin. After fine-tuning on the real datasets, our method ranks \(1^{st}\) on KITTI 2012, \(2^{nd}\) on KITTI 2015, and \(1^{st}\) on Argoverse among all published methods as of March 7, 2022.
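The idea of narrowing the residue search range can be illustrated with a toy example. The sketch below is not the PCW-Net implementation: it assumes a single 1-D scanline, scalar per-pixel "features", and a plain absolute-difference cost instead of learned combination and warping volumes. The function names (`full_range_costs`, `residual_costs`) and the synthetic data are hypothetical and chosen only for illustration. It compares a search over the full disparity range with a search over a few residues around a coarse initial disparity.

```python
# Toy 1-D illustration under stated assumptions (not the PCW-Net code):
# scalar features per pixel and an absolute-difference matching cost.

INF = float("inf")

def full_range_costs(left, right, max_disp):
    """Cost for every pixel x and every disparity d in [0, max_disp)."""
    volume = []
    for x in range(len(left)):
        row = []
        for d in range(max_disp):
            xr = x - d  # candidate matching pixel in the right scanline
            row.append(abs(left[x] - right[xr]) if xr >= 0 else INF)
        volume.append(row)
    return volume

def residual_costs(left, right, init_disp, radius):
    """Cost for residues r in [-radius, radius] around an initial disparity."""
    width = len(left)
    volume = []
    for x in range(width):
        row = []
        for r in range(-radius, radius + 1):
            xr = x - (init_disp[x] + r)  # right pixel sampled at init_disp + r
            valid = 0 <= xr < width
            row.append(abs(left[x] - right[xr]) if valid else INF)
        volume.append(row)
    return volume

def argmin(values):
    """Index of the smallest value (first one on ties)."""
    return min(range(len(values)), key=values.__getitem__)

# Synthetic scanline: the left view is the right view shifted by 3 pixels.
true_disp = 3
width = 16
right = [float((7 * i) % 11) for i in range(width)]
left = [right[x - true_disp] if x >= true_disp else -1.0 for x in range(width)]

# Full search: 8 disparity candidates per pixel.
max_disp = 8
full = full_range_costs(left, right, max_disp)
full_disp = [argmin(row) for row in full]

# Residual search: a coarse initial estimate that is off by one pixel,
# refined over only 2 * radius + 1 = 3 candidates per pixel.
radius = 1
init_disp = [2 if x % 2 == 0 else 4 for x in range(width)]
res = residual_costs(left, right, init_disp, radius)
refined = [init_disp[x] + argmin(res[x]) - radius for x in range(width)]

print("candidates per pixel:", len(full[0]), "vs", len(res[0]))
print("refined disparities:", refined[true_disp:])
```

In this toy setting both searches recover the same disparity wherever a match exists, but the residual search evaluates far fewer candidates per pixel. That reduction is the kind of constrained search space the abstract attributes to the warping volume.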

http://www.cvlibs.net/datasets/kitti.

https://eval.ai/web/challenges/challenge-page/917/leaderboard/2412.

Title: PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching
Authors: Zhelun Shen, Yuchao Dai, Xibin Song, Zhibo Rao, Dingfu Zhou, Liangjun Zhang
Publisher: Springer Nature Switzerland
Book: Computer Vision – ECCV 2022
Print ISBN: 978-3-031-19823-6
Electronic ISBN: 978-3-031-19824-3
Copyright year: 2022
DOI: https://doi.org/10.1007/978-3-031-19824-3_17
