
2022 | OriginalPaper | Book chapter

PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching

Authors: Zhelun Shen, Yuchao Dai, Xibin Song, Zhibo Rao, Dingfu Zhou, Liangjun Zhang

Published in: Computer Vision – ECCV 2022

Publisher: Springer Nature Switzerland


Abstract

Existing deep-learning-based stereo matching methods either focus on achieving optimal performance on the target dataset while generalizing poorly to other datasets, or focus on cross-domain generalization by suppressing domain-sensitive features, which significantly sacrifices performance. To tackle these problems, we propose PCW-Net, a Pyramid Combination and Warping cost volume-based network that achieves good performance in both cross-domain generalization and stereo matching accuracy on various benchmarks. In particular, PCW-Net is designed with two purposes. First, we construct combination volumes on the upper levels of the pyramid and develop a cost volume fusion module to integrate them for initial disparity estimation. Fusing multi-scale combination volumes covers multi-scale receptive fields, which allows domain-invariant features to be extracted. Second, we construct a warping volume at the last level of the pyramid for disparity refinement. The proposed warping volume narrows the residue search range from the initial disparity search range to a fine-grained one, which greatly reduces the difficulty the network faces in finding the correct residue compared with an unconstrained residue search space. When trained on synthetic datasets and evaluated on unseen real datasets, our method shows strong cross-domain generalization and outperforms existing state-of-the-art methods by a large margin. After fine-tuning on the real datasets, our method ranks 1st on KITTI 2012, 2nd on KITTI 2015, and 1st on Argoverse among all published methods as of 7 March 2022.
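The two cost volumes named in the abstract can be illustrated with a minimal NumPy sketch. It rests on assumptions that the abstract does not confirm: features are (C, H, W) arrays; a left pixel at column x matches the right pixel at column x - d; the combination volume is taken to be a GwcNet-style stack of group-wise correlation channels and concatenated left/right features; and the warping volume is taken to correlate left features with right features warped by the initial disparity plus each candidate residue in [-radius, radius]. The learned 3D aggregation, the multi-scale fusion module and the trained regression head are omitted. A sharpened soft-argmax over raw correlation stands in for them. All function names and the `sharpness` parameter are hypothetical.

```python
import numpy as np


def combination_volume(left, right, max_disp, num_groups):
    """Full-range volume: group-wise correlation channels stacked with
    concatenated left/right features (GwcNet-style combination, assumed)."""
    C, H, W = left.shape
    assert C % num_groups == 0, "channels must split evenly into groups"
    cpg = C // num_groups
    vol = np.zeros((num_groups + 2 * C, max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        # Left column x is compared with right column x - d; columns x < d stay zero.
        l, r = left[:, :, d:], right[:, :, :W - d]
        corr = (l * r).reshape(num_groups, cpg, H, W - d).mean(axis=1)
        vol[:num_groups, d, :, d:] = corr
        vol[num_groups:num_groups + C, d, :, d:] = l
        vol[num_groups + C:, d, :, d:] = r
    return vol


def warp_right(right, disp):
    """Sample right features at x - disp[y, x] with linear interpolation
    along x; samples that fall outside the image contribute zero."""
    C, H, W = right.shape
    xs = np.arange(W)[None, :] - disp
    x0 = np.floor(xs).astype(int)
    frac = xs - x0
    rows = np.arange(H)[:, None]
    out = np.zeros_like(right, dtype=float)
    for xi, w in ((x0, 1.0 - frac), (x0 + 1, frac)):
        valid = (xi >= 0) & (xi < W)
        out += right[:, rows, np.clip(xi, 0, W - 1)] * (w * valid)[None]
    return out


def warping_volume(left, right, init_disp, radius):
    """Correlation over residue offsets in [-radius, radius] around init_disp."""
    offsets = np.arange(-radius, radius + 1)
    cost = np.stack([(left * warp_right(right, init_disp + o)).mean(axis=0)
                     for o in offsets])  # shape (2 * radius + 1, H, W)
    return cost, offsets


def refine_disparity(left, right, init_disp, radius, sharpness=50.0):
    """Initial disparity plus a soft-argmax residue from the warping volume."""
    cost, offsets = warping_volume(left, right, init_disp, radius)
    logits = sharpness * cost
    p = np.exp(logits - logits.max(axis=0, keepdims=True))  # stable softmax
    p /= p.sum(axis=0, keepdims=True)
    return init_disp + np.tensordot(offsets, p, axes=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, true_d = 256, 4, 32, 5
    right = rng.standard_normal((C, H, W))
    left = np.zeros_like(right)
    left[:, :, true_d:] = right[:, :, :W - true_d]  # left x sees right x - 5
    full = combination_volume(left, right, max_disp=8, num_groups=8)
    print(full.shape)  # (520, 8, 4, 32)
    refined = refine_disparity(left, right, np.full((H, W), 3.0), radius=3)
    print(refined[:, 8:].round(3))
```

The sketch is meant to show the trade-off described above. The full-range volume needs `max_disp` disparity slices. The warping volume needs only `2 * radius + 1` slices around the initial estimate, so the refinement stage searches a small, constrained residue range rather than the whole disparity range.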


Metadata
Copyright year: 2022
DOI: https://doi.org/10.1007/978-3-031-19824-3_17
