
2022 | Original Paper | Book Chapter

Deep Laparoscopic Stereo Matching with Transformers

Authors: Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong Wang, Zongyuan Ge

Published in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2022

Publisher: Springer Nature Switzerland

Abstract

The self-attention mechanism, successfully employed in the transformer architecture, has shown promise in many computer vision tasks, including image recognition and object detection. Despite this surge, the use of transformers for stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of transformers for stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of CNNs and transformers in a unified design. Specifically, we investigate several ways to introduce transformers into volumetric stereo matching pipelines by analyzing the loss landscapes of the designs and their in-domain/cross-domain accuracy. Our analysis suggests that employing transformers for feature representation learning while using CNNs for cost aggregation leads to faster convergence, higher accuracy, and better generalization than other options. Our extensive experiments on the SceneFlow, SCARED2019 and dVPN datasets demonstrate the superior performance of our HybridStereoNet.
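To make the "volumetric stereo matching pipeline" mentioned in the abstract concrete, the following is a minimal NumPy sketch of its three generic stages: cost volume construction from left/right feature maps, cost aggregation, and soft-argmin disparity regression. This is an illustration under simplifying assumptions, not the paper's implementation: in HybridStereoNet the features would come from a transformer backbone and the aggregation would be a learned 3-D CNN, whereas here the features are given as inputs and the aggregation is a fixed 3x3 box filter; all function names are ours.

```python
import numpy as np

def cost_volume(feat_l, feat_r, max_disp):
    # Correlation-style cost volume over feature maps of shape (C, H, W):
    # vol[d, y, x] = <feat_l[:, y, x], feat_r[:, y, x - d]>
    # Columns x < d have no valid match and are left at zero cost.
    C, H, W = feat_l.shape
    vol = np.zeros((max_disp, H, W), dtype=feat_l.dtype)
    vol[0] = (feat_l * feat_r).sum(axis=0)
    for d in range(1, max_disp):
        vol[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).sum(axis=0)
    return vol

def aggregate(vol):
    # Stand-in for learned cost aggregation: average each disparity slice
    # over a 3x3 spatial window (edge-padded).
    D, H, W = vol.shape
    pad = np.pad(vol, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(vol)
    for dy in range(3):
        for dx in range(3):
            out += pad[:, dy:dy + H, dx:dx + W]
    return out / 9.0

def soft_argmin(vol):
    # Softmax over the disparity axis, then the expected disparity:
    # yields a differentiable, sub-pixel disparity map of shape (H, W).
    e = np.exp(vol - vol.max(axis=0, keepdims=True))
    p = e / e.sum(axis=0, keepdims=True)
    disps = np.arange(vol.shape[0]).reshape(-1, 1, 1)
    return (p * disps).sum(axis=0)

# Demo: a right feature map that is the left one shifted by 2 pixels
# should regress to a disparity of ~2 away from the image border.
rng = np.random.default_rng(0)
feat_l = rng.standard_normal((16, 8, 32))
feat_r = np.roll(feat_l, -2, axis=2)
disp = soft_argmin(aggregate(cost_volume(feat_l, feat_r, max_disp=8)))
```

The design choice the paper's analysis supports maps onto this skeleton as: replace `feat_l`/`feat_r` with transformer features, and replace `aggregate` with 3-D convolutions over the (D, H, W) volume.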


Metadata
Title: Deep Laparoscopic Stereo Matching with Transformers
Authors: Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong Wang, Zongyuan Ge
Copyright year: 2022
DOI: https://doi.org/10.1007/978-3-031-16449-1_44
