nach oben

Erschienen in:

2021 | OriginalPaper | Buchkapitel

Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer

verfasst von : Xiaojie Gao, Yueming Jin, Yonghao Long, Qi Dou, Pheng-Ann Heng

Erschienen in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Real-time surgical phase recognition is a fundamental task in modern operating rooms. Previous works tackle this task relying on architectures arranged in spatio-temporal order, however, the supportive benefits of intermediate spatial features are not considered. In this paper, we introduce, for the first time in surgical workflow analysis, Transformer to reconsider the ignored complementary effects of spatial and temporal features for accurate surgical phase recognition. Our hybrid embedding aggregation Transformer fuses cleverly designed spatial and temporal embeddings by allowing for active queries based on spatial information from temporal embedding sequences. More importantly, our framework processes the hybrid embeddings in parallel to achieve a high inference speed. Our method is thoroughly validated on two large surgical video datasets, i.e., Cholec80 and M2CAI16 Challenge datasets, and outperforms the state-of-the-art approaches at a processing speed of 91 fps.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel Rapid Treatment Planning for Low-dose-rate Prostate Brachytherapy with TP-GAN

Nächstes Kapitel OperA: Attention-Regularized Transformers for Surgical Phase Recognition

Nur mit Berechtigung zugänglich

http://camma.u-strasbg.fr/datasets.

Zero padding is applied if necessary.

Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)

Bricon-Souf, N., Newman, C.R.: Context awareness in health care: a review. Int. J. Med. Informatics 76(1), 2–12 (2007)CrossRef

Charrière, K., et al.: Real-time analysis of cataract surgery videos using statistical models. Multimedia Tools Appl., 1–19 (2017). https://doi.org/10.1007/s11042-017-4793-8

Czempiel, T., et al.: TeCNO: surgical phase recognition with multi-stage temporal convolutional networks. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12263, pp. 343–352. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59716-0_33CrossRef

Dergachyova, O., Bouget, D., Huaulmé, A., Morandi, X., Jannin, P.: Automatic data-driven real-time segmentation and recognition of surgical workflow. Int. J. Comput. Assist. Radiol. Surg. 11(6), 1081–1089 (2016). https://doi.org/10.1007/s11548-016-1371-xCrossRef

Farha, Y.A., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3575–3584 (2019)

Funke, I., et al.: Using 3D convolutional neural networks to learn spatiotemporal features for automatic surgical gesture recognition in video. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 467–475. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_52CrossRef

Gao, X., Jin, Y., Dou, Q., Heng, P.A.: Automatic gesture recognition in robot-assisted surgery with reinforcement learning and tree search. In: IEEE International Conference on Robotics and Automation, pp. 8440–8446. IEEE (2020)

Han, K., et al.: A survey on visual transformer. arXiv preprint arXiv:2012.12556 (2020)

10.

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

11.

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef

12.

Jin, Y., Cheng, K., Dou, Q., Heng, P.-A.: Incorporating temporal prior from motion flow for instrument segmentation in minimally invasive surgery video. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 440–448. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_49CrossRef

13.

Jin, Y., et al.: SV-RCNet: workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans. Med. Imaging 37(5), 1114–1126 (2018)CrossRef

14.

Jin, Y., et al.: Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med. Image Anal. 59, 101572 (2020)

15.

Jin, Y., Long, Y., Chen, C., Zhao, Z., Dou, Q., Heng, P.A.: Temporal memory relation network for workflow recognition from surgical video. IEEE Trans. Med. Imaging (2021)

16.

Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in vision: a survey. arXiv preprint arXiv:2101.01169 (2021)

17.

Lea, C., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks for action segmentation and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 156–165 (2017)

18.

Lea, C., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks: a unified approach to action segmentation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 47–54. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_7CrossRef

19.

Maier-Hein, L., et al.: Surgical data science for next-generation interventions. Nat. Biomed. Eng. (2017)

20.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (2013)

21.

Padoy, N.: Machine and deep learning for workflow recognition during surgery. Minimally Invasive Therapy Allied Technol. 28(2), 82–90 (2019)CrossRef

22.

Padoy, N., Blum, T., Feussner, H., Berger, M.O., Navab, N.: On-line recognition of surgical activity for monitoring in the operating room. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1718–1724 (2008)

23.

Quellec, G., Lamard, M., Cochener, B., Cazuguel, G.: Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans. Med. Imaging 33(12), 2352–2360 (2014)CrossRef

24.

Twinanda, A.P., Mutter, D., Marescaux, J., de Mathelin, M., Padoy, N.: Single-and multi-task architectures for surgical workflow challenge at M2CAI 2016. arXiv preprint arXiv:1610.08844 (2016)

25.

Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy, N.: MICCAI modeling and monitoring of computer assisted interventions challenge. http://camma.u-strasbg.fr/m2cai2016/

26.

Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy, N.: EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 36(1), 86–97 (2017)CrossRef

27.

Twinanda, A.P.: Vision-based approaches for surgical activity recognition using laparoscopic and RBGD videos. Ph.D. thesis, Strasbourg (2017)

28.

Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

29.

Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3523–3532 (2019)

30.

Yi, F., Jiang, T.: Hard frame detection and online mapping for surgical phase recognition. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 449–457. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_50CrossRef

31.

Zhang, J., et al.: Symmetric dilated convolution for surgical gesture recognition. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12263, pp. 409–418. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59716-0_39CrossRef

32.

Zhao, Z., Jin, Y., Gao, X., Dou, Q., Heng, P.-A.: Learning motion flows for semi-supervised instrument segmentation from robotic surgical video. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12263, pp. 679–689. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59716-0_65CrossRef

33.

Zisimopoulos, O., et al.: DeepPhase: surgical phase recognition in CATARACTS videos. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 265–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_31CrossRef

Titel: Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer
verfasst von: Xiaojie Gao
Yueming Jin
Yonghao Long
Qi Dou
Pheng-Ann Heng
Verlag: Springer International Publishing
Buch: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021
Print ISBN: 978-3-030-87201-4

Electronic ISBN: 978-3-030-87202-1

Copyright-Jahr: 2021
DOI: https://doi.org/10.1007/978-3-030-87202-1_57

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner