International Journal of Computer Assisted Radiology and Surgery

21 September 2022 | Original Article

Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis

Authors: Yueming Jin, Yonghao Long, Xiaojie Gao, Danail Stoyanov, Qi Dou, Pheng-Ann Heng

Published in: International Journal of Computer Assisted Radiology and Surgery | Issue 12/2022

Abstract

Purpose

Real-time surgical workflow analysis is a key component of computer-assisted intervention systems for improving cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features in a successive spatial–temporal arrangement. As a result, the supportive benefits of intermediate features are partially lost in both the visual and temporal aspects. In this paper, we rethink feature encoding so that it attends to and preserves the information critical for accurate workflow recognition and anticipation.

Methods

We introduce the Transformer into surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, in which the designed spatial and temporal embeddings interact effectively: the spatial embedding is used to query the temporal embedding sequence. The model is jointly optimized with loss objectives from both analysis tasks, recognition and anticipation, to leverage their high correlation.
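The querying step described above can be illustrated with a minimal sketch. It assumes plain single-head scaled dot-product attention in NumPy, with the spatial embedding of the current frame as the query and the temporal embedding sequence as keys and values. The function name cross_attention, the dimensions, and the random weights are hypothetical and chosen only for illustration; this is not the authors' implementation, whose layer design is not given in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, memory, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (illustrative only).

    query:  (d,)    spatial embedding of the current frame
    memory: (n, d)  temporal embeddings of the n most recent frames
    """
    q = query @ w_q                         # projected query, shape (d_k,)
    k = memory @ w_k                        # projected keys, shape (n, d_k)
    v = memory @ w_v                        # projected values, shape (n, d_k)
    scores = k @ q / np.sqrt(q.shape[-1])   # one similarity score per past frame
    weights = softmax(scores)               # attention weights, sum to 1
    return weights @ v, weights             # aggregated embedding and weights

# Hypothetical sizes: 8-dimensional embeddings, 5 frames of temporal context.
rng = np.random.default_rng(0)
d, n = 8, 5
spatial = rng.normal(size=d)
temporal = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

fused, weights = cross_attention(spatial, temporal, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch the output is a single aggregated vector per frame, weighted by how strongly the current spatial embedding matches each temporal embedding; a classifier head for recognition or a regression head for anticipation would sit on top of it.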

Results

We extensively evaluate our method on three large surgical video datasets. It consistently outperforms state-of-the-art methods on the workflow recognition task across all three datasets. Learning recognition jointly with anticipation yields a large improvement in recognition results. Our approach is also effective on the anticipation task, achieving promising performance. The model runs in real time, with an inference time of 0.0134 s per frame (roughly 75 frames per second).

Conclusion

Experimental results demonstrate the efficacy of our hybrid embedding integration, which rediscovers crucial cues from the complementary spatial–temporal embeddings. The improved performance under multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its potential for use in the operating room.

Title: Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis
Authors: Yueming Jin, Yonghao Long, Xiaojie Gao, Danail Stoyanov, Qi Dou, Pheng-Ann Heng
Publication date: 21 September 2022
Publisher: Springer International Publishing
Published in: International Journal of Computer Assisted Radiology and Surgery / Issue 12/2022
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI: https://doi.org/10.1007/s11548-022-02743-8
