Published in: International Journal of Computer Assisted Radiology and Surgery 12/2022

21.09.2022 | Original Article

Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis

Authors: Yueming Jin, Yonghao Long, Xiaojie Gao, Danail Stoyanov, Qi Dou, Pheng-Ann Heng


Abstract

Purpose

Real-time surgical workflow analysis is a key component of computer-assisted intervention systems, where it serves to improve cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features in a successive spatial–temporal arrangement. As a result, the supportive benefits of intermediate features are partially lost in both the visual and temporal aspects. In this paper, we rethink feature encoding to attend to and preserve the critical information for accurate workflow recognition and anticipation.

Methods

We introduce the Transformer into surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, in which the designed spatial and temporal embeddings interact effectively: the spatial embedding is employed to query the temporal embedding sequence. The model is jointly optimized with loss objectives from both analysis tasks, recognition and anticipation, to leverage their high correlation.
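
The querying step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes single-head scaled dot-product attention without learned projections, where the spatial embedding of the current frame is the query and the temporal embedding sequence supplies keys and values. The abstract does not state the form of the two loss objectives, so joint_loss below assumes cross-entropy for recognition and an L1 term for anticipation. All names (cross_attention, joint_loss, weight, dim, window) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query  : spatial embedding of the current frame (length d)
    keys   : temporal embedding sequence (list of length-d vectors)
    values : vectors to aggregate (here the same temporal sequence)
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    fused = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return fused, weights


def joint_loss(phase_probs, phase_label, predicted_time, target_time, weight=1.0):
    """Assumed multi-task objective: cross-entropy plus weighted L1."""
    recognition = -math.log(phase_probs[phase_label])
    anticipation = abs(predicted_time - target_time)
    return recognition + weight * anticipation


# Toy data: a window of 5 temporal embeddings and one spatial embedding.
random.seed(0)
dim, window = 8, 5
temporal_seq = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(window)]
spatial_now = [random.gauss(0, 1) for _ in range(dim)]

fused, attn = cross_attention(spatial_now, temporal_seq, temporal_seq)
print("attention weights:", [round(a, 3) for a in attn])
print("fused embedding length:", len(fused))
```

The attention weights sum to one, so the fused vector is a weighted average of the temporal embeddings, with the weights decided by how well each one matches the current spatial embedding. A full model would add learned projections, multiple heads, and residual connections.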

Results

We extensively evaluate our method on three large surgical video datasets. Our method consistently outperforms state-of-the-art methods on the workflow recognition task across all three datasets. When learned jointly with anticipation, recognition results improve substantially. Our approach is also effective on anticipation, achieving promising performance. Our model achieves a real-time inference speed of 0.0134 seconds per frame.
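
To put the reported inference time in perspective, the short calculation below converts seconds per frame into frames per second. The 25 fps video rate used for comparison is an assumption for illustration only; the abstract does not state the frame rate of the datasets.

```python
# Reported in the abstract: inference time per frame, in seconds.
seconds_per_frame = 0.0134

# Throughput is the reciprocal of the per-frame latency.
frames_per_second = 1.0 / seconds_per_frame
print(f"{frames_per_second:.1f} frames per second")  # prints "74.6 frames per second"

# Hypothetical video rate (assumption, not from the source).
assumed_video_fps = 25
frame_budget = 1.0 / assumed_video_fps  # time available per frame, in seconds

# Real-time processing requires the latency to fit within the budget.
is_real_time = seconds_per_frame <= frame_budget
print("fits a 25 fps budget:", is_real_time)
```

Under that assumed rate, the model would use about a third of the 0.04 second budget available per frame.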

Conclusion

Experimental results demonstrate the efficacy of our hybrid embedding integration, which rediscovers the crucial cues from complementary spatial–temporal embeddings. The better performance obtained with multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its promising potential for use in the operating room.

Metadata
Title
Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis
Authors
Yueming Jin
Yonghao Long
Xiaojie Gao
Danail Stoyanov
Qi Dou
Pheng-Ann Heng
Publication date
21.09.2022
Publisher
Springer International Publishing
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 12/2022
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-022-02743-8
