Published in: International Journal of Computer Assisted Radiology and Surgery 9/2020

25.06.2020 | Original Article

LRTD: long-range temporal dependency based active learning for surgical workflow recognition

Authors: Xueying Shi, Yueming Jin, Qi Dou, Pheng-Ann Heng


Abstract

Purpose

Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing deep learning approaches have achieved remarkable performance on surgical video analysis, but they rely heavily on large-scale labelled datasets. Unfortunately, annotations are often not available in abundance, because producing them requires the domain knowledge of surgeons. Even for experts, annotating a sufficient amount of data is tedious and time-consuming.

Methods

In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network, which introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within a clip. By ranking these scores across clips in the unlabelled data pool, we select the clips with weak dependencies for annotation, as these are the most informative ones for network training.
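The selection pipeline above can be sketched in a few lines. The following is a minimal, NumPy-only illustration, not the authors' implementation: the learned projections of the non-local block are omitted (leaving plain embedded-Gaussian attention over per-frame features), and the exact definition of the intra-clip dependency score (here, mean off-diagonal attention) is an assumption made for illustration.

```python
import numpy as np

def nonlocal_affinity(clip_feats):
    """Pairwise affinities among frames of one clip.

    clip_feats: (T, C) array of per-frame features (e.g. CNN embeddings).
    Returns a (T, T) row-stochastic attention map: the embedded-Gaussian
    form of the non-local block, with the learned projections dropped
    for simplicity (an assumption of this sketch).
    """
    sim = clip_feats @ clip_feats.T            # (T, T) dot-product similarity
    sim -= sim.max(axis=1, keepdims=True)      # numerical stability for softmax
    att = np.exp(sim)
    return att / att.sum(axis=1, keepdims=True)

def intra_clip_dependency(clip_feats):
    """Scalar dependency score for one clip: mean off-diagonal attention.

    High values mean frames attend strongly to one another (strong
    long-range dependency); low values mark weakly dependent clips,
    which the active learner sends for annotation.
    """
    att = nonlocal_affinity(clip_feats)
    n = att.shape[0]
    off_diag = att[~np.eye(n, dtype=bool)]     # drop self-attention entries
    return float(off_diag.mean())

def select_for_annotation(pool, k):
    """Rank unlabelled clips by dependency, pick the k weakest for labelling."""
    scores = [intra_clip_dependency(clip) for clip in pool]
    order = np.argsort(scores)                 # ascending: weakest dependency first
    return [int(i) for i in order[:k]]
```

For example, a clip whose frames are all identical yields uniform attention and a high dependency score, while a clip of near-orthogonal frame features concentrates attention on the diagonal and scores low, so `select_for_annotation` would surface the latter for labelling.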

Results

We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task. Using our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods that only consider neighbor-frame information. Using only up to 50% of samples, our approach can exceed the performance of full-data training.

Conclusion

By modeling the intra-clip dependency, our LRTD-based strategy shows a stronger capability to select informative video clips for annotation than other active learning methods, as evaluated on a popular public surgical dataset. The results also show the promising potential of our framework for reducing annotation workload in clinical practice.


Metadata
Title
LRTD: long-range temporal dependency based active learning for surgical workflow recognition
Authors
Xueying Shi
Yueming Jin
Qi Dou
Pheng-Ann Heng
Publication date
25.06.2020
Publisher
Springer International Publishing
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 9/2020
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-020-02198-9
