Skip to main content

2021 | OriginalPaper | Buchkapitel

Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer

verfasst von : Xiaojie Gao, Yueming Jin, Yonghao Long, Qi Dou, Pheng-Ann Heng

Erschienen in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Real-time surgical phase recognition is a fundamental task in modern operating rooms. Previous works tackle this task relying on architectures arranged in spatio-temporal order, however, the supportive benefits of intermediate spatial features are not considered. In this paper, we introduce, for the first time in surgical workflow analysis, Transformer to reconsider the ignored complementary effects of spatial and temporal features for accurate surgical phase recognition. Our hybrid embedding aggregation Transformer fuses cleverly designed spatial and temporal embeddings by allowing for active queries based on spatial information from temporal embedding sequences. More importantly, our framework processes the hybrid embeddings in parallel to achieve a high inference speed. Our method is thoroughly validated on two large surgical video datasets, i.e., Cholec80 and M2CAI16 Challenge datasets, and outperforms the state-of-the-art approaches at a processing speed of 91 fps.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
2
Zero padding is applied if necessary.
 
Literatur
2.
Zurück zum Zitat Bricon-Souf, N., Newman, C.R.: Context awareness in health care: a review. Int. J. Med. Informatics 76(1), 2–12 (2007)CrossRef Bricon-Souf, N., Newman, C.R.: Context awareness in health care: a review. Int. J. Med. Informatics 76(1), 2–12 (2007)CrossRef
6.
Zurück zum Zitat Farha, Y.A., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3575–3584 (2019) Farha, Y.A., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3575–3584 (2019)
8.
Zurück zum Zitat Gao, X., Jin, Y., Dou, Q., Heng, P.A.: Automatic gesture recognition in robot-assisted surgery with reinforcement learning and tree search. In: IEEE International Conference on Robotics and Automation, pp. 8440–8446. IEEE (2020) Gao, X., Jin, Y., Dou, Q., Heng, P.A.: Automatic gesture recognition in robot-assisted surgery with reinforcement learning and tree search. In: IEEE International Conference on Robotics and Automation, pp. 8440–8446. IEEE (2020)
10.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
11.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef
13.
Zurück zum Zitat Jin, Y., et al.: SV-RCNet: workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans. Med. Imaging 37(5), 1114–1126 (2018)CrossRef Jin, Y., et al.: SV-RCNet: workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans. Med. Imaging 37(5), 1114–1126 (2018)CrossRef
14.
Zurück zum Zitat Jin, Y., et al.: Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med. Image Anal. 59, 101572 (2020) Jin, Y., et al.: Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med. Image Anal. 59, 101572 (2020)
15.
Zurück zum Zitat Jin, Y., Long, Y., Chen, C., Zhao, Z., Dou, Q., Heng, P.A.: Temporal memory relation network for workflow recognition from surgical video. IEEE Trans. Med. Imaging (2021) Jin, Y., Long, Y., Chen, C., Zhao, Z., Dou, Q., Heng, P.A.: Temporal memory relation network for workflow recognition from surgical video. IEEE Trans. Med. Imaging (2021)
16.
Zurück zum Zitat Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in vision: a survey. arXiv preprint arXiv:2101.01169 (2021) Khan, S., Naseer, M., Hayat, M., Zamir, S.W., Khan, F.S., Shah, M.: Transformers in vision: a survey. arXiv preprint arXiv:​2101.​01169 (2021)
17.
Zurück zum Zitat Lea, C., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks for action segmentation and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 156–165 (2017) Lea, C., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks for action segmentation and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 156–165 (2017)
19.
Zurück zum Zitat Maier-Hein, L., et al.: Surgical data science for next-generation interventions. Nat. Biomed. Eng. (2017) Maier-Hein, L., et al.: Surgical data science for next-generation interventions. Nat. Biomed. Eng. (2017)
20.
Zurück zum Zitat Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (2013)
21.
Zurück zum Zitat Padoy, N.: Machine and deep learning for workflow recognition during surgery. Minimally Invasive Therapy Allied Technol. 28(2), 82–90 (2019)CrossRef Padoy, N.: Machine and deep learning for workflow recognition during surgery. Minimally Invasive Therapy Allied Technol. 28(2), 82–90 (2019)CrossRef
22.
Zurück zum Zitat Padoy, N., Blum, T., Feussner, H., Berger, M.O., Navab, N.: On-line recognition of surgical activity for monitoring in the operating room. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1718–1724 (2008) Padoy, N., Blum, T., Feussner, H., Berger, M.O., Navab, N.: On-line recognition of surgical activity for monitoring in the operating room. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1718–1724 (2008)
23.
Zurück zum Zitat Quellec, G., Lamard, M., Cochener, B., Cazuguel, G.: Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans. Med. Imaging 33(12), 2352–2360 (2014)CrossRef Quellec, G., Lamard, M., Cochener, B., Cazuguel, G.: Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans. Med. Imaging 33(12), 2352–2360 (2014)CrossRef
24.
Zurück zum Zitat Twinanda, A.P., Mutter, D., Marescaux, J., de Mathelin, M., Padoy, N.: Single-and multi-task architectures for surgical workflow challenge at M2CAI 2016. arXiv preprint arXiv:1610.08844 (2016) Twinanda, A.P., Mutter, D., Marescaux, J., de Mathelin, M., Padoy, N.: Single-and multi-task architectures for surgical workflow challenge at M2CAI 2016. arXiv preprint arXiv:​1610.​08844 (2016)
26.
Zurück zum Zitat Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy, N.: EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 36(1), 86–97 (2017)CrossRef Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy, N.: EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 36(1), 86–97 (2017)CrossRef
27.
Zurück zum Zitat Twinanda, A.P.: Vision-based approaches for surgical activity recognition using laparoscopic and RBGD videos. Ph.D. thesis, Strasbourg (2017) Twinanda, A.P.: Vision-based approaches for surgical activity recognition using laparoscopic and RBGD videos. Ph.D. thesis, Strasbourg (2017)
28.
Zurück zum Zitat Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017) Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
29.
Zurück zum Zitat Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3523–3532 (2019) Wang, Y., Solomon, J.M.: Deep closest point: learning representations for point cloud registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3523–3532 (2019)
Metadaten
Titel
Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer
verfasst von
Xiaojie Gao
Yueming Jin
Yonghao Long
Qi Dou
Pheng-Ann Heng
Copyright-Jahr
2021
DOI
https://doi.org/10.1007/978-3-030-87202-1_57

Premium Partner