Published in: International Journal of Computer Assisted Radiology and Surgery 9/2015

1 September 2015 | Original Article

Classification approach for automatic laparoscopic video database organization

Authors: Andru Putra Twinanda, Jacques Marescaux, Michel de Mathelin, Nicolas Padoy



Abstract

Purpose

One of the advantages of minimally invasive surgery (MIS) is that the underlying digitization provides invaluable information regarding the execution of procedures in various patient-specific conditions. However, such information can only be obtained conveniently if the laparoscopic video database comes with semantic annotations, which are typically provided manually by experts. Considering the growing popularity of MIS, manual annotation becomes a laborious and costly task. In this paper, we tackle the problem of laparoscopic video classification, which consists of automatically identifying the type of abdominal surgery performed in a video. In addition to performing classifications on the full recordings of the procedures, we also carry out sub-video and video clip classifications. These classifications are carried out to investigate how many frames from a video are needed to get a good classification performance and which parts of the procedures contain more discriminative features.

Method

Our classification pipeline has four steps. First, we reject irrelevant frames using the color properties of the video frames. Second, we extract visual features from the relevant frames. Third, we encode the features using several feature encoding methods, namely vector quantization, sparse coding (SC), and Fisher encoding. Fourth, we carry out the classification using support vector machines. Sub-video classification is carried out by uniformly downsampling the video frames, whereas video clip classification takes three parts of each video (beginning, middle, and end) and runs the classification pipeline separately on each part. Finally, we build our final classification model by combining the features using a multiple kernel learning (MKL) approach.
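The vector quantization step of the pipeline can be illustrated with a minimal sketch. It assumes that a codebook has already been learned (for example by k-means) and that each frame yields a set of local descriptors; the function names `nearest_codeword` and `vq_encode` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def vq_encode(descriptors, codebook):
    """Encode a set of descriptors as an L1-normalized histogram over codewords."""
    histogram = [0.0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(histogram)
    return [count / total for count in histogram] if total else histogram


# Toy data (assumption): 2-D descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
video_signature = vq_encode(descriptors, codebook)
print(video_signature)
```

The resulting histogram is a fixed-length signature that a classifier such as a support vector machine could consume. Sparse coding and Fisher encoding replace the hard nearest-codeword assignment with richer statistics and are not shown here.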

Results

The experiments use a dataset of 208 videos covering eight different surgeries performed by 10 different surgeons. SC with \(K\)-singular value decomposition (K-SVD) yields the best classification accuracy. The classification accuracy decreases by only 3 % when only 60 % of the video frames are used. The end part of the procedures is the most discriminative part of the surgery: using only the last 20 % of the video frames, a classification accuracy greater than 70 % can be achieved. Finally, the combination of all features yields the best performance, 90.38 % accuracy.
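The two frame-selection schemes behind these numbers, keeping a uniform fraction of the frames and keeping one contiguous part of the video, might look as follows. This is a sketch under assumptions: the function names, the rounding rule, and the placement of the middle clip are illustrative choices, not details confirmed by the abstract.

```python
def uniform_downsample(num_frames, keep_ratio):
    """Return evenly spaced frame indices that keep roughly keep_ratio of the video."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_kept = max(1, round(num_frames * keep_ratio))
    # Integer arithmetic avoids floating-point drift in the index positions.
    return [(i * num_frames) // num_kept for i in range(num_kept)]


def video_clip(num_frames, part, clip_ratio):
    """Return the frame indices of a contiguous clip at the given part of the video."""
    clip_len = max(1, round(num_frames * clip_ratio))
    if part == "beginning":
        start = 0
    elif part == "middle":
        start = (num_frames - clip_len) // 2
    elif part == "end":
        start = num_frames - clip_len
    else:
        raise ValueError("part must be 'beginning', 'middle', or 'end'")
    return list(range(start, start + clip_len))


# Toy video of 10 frames: keep 60 % uniformly, or only the last 20 %.
sub_video = uniform_downsample(10, 0.6)
end_clip = video_clip(10, "end", 0.2)
print(sub_video, end_clip)
```

Either index list would then select the frames that are passed through the rest of the classification pipeline.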

Conclusions

SC with K-SVD provides the best representation of our videos, yielding the best accuracies for all features. The end part of the laparoscopic videos is more discriminative than the other parts. In addition to performing well individually, the features yield even better classification results when all of them are combined using the MKL approach.
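The idea of combining features through kernels can be sketched as a weighted sum of one Gram matrix per feature type. This is a simplification: an MKL method learns the kernel weights together with the classifier, whereas the weights below are fixed by hand. The linear kernel, the function names, and the toy features are likewise assumptions made for illustration.

```python
def linear_kernel(features):
    """Gram matrix of dot products between per-video feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices; non-negative weights preserve positive semidefiniteness."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (assumption): two videos described by two different feature types.
features_a = [[1.0, 0.0], [0.0, 1.0]]
features_b = [[1.0], [2.0]]
kernel_a = linear_kernel(features_a)
kernel_b = linear_kernel(features_b)

# Fixed, hand-picked weights; a real MKL solver would learn these.
combined = combine_kernels([kernel_a, kernel_b], [0.5, 0.5])
print(combined)
```

The combined matrix could be handed to any support vector machine implementation that accepts a precomputed kernel.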


Footnotes
2
IRCAD stands for Institut de Recherche contre les Cancers de l’Appareil Digestif.
 
Metadata
Title
Classification approach for automatic laparoscopic video database organization
Authors
Andru Putra Twinanda
Jacques Marescaux
Michel de Mathelin
Nicolas Padoy
Publication date
1 September 2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 9/2015
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-015-1183-4
