Elsevier

Medical Image Analysis

Volume 35, January 2017, Pages 633-654

Survey Paper
Vision-based and marker-less surgical tool detection and tracking: a review of the literature

https://doi.org/10.1016/j.media.2016.09.003

Highlights

  • In-depth state-of-the-art review of surgical tool detection from 24 recent papers.

  • Lack of a standard format regarding datasets employed and corresponding annotations.

  • Comprehensive highlighting of advantages and disadvantages of existing methods.

  • Limited consensus on a common reference format within validation methodologies.

Abstract

In recent years, tremendous progress has been made in surgical practice, for example with Minimally Invasive Surgery (MIS). To overcome challenges arising from deported eye-to-hand manipulation, robotic and computer-assisted systems have been developed. Having real-time knowledge of the pose of surgical tools with respect to the surgical camera and underlying anatomy is a key ingredient for such systems. In this paper, we present a review of the literature dealing with vision-based and marker-less surgical tool detection. This paper includes three primary contributions: (1) identification and analysis of data-sets used for developing and testing detection algorithms, (2) in-depth comparison of surgical tool detection methods, from the feature extraction process to the model learning strategy, highlighting existing shortcomings, and (3) analysis of validation techniques employed to obtain detection performance results and to establish comparisons between surgical tool detectors. The papers included in the review were selected through PubMed and Google Scholar searches using the keywords: “surgical tool detection”, “surgical tool tracking”, “surgical instrument detection” and “surgical instrument tracking”, limiting results to the year range 2000–2015. Our study shows that despite significant progress over the years, the lack of established surgical tool data-sets and of a reference format for performance assessment and method ranking is preventing faster improvement.

Introduction

Technological advances have had a considerable impact on modern surgical practice. In particular, the miniaturisation of surgical instruments and advanced instrument design to enable dexterous tissue manipulation have been key drivers behind reducing surgical trauma and giving rise to Minimally Invasive Surgery (MIS) (Dogangil, Davies, Rodriguez, 2010, Davis, 2000, Cleary, Nguyen, 2001). In MIS the surgeon accesses the surgical site through trocar ports, illumination is delivered via optical fibres or light-emitting diodes (LED), and the anatomy is observed through a digital video signal either from a CMOS sensor in the body or an external camera connected to a series of lenses integrated in a laparoscope. By reducing the access incisions and trauma caused by surgery, MIS has led to significant patient benefits and is likely to remain one of the most important factors in the evolution of surgical techniques (Darzi and Mackay, 2002). Specialized surgical instruments are required in MIS to give the surgeon the ability to manipulate the internal anatomy, dissect, ablate and suture tissues. Most recently, such instruments have become robotic manipulators. Mastering the control and use of MIS tools and techniques takes significant training and requires the acquisition of new skills compared to open surgical approaches (Van der Meijden and Schijven, 2009). The MIS instruments deliver a reduced sense of touch from the surgical site, the endoscopic camera restricts the field-of-view (FoV) and localisation (Baumhauer et al., 2008), and the normal hand-motor axis is augmented. As well as impacting the operating surgeon, the introduction of new equipment and devices enabling MIS within the operating theatre means that the whole clinical team must be trained and qualified to operate within the augmented environment in order to avoid preventable adverse events (Kohn et al., 2000).
This can have complex implications for clinical training periods and costs, the management of clinical facilities, and ultimately for patient outcomes.

To overcome some of these challenges, computer-assisted intervention (CAI) systems attempt to make effective use of pre-operative and intra-operative patient-specific information from different sources, sensors and imaging modalities, and to enhance the workflow, ergonomics, control and navigation capabilities during surgery (Mirota, Ishii, Hager, 2011, Stoyanov, 2012). A common requirement and difficult practical challenge for CAI systems is to have real-time knowledge of the pose of the surgical tools with respect to the anatomy and any imaging information. Different approaches for instrument localisation have been investigated, including electro-magnetic (EM) (Lahanas, Loukas, Georgiou, 2015, Fried, Kleefield, Gopal, Reardon, Ho, Kuhn, 1997) and optical tracking (Elfring et al., 2010), robot kinematics (Reiter et al., 2012b), and image-based tracking in endoscopic images, ultrasound (US) (Hu et al., 2009) and fluoroscopy (Weese et al., 1997). Image-based approaches are highly attractive because they do not require modification to the instrument design or the operating theatre, and they can provide positional and motion information directly within the coordinate frame of the images used by the surgeon to operate. A major challenge for image-based techniques is robustness, in particular to the diverse range of surgical specialisations and conditions that may affect image quality and visibility. With this paper, we review the current state-of-the-art in image-based and marker-less surgical instrument detection and tracking, focusing on the key aspects of prior work. Our major contributions are threefold:

  • Summarising the different datasets available within the community, and assessing cohesion and convergence towards a common set of annotations following a standard format;

  • Reviewing the algorithms and highlighting the advantages and disadvantages of each method. There are currently no comprehensive reviews on surgical instrument detection, which hinders new researchers from learning about the field and additionally prevents cross-pollination of ideas between research groups;

  • Analysing the validation methodologies that have been used to produce detection results because currently there is limited consensus on a common reference format for ground truth data or comparison between methods. However, an attempt has recently been made to alleviate this problem with the introduction of the ‘Instrument Detection and Tracking’ challenge at the Endoscopic Vision workshop at MICCAI 2015.

For the review, we carried out systematic searches using the Google Scholar and PubMed databases using the keywords: “surgical tool detection”, “surgical tool tracking”, “surgical instrument detection” and “surgical instrument tracking”. In addition to the initial search results, we followed the citations of the obtained papers, and all peer-reviewed English-language publications between 2000 and 2015 were considered. To maintain a reasonable methodological scope, we explicitly focused on papers describing image-based and marker-less surgical tool detection techniques. Other approaches that rely on external markers, while not in the scope of this review, still represent an important portion of the literature, and we provide an overview of such methods in Section 5. We considered methods applied to any surgical field using image data from any type of surgical camera (e.g. endoscope and microscope). A total of twenty-eight publications form the methodological basis for this review, and we describe and classify each prior work in three categories: (i) validation data-set, (ii) detection methods, and (iii) validation methodology. The diagram in Fig. 1 shows the subdivision and structure of each category and our systematic methodology.

The geometry of imaging a surgical instrument during surgery is shown schematically in Fig. 2a and b for MIS using an endoscope and for retinal microsurgery using a surgical microscope. Endoscopes and surgical microscopes tend to be the most common surgical cameras, and both exist in monocular and stereo variants. The surgical camera is modelled as a pinhole projective camera and its coordinate system is taken as the reference coordinate system. We define detection as the estimation of a set of pose parameters which describe the position and orientation of a surgical instrument in this reference coordinate system. These parameters can, for example, be (x, y) translation, rotation and scale if working solely in the 2D space of the image plane, or alternatively extend to (x, y, z) translation and roll, pitch, yaw for a full 3D pose. We assume a left-handed coordinate system to describe the camera coordinate system, with the z axis aligned with the optical axis, unless stated otherwise.
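As a minimal illustration of the pinhole model described above, the sketch below projects a 3D tool-tip position, given in the camera coordinate frame, onto the image plane. The intrinsic parameters (fx, fy, cx, cy) and the tip position are hypothetical values chosen for the example, not taken from any of the reviewed systems.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point, given in the camera coordinate frame (z axis along
    the optical axis), onto the image plane of a pinhole camera."""
    x, y, z = point_cam
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return u, v

# Hypothetical intrinsics and a tool tip 10 cm in front of the camera.
u, v = project_point((0.02, -0.01, 0.10), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
# u ≈ 480, v ≈ 160: the tip projects to the right of and above the image centre.
```

Estimating the inverse of this mapping, i.e. recovering the 3D pose parameters from observed 2D image evidence, is precisely the detection problem surveyed in this review.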

In our review of validation data-sets and methodology components, we refer to the terminologies described in Jannin et al. (2006) and Jannin and Korb (2008), where validation refers to assessing that a method fulfils the purpose for which it was intended, as opposed to verification, which assesses that a method is built according to its specifications, and evaluation, which consists of assessing that the method is accepted by the end-users and is reliable for a specific purpose.

In Fig. 3, surgical tools used in different setups and for different procedures are displayed, and two categories emerge. In the first, instruments are deeply articulated and enable six degree-of-freedom (DoF) movements, such as the da Vinci robotic instruments employed for minimally invasive procedures. In the second, instruments can be rigid or articulated with multiple parts, and are usually employed for eye surgery and neurosurgery.


Validation datasets

To describe validation data-sets, we propose to rely on four categories of information: the study conditions in which data have been acquired, the amount of data and its type, the range of challenging visual conditions covered by the data, and the type of data annotation provided. The majority of studies covered in this review focusses solely on its associated data-set, with little cross pollination of datasets between studies. Table 1 provides an overview of validation data-sets and Fig. 4
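Since the reviewed data-sets provide annotations in heterogeneous and largely incompatible formats, it can help to picture what a per-frame annotation record typically contains. The sketch below is purely hypothetical: every field name and value is an illustrative assumption, not a format used by any of the reviewed studies.

```python
# Hypothetical per-frame annotation record for a surgical video data-set.
# Fields vary between studies because no standard annotation format exists.
annotation = {
    "frame_index": 142,
    "tools": [
        {
            "label": "grasper",                    # instrument class
            "tip": (412, 237),                     # 2D tip position in pixels
            "shaft_axis_deg": 48.5,                # in-plane shaft orientation
            "bounding_box": (350, 180, 470, 300),  # x_min, y_min, x_max, y_max
        }
    ],
}
```

Agreeing on even such a minimal schema across data-sets would already make cross-study comparison of detectors considerably easier.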

Tool detection methods

Detection of any object can be described quite generally as a parameter estimation problem over a set of image features. Broadly there are three strategies which have been used to solve the problem. The first two fit within a more holistic modelling paradigm and are separated into discriminative methods using discrete classification and generative methods which aim to regress the desired parameters in a continuous space. The third strategy encompasses ad-hoc methods that rely on empirical
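As a toy illustration of the discriminative strategy, the sketch below slides a fixed-size window over an image and scores each patch with a stand-in classifier (here simply mean intensity); in a real detector the scoring function would be a trained classifier applied to extracted features. All names and values are hypothetical.

```python
import numpy as np

def detect_sliding_window(image, window, score_fn, stride=4):
    """Discriminative detection sketch: slide a fixed-size window over the
    image and keep the location where the classifier scores highest."""
    h, w = image.shape
    wh, ww = window
    best_score, best_loc = -np.inf, None
    for y in range(0, h - wh + 1, stride):
        for x in range(0, w - ww + 1, stride):
            s = score_fn(image[y:y + wh, x:x + ww])
            if s > best_score:
                best_score, best_loc = s, (x, y)
    return best_loc, best_score

# Toy example: a bright 8x8 blob stands in for the tool tip.
img = np.zeros((64, 64))
img[20:28, 36:44] = 1.0
loc, _ = detect_sliding_window(img, (8, 8), lambda patch: patch.mean())
```

A generative method would instead regress the pose parameters continuously, for example by optimising the agreement between a rendered instrument model and the image, rather than classifying discrete window locations.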

Validation methodology

In order to quantify surgical tool detector performance and perform rankings in a realistic, unbiased, and informative manner, a proper and well-defined validation methodology is required. To do so, we propose to investigate existing tool detection validation methodologies through their specification phase (high-level) and computation phase (low-level). In the former, we explore the objective, the validation type and the model validation. In the latter, we examine validation criterion and its
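To make the computation phase concrete, the sketch below implements one possible low-level validation criterion: greedy matching of detected 2D tool-tip positions to ground-truth annotations within a pixel tolerance, from which precision and recall follow. The tolerance value and the matching rule are illustrative assumptions, not a standard drawn from the reviewed papers.

```python
def detection_precision_recall(detections, ground_truth, tol=15.0):
    """Greedily match detected tool-tip positions (x, y) to ground-truth
    annotations within a pixel tolerance, then compute precision and recall."""
    unmatched_gt = list(ground_truth)
    true_positives = 0
    for (dx, dy) in detections:
        for gt in unmatched_gt:
            if ((dx - gt[0]) ** 2 + (dy - gt[1]) ** 2) ** 0.5 <= tol:
                unmatched_gt.remove(gt)  # each annotation matched at most once
                true_positives += 1
                break
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# One correct detection and one false positive against two annotations.
p, r = detection_precision_recall([(100, 50), (300, 200)], [(105, 52), (400, 400)])
```

Even this simple criterion exposes the choices (tolerance, matching policy) that, without a common reference format, make results from different studies hard to compare directly.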

Alternative detection methods

The instrument detection and tracking methods discussed so far in the review cover methodologies that make no modification to the design of the instruments or the surgical workflow. This is generally seen as a desirable quality (Stoyanov, 2012), as the clinical translation of this type of method is comparatively straightforward, since instrument modifications raise sterilization, legal and installation challenges. However, as illustrated throughout this review there are many significant challenges around

Discussion

Image-based surgical tool detection and tracking methods have been studied for almost two decades and have made marked progress in conjunction with advances in general object detection within the computer vision community. We expect the field to grow and have increased importance because surgery as a field is fully committed to the MIS paradigm, which inherently relies on cameras and imaging devices. In this paper, we have reviewed the main lines of exploration so far in image-based detection and

Conclusion

With the ever-increasing use of MIS techniques there is a growing need for CAI systems in surgery. Automatic and accurate detection of surgical instruments within the coordinate system of the surgical camera is critical, and there are increasing efforts to develop image-based and marker-less tool detection approaches. In this paper, we have reviewed the state of the art in this field. We have discussed how computer vision techniques represent a highly promising approach to detecting, localizing and

Acknowledgements

David Bouget would like to acknowledge the financial support of Carl Zeiss Meditec AG. Max Allan would like to acknowledge the financial support of the Rabin Ezra foundation as well as the EPSRC funding for the DTP in Medical and Biomedical Imaging at UCL. Danail Stoyanov would like to acknowledge the financial support of a Royal Academy of Engineering/EPSRC Fellowship.

References (125)

  • M. Allan et al.

    Toward detection and localization of instruments in minimally invasive surgery

    IEEE Trans. Biomed. Eng.

    (2013)
  • M. Allan et al.

    2d-3d pose tracking of rigid instruments in minimally invasive surgery

    Information Processing in Computer-Assisted Interventions

    (2014)
  • M. Alsheakhali et al.

    Surgical tool detection and tracking in retinal microsurgery

    SPIE Medical Imaging

    (2015)
  • M. Ambai et al.

    CARD: compact and real-time descriptors

    Computer Vision (ICCV), 2011 IEEE International Conference on

    (2011)
  • B. Babenko et al.

    Task specific local region matching

    Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on

    (2007)
  • J.E. Bardram et al.

    Phase recognition during surgical procedures using embedded and body-worn sensors

    Pervasive Computing and Communications (PerCom), 2011 IEEE International Conference on

    (2011)
  • M. Baumhauer et al.

    Navigation in endoscopic soft tissue surgery: perspectives and limitations

    J. Endourol.

    (2008)
  • H. Bay et al.

    Speeded-up robust features (SURF)

    Comput. Vision Image Understanding

    (2008)
  • R. Benenson et al.

    Seeking the strongest rigid detector

    Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on

    (2013)
  • Benenson, R., Omran, M., Hosang, J., Schiele, B., 2014. Ten years of pedestrian detection, what have we learned? arXiv...
  • A. Bhattacharyya

    On a measure of divergence between two statistical populations defined by their probability distributions

    Bull. Calcutta Math. Soc.

    (1943)
  • C. Bibby et al.

    Robust real-time visual tracking using pixel-wise posteriors

    Proceedings of the 10th European Conference on Computer Vision

    (2008)
  • L. Bouarfa et al.

    In-vivo real-time tracking of surgical instruments in endoscopic video

    Minimally Invasive Therapy Allied Technol.

    (2012)
  • D. Bouget et al.

    Detecting surgical tools by modelling local appearance and global shape

    IEEE Trans. Med. Imaging

    (2015)
  • T. Brox et al.

    Combined Region-and Motion-based 3D Tracking of Rigid and Articulated Objects

    IEEE Trans. Patt. Anal. Mach. Intell.

    (2010)
  • D. Burschka et al.

    Navigating inner space: 3-d assistance for minimally invasive surgery

    Rob. Auton. Syst.

    (2005)
  • A.M. Cano et al.

    Laparoscopic tool tracking method for augmented reality surgical applications

    Biomedical Simulation

    (2008)
  • A. Casals et al.

    Automatic guidance of an assistant robot in laparoscopic surgery

    Robotics and Automation, 1996. Proceedings., 1996 IEEE International Conference on

    (1996)
  • T. Cashman et al.

    What shape are dolphins? building 3D morphable models from 2D images

    IEEE Trans. Patt. Anal. Mach. Intell.

    (2013)
  • M. Chmarra et al.

    Systems for tracking minimally invasive surgical instruments

    Minimally Invasive Therapy Allied Technol.

    (2007)
  • B. Christe et al.

    Testing potential interference with RFID usage in the patient care environment

    Biomed. Instrument. Technol.

    (2008)
  • K. Cleary et al.

    State of the art in surgical robotics: clinical applications and technology challenges

    Comput. Aided Surg.

    (2001)
  • D. Cremers

    Dynamical statistical shape priors for level set-based tracking

    IEEE Trans. Patt. Anal. Mach. Intell.

    (2006)
  • N. Dalal et al.

    Histograms of oriented gradients for human detection

    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on

    (2005)
  • A. Darzi et al.

    Recent advances in minimal access surgery

    BMJ: British Med. J.

    (2002)
  • B. Davis

    A review of robotics in surgery

    Proc. Inst. Mech. Eng., Part H: J. Eng. Med.

    (2000)
  • G. Dogangil et al.

    A review of medical robotics for minimally invasive soft tissue surgery

    Proc. Inst. Mech. Eng., Part H: J. Eng. Med.

    (2010)
  • C. Doignon et al.

    Real-time segmentation of surgical instruments inside the abdominal cavity using a joint hue saturation color feature

    Real-Time Imaging

    (2005)
  • C. Doignon et al.

    Segmentation and guidance of multiple rigid objects for intra-operative endoscopic vision

    Dynamical Vision

    (2007)
  • P. Dollar et al.

    Supervised learning of edges and object boundaries

    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on

    (2006)
  • P. Dollár et al.

    Integral channel features

    Proceedings of the British Machine Vision Conference

    (2009)
  • P. Dollár et al.

    Feature mining for image classification

    Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on

    (2007)
  • P. Dollár et al.

    Pedestrian detection: an evaluation of the state of the art

    IEEE Trans. Patt. Anal. Mach. Intell.

    (2011)
  • Duncan, R. G., Froggatt, M. E., 2010. Fiber optic position and/or shape sensing based on Rayleigh...
  • R. Elfring et al.

    Assessment of optical localizer accuracy for computer aided surgery systems

    Comput. Aided Surg.

    (2010)
  • T. Fawcett

    An introduction to ROC analysis

    Patt. Recognit. Lett.

    (2006)
  • V. Ferrari et al.

    Progressive search space reduction for human pose estimation

    Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on

    (2008)
  • J. Flusser et al.

    Pattern recognition by affine moment invariants

    Patt. Recognit.

    (1993)
  • C. Galleguillos et al.

    Context based object categorization: a critical survey

    Comput. Vision Image Understanding

    (2010)
1 Present address: Department of Mechanical Engineering, K.U.Leuven, 3001 Heverlee, Belgium.