Survey Paper
Vision-based and marker-less surgical tool detection and tracking: a review of the literature
Introduction
Technological advances have had a considerable impact on modern surgical practice. In particular, the miniaturisation of surgical instruments and advanced instrument designs enabling dexterous tissue manipulation have been key drivers behind reducing surgical trauma and have given rise to Minimally Invasive Surgery (MIS) (Dogangil, Davies, Rodriguez, 2010, Davis, 2000, Cleary, Nguyen, 2001). In MIS, the surgeon accesses the surgical site through trocar ports, illumination is delivered via optical fibres or light-emitting diodes (LEDs), and the anatomy is observed through a digital video signal, either from a CMOS sensor inside the body or from an external camera connected to a series of lenses integrated in a laparoscope. By reducing the access incisions and the trauma caused by surgery, MIS has led to significant patient benefits and is likely to remain one of the most important drivers of the evolution of surgical techniques (Darzi and Mackay, 2002). Specialized surgical instruments are required in MIS to give the surgeon the ability to manipulate the internal anatomy and to dissect, ablate and suture tissues. Most recently, such instruments have become robotic manipulators. Mastering the control and use of MIS tools and techniques takes significant training and requires the acquisition of new skills compared to open surgical approaches (Van der Meijden and Schijven, 2009). MIS instruments deliver a reduced sense of touch from the surgical site, the endoscopic camera restricts the field-of-view (FoV) and localisation (Baumhauer et al., 2008), and normal hand-eye motor coordination is altered. As well as affecting the operating surgeon, the introduction of new equipment and devices enabling MIS within the operating theatre means that the whole clinical team must be trained and qualified to operate within the augmented environment in order to avoid preventable adverse events (Kohn et al., 2000).
This can have complex implications for clinical training periods and costs, for the management of clinical facilities, and ultimately for patient outcomes.
To overcome some of these challenges, computer-assisted intervention (CAI) systems attempt to make effective use of pre-operative and intra-operative patient-specific information from different sources, sensors and imaging modalities, and to enhance the workflow, ergonomics, control and navigation capabilities during surgery (Mirota, Ishii, Hager, 2011, Stoyanov, 2012). A common requirement, and a difficult practical challenge, for CAI systems is to have real-time knowledge of the pose of the surgical tools with respect to the anatomy and any imaging information. Different approaches to instrument localisation have been investigated, including electro-magnetic (EM) (Lahanas, Loukas, Georgiou, 2015, Fried, Kleefield, Gopal, Reardon, Ho, Kuhn, 1997) and optical tracking (Elfring et al., 2010), robot kinematics (Reiter et al., 2012b), and image-based tracking in endoscopic images, ultrasound (US) (Hu et al., 2009) and fluoroscopy (Weese et al., 1997). Image-based approaches are highly attractive because they require no modification to the instrument design or the operating theatre, and they can provide positional and motion information directly within the coordinate frame of the images the surgeon uses to operate. A major challenge for image-based techniques is robustness, in particular to the diverse range of surgical specialisations and conditions that may affect image quality and visibility. With this paper we review the current state of the art in image-based and marker-less surgical instrument detection and tracking, focusing on three aspects of prior work. Our major contributions are threefold:
- Summarising the different datasets that are available within the community, as well as the degree of cohesion and convergence towards a common set of annotations following a standard format;
- Reviewing the algorithms and highlighting the advantages and disadvantages of each method. There is currently no comprehensive review of surgical instrument detection, which hinders new researchers from learning about the field and prevents cross-pollination of ideas between research groups;
- Analysing the validation methodologies that have been used to produce detection results, because there is currently limited consensus on a common reference format for ground-truth data or on comparison between methods. An attempt has recently been made to alleviate this problem with the introduction of the ‘Instrument Detection and Tracking’ challenge at the Endoscopic Vision workshop at MICCAI 2015.
For the review, we carried out systematic searches of the Google Scholar and PubMed databases using the keywords: “surgical tool detection”, “surgical tool tracking”, “surgical instrument detection” and “surgical instrument tracking”. In addition to the initial search results, we followed the citations of the obtained papers, and all peer-reviewed English-language publications between 2000 and 2015 were considered. To maintain a reasonable methodological scope, we explicitly focused on papers describing image-based and marker-less surgical tool detection techniques. Other approaches that rely on external markers are outside the scope of this review but still represent an important portion of the literature, and we provide an overview of such methods in Section 5. We considered methods applied to any surgical field using image data from any type of surgical camera (e.g. endoscope and microscope). A total of twenty-eight publications form the methodological basis for this review, and we describe and classify each prior work in three categories: (i) validation data-set, (ii) detection methods, and (iii) validation methodology. The diagram in Fig. 1 shows the subdivision and structure of each category and our systematic methodology.
The geometry of imaging a surgical instrument during surgery is shown schematically in Fig. 2a and b for MIS using an endoscope and for retinal microsurgery using a surgical microscope. Endoscopes and surgical microscopes tend to be the most common surgical cameras, and both exist in monocular and stereo variants. The surgical camera is modelled as a pinhole projective camera and its coordinate system is taken as the reference coordinate system. We define detection as the estimation of a set of pose parameters which describe the position and orientation of a surgical instrument in this reference coordinate system. These parameters can, for example, be (x, y) translation, rotation and scale when working solely in the 2D space of the image plane, or alternatively extend to (x, y, z) translation and roll, pitch, yaw for 3D pose. Unless stated otherwise, we assume a left-handed camera coordinate system with the z axis aligned with the optical axis.
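As a concrete illustration of this camera model, the following sketch projects a hypothetical 3D tool-tip position, expressed in the camera coordinate frame, onto the image plane; the intrinsic parameters (fx, fy, cx, cy) are illustrative values, not taken from any of the reviewed systems.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point, given in the camera coordinate frame
    (z axis along the optical axis), onto the image plane of a
    pinhole camera with focal lengths (fx, fy) in pixels and
    principal point (cx, cy)."""
    x, y, z = point_cam
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

# Hypothetical example: a tool tip 10 cm in front of the camera,
# 2 cm right of and 1 cm off the optical axis.
u, v = project_point((0.02, 0.01, 0.10), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# -> (420.0, 290.0)
```

Estimating 3D pose from such 2D observations is the inverse of this mapping, which is what makes monocular detection ill-posed without additional constraints such as a known instrument model or stereo views.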
In our review of validation data-sets and methodology components, we refer to the terminology described in Jannin et al. (2006) and Jannin and Korb (2008): validation refers to assessing that a method fulfils the purpose for which it was intended; verification assesses that a method is built according to its specifications; and evaluation assesses that the method is accepted by the end-users and is reliable for a specific purpose.
In Fig. 3, surgical tools used in different setups and for different procedures are displayed, and two categories emerge. In the first, instruments are deeply articulated and enable six degree-of-freedom (DoF) movements, such as the da Vinci robotic instruments employed for minimally invasive procedures. In the second, instruments can be rigid or articulated with multiple parts, and are usually employed for eye surgery and neurosurgery.
Validation datasets
To describe validation data-sets, we propose to rely on four categories of information: the study conditions in which the data have been acquired, the amount of data and its type, the range of challenging visual conditions covered by the data, and the type of data annotation provided. The majority of studies covered in this review focuses solely on its associated data-set, with little cross-pollination of datasets between studies. Table 1 provides an overview of validation data-sets and Fig. 4
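Since no standard annotation format is shared across the reviewed studies, the following sketch shows one hypothetical per-frame annotation record and a dataset-side sanity check; all field names are illustrative assumptions, not a format used by any particular data-set.

```python
# Hypothetical per-frame annotation record; the field names are
# illustrative only, as no common standard exists across studies.
annotation = {
    "frame": 1042,
    "instrument": "needle_driver",
    "tip": (412, 287),        # 2D tip position in pixels
    "shaft_axis_deg": 34.5,   # in-plane shaft orientation
    "visible": True,          # occlusion / out-of-view flag
}

def is_usable(record):
    """Dataset-side sanity check: only visible, fully annotated
    frames should contribute to validation statistics."""
    return record["visible"] and record["tip"] is not None
```

Agreeing on even such a minimal schema would make cross-study comparison considerably easier.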
Tool detection methods
Detection of any object can be described quite generally as a parameter estimation problem over a set of image features. Broadly there are three strategies which have been used to solve the problem. The first two fit within a more holistic modelling paradigm and are separated into discriminative methods using discrete classification and generative methods which aim to regress the desired parameters in a continuous space. The third strategy encompasses ad-hoc methods that rely on empirical
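A minimal sketch of the discriminative, sliding-window flavour of the first strategy is given below, assuming a grayscale image and a pre-trained linear classifier; real detectors score richer features such as HOG or colour histograms rather than raw pixels, and return all windows above a threshold rather than a single maximum.

```python
import numpy as np

def sliding_window_detect(image, weights, bias, win=16, stride=8):
    """Score every window of a grayscale image with a linear
    classifier over a crude feature (the flattened window itself)
    and return the best-scoring window's top-left corner and score."""
    best = (None, -np.inf)
    h, w = image.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            feat = image[y:y + win, x:x + win].ravel()
            score = float(feat @ weights + bias)
            if score > best[1]:
                best = ((x, y), score)
    return best

# Toy usage: a bright square on a dark background, with a classifier
# whose weights simply sum intensities, is "detected" at the square.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
loc, score = sliding_window_detect(img, weights=np.ones(16 * 16), bias=0.0)
# -> loc == (24, 24), score == 256.0
```

Generative methods instead optimise the pose parameters of an explicit instrument model so that its rendered appearance agrees with the image, trading the discrete window search above for a continuous optimisation.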
Validation methodology
In order to quantify surgical tool detector performance and perform rankings in a realistic, unbiased and informative manner, a proper and well-defined validation methodology is required. To this end, we propose to investigate existing tool detection validation methodologies through their specification phase (high-level) and computation phase (low-level). In the former, we explore the objective, the validation type and the model validation. In the latter, we examine the validation criterion and its
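To make the computation phase concrete, the sketch below scores 2D tool-tip detections against ground-truth annotations with a centre-distance criterion; the pixel threshold and greedy matching are illustrative choices rather than a standard mandated by the literature.

```python
def precision_recall(detections, ground_truth, dist_thresh=15.0):
    """A detected tool-tip position counts as a true positive if it
    lies within `dist_thresh` pixels of an unmatched ground-truth
    annotation (greedy first-match assignment)."""
    matched = set()
    tp = 0
    for (dx, dy) in detections:
        for i, (gx, gy) in enumerate(ground_truth):
            if i not in matched and ((dx - gx) ** 2 + (dy - gy) ** 2) ** 0.5 <= dist_thresh:
                matched.add(i)
                tp += 1
                break
    fp = len(detections) - tp
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall

# Example: two annotated tips, one detection close to the first.
p, r = precision_recall([(100, 100)], [(105, 103), (300, 200)])
# -> p == 1.0, r == 0.5
```

Because the operating point of a detector depends on its score threshold, sweeping that threshold and reporting the full precision-recall or ROC curve is more informative than a single number.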
Alternative detection methods
The instrument detection and tracking methods covered so far in the review make no modification to the design of the instruments or to the surgical workflow. This is generally seen as a desirable quality (Stoyanov, 2012), as the clinical translation of this type of method is comparatively straightforward: modifications raise sterilization, legal and installation challenges. However, as illustrated throughout this review there are many significant challenges around
Discussion
Image-based surgical tool detection and tracking methods have been studied for almost two decades and have made marked progress in conjunction with advances in general object detection within the computer vision community. We expect the field to grow and gain importance because surgery as a field is fully committed to the MIS paradigm, which inherently relies on cameras and imaging devices. In this paper, we have reviewed the main lines of exploration so far in image-based detection and
Conclusion
With the ever increasing use of MIS techniques there is a growing need for CAI systems in surgery. Automatic and accurate detection of surgical instruments within the coordinate system of the surgical camera is critical, and there are increasing efforts to develop image-based and marker-less tool detection approaches. In this paper, we have reviewed the state of the art in this field. We have discussed how computer vision techniques represent a highly promising approach to detecting, localizing and
Acknowledgements
David Bouget would like to acknowledge the financial support of Carl Zeiss Meditec AG. Max Allan would like to acknowledge the financial support of the Rabin Ezra foundation as well as the EPSRC funding for the DTP in Medical and Biomedical Imaging at UCL. Danail Stoyanov would like to acknowledge the financial support of a Royal Academy of Engineering/EPSRC Fellowship.
References (125)
- Pedestrian detection at 100 frames per second. Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2012.
- BRIEF: binary robust independent elementary features. Computer Vision – ECCV 2010, 2010.
- Detection of grey regions in color images: application to the segmentation of a surgical instrument in robotized laparoscopy. Intelligent Robots and Systems (IROS), 2004 IEEE/RSJ International Conference on, 2004.
- A discriminatively trained, multiscale, deformable part model. Computer Vision and Pattern Recognition (CVPR), 2008 IEEE Conference on, 2008.
- Image-guided endoscopic surgery: results of accuracy and performance in a multicenter clinical study using an electromagnetic tracking system. Laryngoscope, 1997.
- Are we ready for autonomous driving? The KITTI vision benchmark suite. Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2012.
- EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vision, 2009.
- CSIFT: a SIFT descriptor with color invariant characteristics. Computer Vision and Pattern Recognition (CVPR), 2006 IEEE Computer Society Conference on, 2006.
- A real-time deformable detector. IEEE Trans. Patt. Anal. Mach. Intell., 2012.
- Image based surgical instrument pose estimation with multi-class labelling and optical flow.
- Toward detection and localization of instruments in minimally invasive surgery. IEEE Trans. Biomed. Eng.
- 2D-3D pose tracking of rigid instruments in minimally invasive surgery. Information Processing in Computer-Assisted Interventions.
- Surgical tool detection and tracking in retinal microsurgery. SPIE Medical Imaging.
- CARD: compact and real-time descriptors. Computer Vision (ICCV), 2011 IEEE International Conference on.
- Task specific local region matching. Computer Vision (ICCV), 2007 IEEE 11th International Conference on.
- Phase recognition during surgical procedures using embedded and body-worn sensors. Pervasive Computing and Communications (PerCom), 2011 IEEE International Conference on.
- Navigation in endoscopic soft tissue surgery: perspectives and limitations. J. Endourol.
- Speeded-up robust features (SURF). Comput. Vision Image Understanding.
- Seeking the strongest rigid detector. Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on.
- On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc.
- Robust real-time visual tracking using pixel-wise posteriors. Proceedings of the 10th European Conference on Computer Vision.
- In-vivo real-time tracking of surgical instruments in endoscopic video. Minimally Invasive Therapy & Allied Technol.
- Detecting surgical tools by modelling local appearance and global shape. IEEE Trans. Med. Imaging.
- Combined region- and motion-based 3D tracking of rigid and articulated objects. IEEE Trans. Patt. Anal. Mach. Intell.
- Navigating inner space: 3-D assistance for minimally invasive surgery. Rob. Auton. Syst.
- Laparoscopic tool tracking method for augmented reality surgical applications. Biomedical Simulation.
- Automatic guidance of an assistant robot in laparoscopic surgery. Robotics and Automation (ICRA), 1996 IEEE International Conference on.
- What shape are dolphins? Building 3D morphable models from 2D images. IEEE Trans. Patt. Anal. Mach. Intell.
- Systems for tracking minimally invasive surgical instruments. Minimally Invasive Therapy & Allied Technol.
- Testing potential interference with RFID usage in the patient care environment. Biomed. Instrument. Technol.
- State of the art in surgical robotics: clinical applications and technology challenges. Comput. Aided Surg.
- Dynamical statistical shape priors for level set-based tracking. IEEE Trans. Patt. Anal. Mach. Intell.
- Histograms of oriented gradients for human detection. Computer Vision and Pattern Recognition (CVPR), 2005 IEEE Computer Society Conference on.
- Recent advances in minimal access surgery. BMJ: British Med. J.
- A review of robotics in surgery. Proc. Inst. Mech. Eng. Part H, J. Eng. Med.
- A review of medical robotics for minimally invasive soft tissue surgery. Proc. Inst. Mech. Eng. Part H, J. Eng. Med.
- Real-time segmentation of surgical instruments inside the abdominal cavity using a joint hue saturation color feature. Real-Time Imaging.
- Segmentation and guidance of multiple rigid objects for intra-operative endoscopic vision. Dynamical Vision.
- Supervised learning of edges and object boundaries. Computer Vision and Pattern Recognition (CVPR), 2006 IEEE Computer Society Conference on.
- Integral channel features. Proceedings of the British Machine Vision Conference.
- Feature mining for image classification. Computer Vision and Pattern Recognition (CVPR), 2007 IEEE Conference on.
- Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Patt. Anal. Mach. Intell.
- Assessment of optical localizer accuracy for computer aided surgery systems. Comput. Aided Surg.
- An introduction to ROC analysis. Patt. Recognit. Lett.
- Progressive search space reduction for human pose estimation. Computer Vision and Pattern Recognition (CVPR), 2008 IEEE Conference on.
- Pattern recognition by affine moment invariants. Patt. Recognit.
- Context based object categorization: a critical survey. Comput. Vision Image Understanding.
1. Present address: Department of Mechanical Engineering, K.U.Leuven, 3001 Heverlee, Belgium.