Regression forests for efficient anatomy detection and localization in computed tomography scans
Highlights
► Accurate and very fast anatomy localization in CT scans.
► Simultaneous detection of 26 anatomical structures.
► Uses a parallel regression forest algorithm.
► Assessed on a large clinical database of CT images.
► Compared with a number of state-of-the-art algorithms.
Introduction
This paper proposes a new, parallel algorithm for the efficient detection and localization of anatomical structures (‘organs’) in 3D computed tomography studies. Localizing anatomical structures is an important step for many subsequent, possibly organ-specific, image analysis tasks such as segmentation, registration and classification. It is also crucial for managing database systems and creating intelligent navigation and visualization tools. For instance, one application is the efficient retrieval of selected portions of patients’ scans from PACS databases. When a physician wishes to inspect a particular organ, the ability to determine its position and extent automatically means that it is not necessary to retrieve the entire scan (which could comprise hundreds of MB of data) but only a smaller region of interest. This enables faster user interaction while making economical use of limited bandwidth. The proposed organ localizer could potentially also be used to track the amount of radiation absorbed by each organ over time. However, in its current form, the approximate representation of organs would yield only indicative dose estimates.
The main contribution of this work is a new parametrization of the anatomy localization task as a multivariate, continuous parameter estimation problem. This is addressed effectively via tree-based, non-linear regression. Unlike the popular classification forests (often referred to simply as “random forests”), regression forests (Breiman et al., 1984) have not yet been used in medical image analysis. Our approach is fully probabilistic and, unlike previous techniques (e.g. Zhou et al., 2007, Fenchel et al., 2008), is trained to maximize the confidence of output predictions. As a by-product, our method produces salient anatomical landmarks, i.e. automatically selected “anchor” regions that help localize organs of interest with high confidence. Our algorithm can localize both macroscopic anatomical regions (e.g. abdomen, thorax, trunk, etc.) and smaller scale structures (e.g. heart, l. adrenal gland, femoral neck, etc.) using a single, efficient model, cf. Feulner et al. (2009).
Motivated mostly by the semantic navigation use-case scenario, our focus in this paper is on both accuracy of prediction and speed of execution. Our goal is to achieve accurate anatomy localization in seconds on a conventional machine.
Regression approaches. Regression algorithms (Hardle, 1990) estimate functions which map input variables to continuous outputs. The regression paradigm fits the anatomy localization task well: the goal is to learn the non-linear mapping from voxels directly to organ position and size.
The first work to use regression for anatomy localization in images is Zhou et al. (2005). There, the authors need to define the non-linear mapping as an analytical function whose exact form is learned via regularized boosting. They also present a thorough overview of different regression techniques and discuss the superiority of boosted regression. In their later work (Zhou et al., 2007), the boosted regression technique was improved by incorporating high degree-of-freedom weak learners. The main difference between that approach and the one presented here lies in the non-linear mapping. Defining a regression function analytically, as done in Zhou et al. (2005, 2007), has two major drawbacks: (1) the definition of the function requires critical modeling assumptions about the type of weak learner and the regularization term, and (2) obtaining a confidence measure for the regression output is non-trivial. In contrast, our approach does not assume an analytical form for the mapping. This results in a simpler formulation with fewer modeling choices. In addition, the probabilistic nature of our method yields a natural way of associating confidence with the predicted output. In fact, the training phase of our algorithm directly maximizes the confidence of the predicted probability distribution.
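To make the idea of training for confidence concrete, the following is a minimal sketch and not the paper's actual criterion. It assumes a single scalar offset per voxel, models the offsets reaching a tree node as a 1D Gaussian, and scores a candidate split by the reduction in differential entropy. The names `gaussian_entropy` and `split_gain` and the variance floor `eps` are hypothetical.

```python
import math
from statistics import pvariance

def gaussian_entropy(values, eps=1e-6):
    """Differential entropy of a 1D Gaussian fitted to values (assumed model)."""
    var = pvariance(values) + eps  # eps: hypothetical floor that avoids log(0)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def split_gain(parent, left, right):
    """Entropy reduction of a split; a higher gain means tighter child distributions."""
    n = len(parent)
    children = (len(left) / n) * gaussian_entropy(left) \
             + (len(right) / n) * gaussian_entropy(right)
    return gaussian_entropy(parent) - children

# Hypothetical scalar offsets (e.g. in mm) seen at one node.
offsets = [-10.2, -9.8, -10.0, 10.1, 9.9, 10.0]
good = split_gain(offsets, offsets[:3], offsets[3:])      # separates the two clusters
bad = split_gain(offsets, offsets[0::2], offsets[1::2])   # mixes the two clusters
```

Under these assumptions the split that separates the two clusters scores a much larger gain than the one that mixes them, which is the sense in which a tree trained this way prefers splits that leave narrow, confident predictions in its children.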
A comparison between boosting, forests and cascades is found in Yin et al. (2007). To our knowledge, so far only two papers have used regression forests in imaging (Montillo and Ling, 2009, Gall and Lempitsky, 2009), neither with application to medical image analysis. For instance, Gall and Lempitsky (2009) address the problem of detecting pedestrians vs. background. For readers unfamiliar with regression forests, we provide a short explanation in the appendix. A detailed description of general decision forests and their applications may be found in Criminisi and Shotton (2013), with free research code and demos available at http://research.microsoft.com/projects/decisionforests.
Classification-based approaches. In Zhan et al. (2008) organ detection is achieved via a confidence-maximizing sequential scheduling of multiple, organ-specific classifiers. In contrast, our single, tree-based regressor allows us to deal naturally with multiple anatomical structures simultaneously. As shown in the machine learning literature (Torralba et al., 2007), this encourages feature sharing and, in turn, better generalization. In Seifert et al. (2009) a sequence of probabilistic boosting tree (PBT) classifiers (first for salient slices, then for landmarks) is used. In contrast, our single regressor maps directly from voxels to organ poses; latent, salient landmark regions are extracted as a by-product. In Criminisi et al. (2009) the authors achieve localization of organ centers but fail to estimate the organ extent (similar to Gall and Lempitsky (2009)). Here we present a more direct, continuous model which estimates the position of the walls of the bounding box containing each organ, thus achieving simultaneous organ localization and extent estimation.
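As an illustration of this parametrization, the sketch below computes, for one voxel, the signed offsets to the six walls of an axis-aligned bounding box, which is the kind of continuous target a regressor could be trained on. The wall ordering, the helper name `wall_offsets` and the example coordinates are assumptions made for illustration and are not taken from the paper.

```python
def wall_offsets(voxel, box):
    """Signed offsets from a voxel to the six walls of an axis-aligned box.

    voxel: (x, y, z) position
    box:   (x_lo, x_hi, y_lo, y_hi, z_lo, z_hi), in the same units as voxel
    """
    x, y, z = voxel
    # Pair every wall with the voxel coordinate along the same axis.
    coords = (x, x, y, y, z, z)
    return tuple(wall - c for wall, c in zip(box, coords))

# Hypothetical organ bounding box and voxel position (e.g. in mm).
box = (10, 30, 20, 50, 5, 25)
voxel = (15, 40, 10)
target = wall_offsets(voxel, box)
```

A single six-dimensional target of this form encodes both where the organ is and how large it is, so no separate extent estimate is needed.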
Marginal Space Learning. One of the most popular approaches for object localization in medical images is Marginal Space Learning (MSL), proposed in Zheng et al. (2007, 2009a). MSL has been demonstrated to be very useful in practice (Zheng et al., 2009b, Barbu et al., 2012). However, that algorithm has three limitations. Firstly, MSL is designed to detect a single object at a time, and extending it to the joint localization of multiple objects (e.g. more than 20) is not immediate. For example, existing extensions rely on applying the algorithm iteratively, one run for each object of interest. The order of detection is either determined through combinatorial optimization or driven by the confidence values each object attains during the detection phase (Liu et al., 2010). In contrast, our method achieves joint localization of any number of structures without modification and without the need for complex ordering strategies.
Secondly, MSL builds upon multiple classification stages. For instance, to detect the position of the heart we may need: (1) a classifier trained to estimate overall translation, (2) a classifier trained on translation and rotation, and (3) yet another classifier trained on translation, rotation and scale. All three classifiers need to be applied in sequence for each organ. For, e.g., 20 organs we would need to train 20 × 3 = 60 different classifiers, with clear scalability issues. In contrast, we propose using a single forest regressor (with e.g. only ∼4 trees) to deal with multiple organs (here tested on 26 anatomical structures).
Thirdly, we argue that solving a localization problem via classification is not optimal. In MSL, binary classifiers are run in a sliding-window fashion. For each point the classifier produces a positive answer (point is “close” to the structure) or a negative one (point is “far” from the structure). But reducing real-valued distances to binary decisions introduces a loss of information. Also, defining positive and negative examples is an ambiguous task. Instead, our regression forest directly estimates the 3D displacement of each voxel from the target regions. On the flip side, it is also true that in practice learning good classifiers seems to be easier than learning good regressors. This may be because, as a community, we have had much more exposure to classification tasks than to regression ones. This paper shows that, for anatomical bounding box localization, a regression forest can be more accurate than a classification approach.
Registration-based approaches. Although atlas-based methods have enjoyed much popularity (Fenchel et al., 2008, Shimizu et al., 2006, Yao et al., 2006), their conceptual simplicity belies the technical difficulty inherent in achieving robust, inter-subject registration. Robustness may be improved by using multi-atlas techniques (Isgum et al., 2009), but only at the expense of multiple registrations and hence increased computation time. Our algorithm incorporates atlas information within a compact tree-based model. As shown in the results section, such a model is more efficient than keeping multiple atlases and achieves anatomy localization in only a few seconds. Comparisons with global affine atlas registration methods (similar to ours in computational cost) show that our algorithm produces lower errors and more stable predictions. Next we describe details of our approach.
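The information lost by binarization can be seen in a small sketch. Assuming a hypothetical "close" radius and organ center (neither value comes from the paper), two voxels at very different distances receive the same binary label, whereas the displacement target keeps both direction and magnitude.

```python
import math

def binary_label(voxel, center, radius):
    """Classification-style target: True if the voxel lies within radius of the center."""
    return math.dist(voxel, center) <= radius

def displacement(voxel, center):
    """Regression-style target: signed 3D offset from the voxel to the center."""
    return tuple(c - v for v, c in zip(voxel, center))

center = (100, 100, 100)      # hypothetical organ center
radius = 10                   # hypothetical threshold for "close"
near_miss = (100, 100, 112)   # just outside the radius
far_away = (100, 100, 300)    # far outside the radius

labels = [binary_label(v, center, radius) for v in (near_miss, far_away)]
targets = [displacement(v, center) for v in (near_miss, far_away)]
```

Both voxels are labelled "far", although one is 12 units from the center and the other 200. The choice of `radius` is also arbitrary, which illustrates the ambiguity in defining positive and negative examples.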
Multivariate regression forests for organ localization
This section presents mathematical notation, problem parametrization and other details of our multi-organ regression forest with application to anatomy localization in CT images.
Mathematical notation. Vectors are represented in boldface (e.g. v), matrices as teletype capitals (e.g. Λ), and sets in calligraphic style (e.g. ). The position of a voxel in a CT volume is denoted v = (v_x, v_y, v_z).
The labeled database. The 26 anatomical structures we wish to recognize are {abdomen, l. adrenal gland, r.
Results, comparisons and validation
This section assesses the proposed algorithm in terms of accuracy, runtime speed, and memory efficiency; and compares it to alternative techniques.
Conclusion
Anatomy localization has been cast here as a non-linear regression problem where all voxel samples vote for the position of all anatomical structures. Location estimates are obtained by a multivariate regression forest algorithm that is shown to be more accurate and efficient than competing registration-based and template-matching techniques.
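The voting scheme can be sketched as follows, under simplifying assumptions: each voxel already carries a predicted offset to the six bounding-box walls of one organ, every vote has equal weight, and votes are combined by a plain average. The function name `aggregate_votes` and the sample values are hypothetical, and the probabilistic predictions of the forest are not modelled here.

```python
from statistics import fmean

def aggregate_votes(voxels, offsets):
    """Average the absolute wall positions voted for by each voxel.

    voxels:  list of (x, y, z) positions
    offsets: list of predicted (x_lo, x_hi, y_lo, y_hi, z_lo, z_hi) offsets,
             one tuple per voxel
    """
    votes = []
    for (x, y, z), d in zip(voxels, offsets):
        coords = (x, x, y, y, z, z)
        # A vote is the voxel position plus its predicted offset, per wall.
        votes.append([c + o for c, o in zip(coords, d)])
    # Average each wall position over all votes.
    return tuple(fmean(wall) for wall in zip(*votes))

voxels = [(0, 0, 0), (20, 30, 10)]
offsets = [(11, 29, 19, 51, 6, 24),     # hypothetical noisy prediction, voxel 1
           (-11, 11, -9, 19, -6, 16)]   # hypothetical noisy prediction, voxel 2
box = aggregate_votes(voxels, offsets)
```

Here the two noisy votes disagree by two units on every wall, and averaging them yields a single box estimate. Weighting each vote by its predicted confidence would be a natural extension of this sketch.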
At the core of the algorithm is a new information-theoretic metric for regression tree learning which works by maximizing the confidence of the predictions
References
- et al., 2012. Automatic detection and segmentation of lymph nodes from CT data. IEEE Trans. Med. Imaging.
- et al., 1984. Classification and Regression Trees.
- et al., 2013. Decision Forests for Computer Vision and Medical Image Analysis.
- Criminisi, A., Shotton, J., Bucciarelli, S., 2009. Decision forests with long-range spatial context for organ...
- Fenchel, M., Thesen, S., Schilling, A., 2008. Automatic labeling of anatomical structures in MR fastview images using a...
- Feulner, J., Zhou, S.K., Seifert, S., Cavallaro, A., Hornegger, J., Comaniciu, D., 2009. Estimating the Body Portion of...
- Gall, J., Lempitsky, V., 2009. Class-specific Hough forest for object detection. In: IEEE CVPR,...
- Gueld, M.O., Kohnen, M., Keysers, D., Schubert, H., Wein, B.B., Bredno, J., Lehmann, T.M., 2002. Quality of DICOM...
- Applied Non-Parametric Regression, 1990.
- The random subspace method for constructing decision forests. IEEE Trans. PAMI, 1998.
- Multi-atlas-based segmentation with local decision fusion: application to cardiac and aortic segmentation in CT scans. IEEE Trans. Med. Imaging.
- elastix: a toolbox for intensity-based medical image registration. IEEE Trans. Med. Imaging.