Encoding atlases by randomized classification forests for efficient multi-atlas label propagation

doi:10.1016/j.media.2014.06.010

Medical Image Analysis

Volume 18, Issue 8, December 2014, Pages 1262-1273

https://doi.org/10.1016/j.media.2014.06.010

Highlights

• We propose encoding image atlases by randomized classification forests.
• Scheme requires only a single registration for labeling of one target.
• Increased efficiency through properties of the proposed encoding.
• Evaluation of accuracy on 4 publicly available datasets.

Abstract

We propose a method for multi-atlas label propagation (MALP) based on encoding the individual atlases by randomized classification forests. Most current approaches perform a non-linear registration between all atlases and the target image, followed by a sophisticated fusion scheme. While these approaches can achieve high accuracy, in general they do so at high computational cost. This might negatively affect the scalability to large databases and experimentation. To tackle this issue, we propose to use a small and deep classification forest to encode each atlas individually in reference to an aligned probabilistic atlas, resulting in an Atlas Forest (AF). Our classifier-based encoding differs from current MALP approaches, which represent each point in the atlas either directly as a single image/label value pair, or by a set of corresponding patches. At test time, each AF produces one probabilistic label estimate, and their fusion is done by averaging. Our scheme performs only one registration per target image, achieves good results with a simple fusion scheme, and allows for efficient experimentation. In contrast to standard forest schemes, in which each tree would be trained on all atlases, our approach retains the advantages of the standard MALP framework. The target-specific selection of atlases remains possible, and incorporation of new scans is straightforward without retraining. The evaluation on four different databases shows accuracy within the range of the state of the art at a significantly lower running time.

Introduction

Labeling of healthy human brain anatomy is a crucial prerequisite for many clinical and research applications. Because of the effort involved (a fully manual labeling of a single brain takes 2–3 days (Klein and Tourville, 2012)) and increasing database sizes (e.g. ADNI, IXI, OASIS), considerable research has been devoted to developing automatic methods for this task. While brain labeling is a general segmentation task (with a high number of labels), the standard approach is multi-atlas label propagation (MALP); see (Landman and Warfield, 2012) for an overview of the state of the art. With the atlas denoting a single labeled scan, MALP methods first derive a set of label proposals for the target image, each based on a single atlas, and then combine these proposals into a final estimate.

Currently, there are two main strategies for estimating atlas-specific label proposals. The first and larger group of methods non-linearly aligns each of the atlas images to the target image, and then – assuming one-to-one correspondence at each point – uses the atlas labels directly as label proposals, cf. e.g. (Rohlfing et al., 2004, Warfield et al., 2004, Heckemann et al., 2006). The second group of patch-based methods has recently enjoyed increased attention (Coupé et al., 2011, Rousseau et al., 2011, Wu et al., 2012). Here, the label proposal is estimated for each point in the target image by a local similarity-based search in the atlas. Patch-based approaches relax the one-to-one assumption, and aim at reducing the computational times by using linear instead of deformable alignment (Coupé et al., 2011, Rousseau et al., 2011), resulting in labeling running times of 22–130 min per target on the IBSR dataset (Rousseau et al., 2011). The fusion step, which combines the atlas-specific label proposals into a final estimate, aims to correct for inaccurate registration or labelings. While label fusion is a very active research topic, it is not the focus of this work. Additionally, some approaches perform further refinement, e.g. by learning classifiers for fine-scale class-based correction (Wang et al., 2012).

While current state-of-the-art techniques can achieve high levels of accuracy, they are in general computationally demanding. This is primarily due to the non-linear registration between all atlases and the target image, combined with the long running times of the best-performing registration schemes for the problem (Klein et al., 2009). Current methods state running times of 2–20 h per single registration (Landman and Warfield, 2012). Furthermore, sophisticated fusion schemes can also be computationally expensive. State-of-the-art approaches report fusion running times of 3–5 h (Wang et al., 2012, Asman and Landman, 2012a, Asman and Landman, 2012b).

The major drawback of high computational cost is poor scalability to large and growing databases. High cost also limits the amount of experimentation that is possible during the algorithm development phase.

Our method differs from previous MALP approaches in how label proposals for a single atlas are generated, and is designed with the goal of low computational cost both at test time and during experimentation. In this work, we focus on the question of how a single atlas is encoded. From this point of view, methods assuming one-to-one correspondence represent an atlas directly as an image/label-map pair, while patch-based methods encode it by a set of localized patch collections. Variations of the patch-based encoding include the use of sparsity (Wu et al., 2012) or of label-specific kNN search structures (Wang et al., 2013).

In contrast to previous representations, we encode a single atlas together with its relation to label priors by a small and deep classification forest – which we call an Atlas Forest (AF). Given a target image as input (and an aligned probabilistic atlas), each AF returns a probabilistic label estimate for the target. Label fusion is then performed by averaging the probability estimates obtained from different AFs. Please see Fig. 1 for an overview of our method. While patch-based methods use a static representation for each image point (i.e. a patch of fixed size), our encoding is spatially varying. In the training step, our approach learns to describe different image points by differently shaped features, depending on the point’s contextual appearance.
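The averaging fusion described above can be illustrated with a minimal sketch. It assumes each AF output is stored as a plain nested list (voxels by labels); the names `fuse_by_averaging` and `hard_labels` and the toy numbers are hypothetical, and taking the most probable label as the final labeling is an assumption rather than something stated in this section.

```python
from typing import List, Sequence

# Hypothetical layout: one AF output is a list over voxels,
# each voxel holding a list of per-label probabilities.
ProbMap = List[List[float]]


def fuse_by_averaging(proposals: Sequence[ProbMap]) -> ProbMap:
    """Average per-atlas probabilistic label estimates voxel by voxel."""
    if not proposals:
        raise ValueError("at least one atlas proposal is required")
    n_atlases = len(proposals)
    n_voxels = len(proposals[0])
    n_labels = len(proposals[0][0])
    fused = [[0.0] * n_labels for _ in range(n_voxels)]
    for prob_map in proposals:
        for v in range(n_voxels):
            for lab in range(n_labels):
                fused[v][lab] += prob_map[v][lab] / n_atlases
    return fused


def hard_labels(fused: ProbMap) -> List[int]:
    """Assumed final step: pick the most probable label at every voxel."""
    return [max(range(len(p)), key=p.__getitem__) for p in fused]


# Toy example: 2 atlas forests, 2 voxels, 3 labels (made-up numbers).
af_outputs = [
    [[0.75, 0.25, 0.0], [0.0, 0.5, 0.5]],
    [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]],
]
fused = fuse_by_averaging(af_outputs)
labels = hard_labels(fused)
```

Because every AF contributes with equal weight, the fusion cost in this sketch grows linearly with the number of atlases and needs no per-atlas registration.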

Compared to current MALP methods, our approach has the following important characteristics:

1. Only one registration per target is required. This registration aligns the probabilistic atlas to the target. Since only one registration per target is required, the running time is independent of the database size in this respect. This differs conceptually from patch-based approaches, where the efficiency does not come from reducing the number of registrations, but from using affine instead of non-linear transformations.
2. Efficient generation of atlas proposals and their fusion. For proposal generation one AF per atlas is evaluated. Due to the inherent efficiency of tree-based classifiers at test time, this is significantly more efficient than current approaches.
3. Efficient experimentation. A leave-one-out cross-validation of a standard MALP approach on n atlases requires registration between all images, thus scaling with $n^{2}$. In contrast, the training of the single AFs, which is the most costly component of our approach for experimentation, scales with n (this assumes that generating the probabilistic atlas is not part of experimentation).
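To make the scaling argument concrete, the following sketch counts the dominant operations for a leave-one-out experiment. It assumes that pairwise registration is directional, so that registering all images to each other takes n(n - 1) runs; the function names are hypothetical and the counts ignore the generation of the probabilistic atlas, as in the text.

```python
def malp_loo_registrations(n: int) -> int:
    """Pairwise registrations for leave-one-out over n atlases.

    Assumption: each atlas is registered to every other atlas once
    per direction, giving n * (n - 1) registrations.
    """
    return n * (n - 1)


def af_trainings(n: int) -> int:
    """One atlas forest is trained per atlas, each exactly once."""
    return n


# Print the two counts side by side for a few database sizes.
for n in (10, 20, 40):
    print(n, malp_loo_registrations(n), af_trainings(n))
```

Under these assumptions, doubling the database size roughly quadruples the registration count of the standard scheme, while the number of AF trainings only doubles.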

Besides being efficient, experiments on 4 databases in Section 3 indicate that our scheme also achieves accuracy within the range of the state of the art.

Being based on discriminative classifiers, our approach is also related to a number of works which employ machine learning techniques. Compared to the use of multi-atlas label propagation techniques discussed above, the use of machine learning for brain labeling is still relatively limited. In (Tu et al., 2008), a hybrid model is proposed, which combines a discriminative probabilistic-boosting tree (PBT) classifier (Tu, 2005) with a PCA-based generative shape model of the individual anatomical structures. In (Tu and Bai, 2010), the Auto-Context framework with the PBT classifier was applied to brain labeling, and shown to outperform (Tu et al., 2008). Recently, the use of classifiers to correct systematic mistakes of labeling methods in a post-processing step has been shown to improve accuracy (Wang et al., 2011, Wang et al., 2012).

The major difference of these works to our approach is that they use the common scheme in which all available atlases are used for the training of one classifier. This is also true of standard forest schemes (cf. e.g. (Shotton et al., 2011, Iglesias et al., 2011a, Montillo et al., 2011, Zikic et al., 2012)) which train each tree on data from all training images.

In contrast, the main idea of this paper is to use one classifier to encode a single atlas by training it only on this exemplar. This approach has three advantageous properties for the multi-atlas label propagation setting.

1. Simple incorporation of new atlases into the database. For standard forest schemes, addition of new training data requires complete retraining or approximations. In our scenario, a new forest is simply trained on the new atlas exemplar and added to the other, previously trained AFs.
2. Selection of atlases for target-specific evaluation is straightforward since every AF is associated with a single atlas. This property allows use of atlas-selection (Aljabar et al., 2009), which can improve accuracy and reduce the computational cost. This step seems non-obvious for standard forest schemes where predictions are not separable with respect to specific atlases.
3. Efficient experimentation. For cross-validation, standard schemes have to be trained for every training/testing split of data, which is extremely costly. In our scenario, each AF is trained only once. Any leave-k-out test is performed simply by using the subset of $n - k$ AFs corresponding to the training data. This point can be seen as a generalization of the corresponding experimentation efficiency property in the MALP setting.
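The properties above follow from each AF being tied to exactly one atlas, which the sketch below illustrates. It assumes that the probabilistic outputs of all AFs for one target have already been computed and stored in a dictionary keyed by atlas identifier; the identifiers, the `leave_k_out_estimate` helper and the numbers are hypothetical.

```python
from typing import Dict, Iterable, List

ProbMap = List[List[float]]  # voxels x labels, hypothetical layout


def average(prob_maps: List[ProbMap]) -> ProbMap:
    """Voxel-wise mean of several probabilistic label estimates."""
    n = len(prob_maps)
    return [
        [sum(pm[v][lab] for pm in prob_maps) / n
         for lab in range(len(prob_maps[0][v]))]
        for v in range(len(prob_maps[0]))
    ]


def leave_k_out_estimate(cached: Dict[str, ProbMap],
                         held_out: Iterable[str]) -> ProbMap:
    """Fuse only the AFs whose atlases are not held out; nothing is retrained."""
    excluded = set(held_out)
    selected = [pm for atlas_id, pm in cached.items() if atlas_id not in excluded]
    if not selected:
        raise ValueError("no atlas forests left after exclusion")
    return average(selected)


# Outputs of three already-trained AFs on one target (1 voxel, 2 labels).
cached = {
    "atlas_a": [[1.0, 0.0]],
    "atlas_b": [[0.5, 0.5]],
    "atlas_c": [[0.0, 1.0]],
}
without_c = leave_k_out_estimate(cached, held_out=["atlas_c"])

# Adding a new atlas only adds one entry; existing AFs stay untouched.
cached["atlas_d"] = [[0.5, 0.5]]
all_four = leave_k_out_estimate(cached, held_out=[])
```

The same subset mechanism would serve target-specific atlas selection: any selection rule only has to return the identifiers to exclude.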

In general, training ensemble classifiers on disjoint subsets of data cannot be expected to reach higher accuracy than training each classifier on all data or on overlapping subsets, especially if the subsets are different atlases. The difference in accuracy between the two models will depend on the application, and especially on the similarity of the atlases to each other. Furthermore, in practice, the computational complexity of each model will also limit how well its parameters can be tuned so that it performs as close as possible to its theoretical limit. In Section 3.1.2, we show experimentally that the accuracy of the proposed scheme and that of a "reasonable" standard forest scheme appear to be on approximately the same level for the brain labeling task.

The main idea of thinking about a single atlas as a classifier is already mentioned for example in (Rohlfing et al., 2005). And indeed, the action of a single warped atlas in a standard MALP setting is that of a classifier – however a very simple one: For each spatial point the warped atlas will assign the value from the corresponding warped atlas label map.
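Viewed this way, a warped atlas is a lookup table. The sketch below expresses that view, assuming the warped label map is flattened to a list of integer labels and that the "classifier output" is written as a one-hot probability vector; both choices and the function name are illustrative assumptions.

```python
from typing import List


def warped_atlas_classifier(warped_labels: List[int],
                            n_labels: int) -> List[List[float]]:
    """Treat a warped label map as a classifier with one-hot outputs."""
    proposals = []
    for label in warped_labels:
        one_hot = [0.0] * n_labels
        one_hot[label] = 1.0  # full confidence in the propagated label
        proposals.append(one_hot)
    return proposals


# Two voxels whose warped atlas labels are 2 and 0, out of 3 labels.
proposal = warped_atlas_classifier([2, 0], n_labels=3)
```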

In this work, we propose the use of non-trivial machine learning-based classifiers to encode individual atlases in the MALP setting. We demonstrate that this approach exceeds the standard encoding in terms of efficiency while maintaining high accuracy, and that it has additional advantages over standard learning schemes, as discussed in detail above.

Our work on atlas forests was originally presented as a conference paper (Zikic et al., 2013a). This article extends the conference publication by providing a new evaluation with a simplified system, and a detailed evaluation and analysis of the method, as well as a hopefully improved overall presentation. To the best of our knowledge, the only other work which considers the use of non-trivial classifiers trained on individual atlases is (Akhondi-Asl and Warfield, 2013). The focus of that work is on a generalization of the STAPLE fusion method (Warfield et al., 2004) to operate on probabilistic estimates rather than thresholded label estimates. To generate per-atlas probabilistic estimates, (Akhondi-Asl and Warfield, 2013) uses a Gaussian Mixture Model (GMM) of patch intensities, and trains an individual GMM for each atlas. This article focuses on efficiency and on the relation of the proposed scheme to existing machine learning schemes. It differs from previous work in technical details through the use of a different classifier in combination with probabilistic atlases, and a simple averaging of probabilities as the fusion method. After describing the details of the method in the next section, we evaluate its performance and analyze it in Section 3, and discuss and summarize its properties in Section 4.

Section snippets

Method – Atlas Forests

An atlas forest (AF) encodes a single atlas by training one randomized classification forest (Breiman, 2001) exclusively on the data from the atlas. Every point in the atlas is described by its (contextual) appearance only, without considering its location (this can be seen as an even further relaxation of the one-to-one assumption, compared to patch-based approaches).

While this allows us to avoid registration of atlases to the target image, a problem with such a location-oblivious approach is

Evaluation and analysis

We evaluate our approach on four brain MRI data sets:

– IBSR database (Section 3.1).
– LPBA40 database (Section 3.2).
– MICCAI 2012 Multi-Atlas Labeling Challenge (Section 3.3).
– MICCAI 2013 SATA Challenge (Section 3.4).

Additionally, we perform an analysis of the influence of the different method components and their variations in Sections 3.1.1 Influence of method components, 3.1.2 Comparison to the standard forest scheme, 3.1.3 Auto-context variation, 3.1.4 Parameter settings, and analyse the structure of the

Discussion and summary

When comparing the proposed method to standard forest schemes, two interesting points arise: relation of our approach to standard bagging strategies, and the issue of over-training.

Bagging is a strategy for diversifying trees through randomization, by selecting a random subset of samples for the training of each tree. Single trees are then non-linear probabilistic approximating functions for a random sample subset, and the forest prediction is their linear combination. This strategy has the

References (35)

P. Aljabar et al., Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy, Neuroimage (2009)
P. Coupé et al., Patch-based segmentation using expert priors: application to hippocampus and ventricle segmentation, Neuroimage (2011)
R. Heckemann et al., Automatic anatomical brain MRI segmentation combining label propagation and decision fusion, Neuroimage (2006)
S. Joshi et al., Unbiased diffeomorphic atlas construction for computational anatomy, Neuroimage (2004)
A. Klein et al., Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration, Neuroimage (2009)
T. Rohlfing et al., Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains, Neuroimage (2004)
H. Wang et al., A learning-based wrapper method to correct systematic errors in automatic image segmentation: consistently improved performance in hippocampus, cortex and brain segmentation, Neuroimage (2011)
A. Akhondi-Asl et al., Simultaneous truth and performance level estimation through fusion of probabilistic segmentations, IEEE TMI (2013)
Asman, A., Landman, B., 2012a. Multi-atlas segmentation using spatial STAPLE. In: MICCAI Workshop on Multi-Atlas...
Asman, A.J., Landman, B.A., 2012b. Multi-atlas segmentation using non-local STAPLE. In: MICCAI Workshop on Multi-Atlas...
Asman, A., Akhondi-Asl, A., Wang, H., Tustison, N., Avants, B., Warfield, S.K., Landman, B., 2013. Miccai 2013...
L. Breiman, Random forests, Machine Learning (2001)
B. Glocker et al., Dense image registration through MRFs and efficient linear programming, MedIA (2008)
Iglesias, J.E., Konukoglu, E., Montillo, A., Tu, Z., Criminisi, A., 2011a. Combining generative and discriminative...
J.E. Iglesias et al., Robust brain extraction across datasets and comparison with publicly available methods, IEEE TMI (2011)
A. Klein et al., 101 Labeled brain images and a consistent human cortical labeling protocol, Front. Brain Imag. Methods (2012)

