Introduction
- Innovative Hybrid Model (FE1-UT, FE2-UT, and FE3-UT): Our new algorithm combines the UNET architecture with Transformers and feature enhancement techniques (MHE, CLAHE, and MBOBHE). The resulting hybrid models, named FE1-UT, FE2-UT, and FE3-UT, represent a significant advancement in the field of medical image segmentation.
- Improved Performance: The primary focus of this research is to improve the accuracy of brain tumor segmentation. By integrating Transformers into the UNET framework, the models gain the ability to understand context and capture long-range dependencies within the data. This contextual understanding significantly improves segmentation accuracy, especially in cases involving intricate anatomical structures and indistinct features.
- Feature Enhancement in Image Preprocessing: The study emphasizes the importance of feature enhancement during the image preprocessing stage. Using image enhancement methods such as MHE, CLAHE, and MBOBHE improves the visibility of critical details within medical images, leading to better segmentation results.
- Exceptional Accuracy: The models developed in this study achieve remarkable accuracy rates, exceeding 99%, on two publicly available datasets. This level of accuracy is a significant achievement in medical image segmentation and reflects the strength of the proposed approach.
- Section II provides an in-depth comparison of our novel methods with existing approaches.
- Section III offers a concise overview of the structure of our innovative techniques.
- Section IV discusses the experimental results, including comprehensive discussions and comparisons with established methodologies.
- Section VI concludes the paper with our final remarks.
Related work
Image segmentation with CNN and UNET
Image segmentation with transformers
Method
Image enhancement
- Compute the Histogram: Let H be the histogram of the input image I, where H(i) represents the number of pixels with intensity i.
- Identify Intensity Modes: Detect the peaks (modes) in the histogram.
- Divide the Image: Divide the input image I into subregions based on the identified modes.
- Apply Histogram Equalization: Perform histogram equalization on each subregion independently. Denote the subregions as I_1, I_2, ..., I_n, where n is the number of modes, and apply histogram equalization to each subregion as follows:
1. Divide the image into tiles.
2. Calculate the histogram for each tile.
3. Clip each histogram at a predefined clip limit.
4. After clipping, normalize the histogram so that its sum remains the same.
5. Calculate the cumulative distribution function (CDF).
6. Apply histogram equalization using the CDF.
7. Reconstruct the image from the equalized tiles.
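The tiled, clipped equalization steps above can be sketched in NumPy as follows. This is a minimal illustration, not the exact implementation used in the study: it assumes an 8-bit grayscale image whose dimensions divide evenly by the tile count, and it omits the bilinear blending between neighboring tiles that full CLAHE performs.

```python
import numpy as np

def clipped_tile_equalization(img, tiles=8, clip_limit=40.0):
    """Per-tile clipped histogram equalization (simplified CLAHE sketch).

    Assumes a uint8 image whose height and width are divisible by `tiles`;
    unlike full CLAHE, no interpolation is done across tile borders.
    """
    out = np.empty_like(img)
    h, w = img.shape
    th, tw = h // tiles, w // tiles
    for i in range(tiles):
        for j in range(tiles):
            ys = slice(i * th, (i + 1) * th)
            xs = slice(j * tw, (j + 1) * tw)
            block = img[ys, xs]
            # Step 2: histogram of the tile
            hist = np.bincount(block.ravel(), minlength=256).astype(float)
            # Steps 3-4: clip, then spread the excess so the sum is unchanged
            excess = np.clip(hist - clip_limit, 0, None).sum()
            hist = np.minimum(hist, clip_limit) + excess / 256.0
            # Step 5: cumulative distribution function
            cdf = hist.cumsum()
            # Step 6: map intensities through the normalized CDF
            cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1.0) * 255.0
            # Step 7: write the equalized tile back
            out[ys, xs] = cdf[block].astype(img.dtype)
    return out
```

In practice, library implementations (e.g. OpenCV's `createCLAHE`) add the inter-tile interpolation step to avoid visible block boundaries.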
Improved U-net segmentation
Self-attention-transformer
Algorithm
Experimental setting and results
Evaluation metrics
Dataset description
Experimental parameters setting
Part | Layer Type | Layer Name | Output Shape | Number of Parameters |
---|---|---|---|---|
Encoder | Conv2D | conv2d | (None, 240, 240, 64) | 2368 |
Encoder | Conv2D | conv2d_1 | (None, 240, 240, 64) | 36928 |
Encoder | MaxPooling2D | max_pooling2d | (None, 120, 120, 64) | 0 |
Encoder | Dropout | dropout_1 | (None, 120, 120, 64) | 0 |
Encoder | Conv2D | conv2d_2 | (None, 120, 120, 128) | 73856 |
Encoder | Conv2D | conv2d_3 | (None, 120, 120, 128) | 147584 |
Encoder | MaxPooling2D | max_pooling2d_1 | (None, 60, 60, 128) | 0 |
Encoder | Dropout | dropout_2 | (None, 60, 60, 128) | 0 |
Encoder | Conv2D | conv2d_4 | (None, 60, 60, 256) | 295168 |
Encoder | Conv2D | conv2d_5 | (None, 60, 60, 256) | 590080 |
Transformer | Dropout | dropout_3 | (None, 30, 30, 256) | 0 |
Transformer | MultiHeadAttention | multi_head_attention | (None, 30, 30, 256) | 263168 |
Transformer | Dropout | dropout_4 | (None, 30, 30, 256) | 0 |
Decoder | Conv2DTranspose | conv2d_transpose | (None, 60, 60, 256) | 262400 |
Decoder | Concatenate | concatenate | (None, 60, 60, 512) | 0 |
Decoder | Conv2D | conv2d_6 | (None, 60, 60, 256) | 1179904 |
Decoder | Conv2D | conv2d_7 | (None, 60, 60, 256) | 590080 |
Decoder | Conv2DTranspose | conv2d_transpose_1 | (None, 120, 120, 128) | 131200 |
Decoder | Concatenate | concatenate_1 | (None, 120, 120, 256) | 0 |
Decoder | Conv2D | conv2d_8 | (None, 120, 120, 128) | 295040 |
Decoder | Conv2D | conv2d_9 | (None, 120, 120, 128) | 147584 |
Decoder | Conv2DTranspose | conv2d_transpose_2 | (None, 240, 240, 64) | 32832 |
Decoder | Concatenate | concatenate_2 | (None, 240, 240, 128) | 0 |
Decoder | Conv2D | conv2d_10 | (None, 240, 240, 64) | 73792 |
Decoder | Conv2D | conv2d_11 | (None, 240, 240, 64) | 36928 |
Output | Conv2D | conv2d_12 | (None, 240, 240, 4) | 260 |
Output | Activation | activation | (None, 240, 240, 4) | 0 |
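The parameter counts in the table follow the standard Keras formulas. A quick pure-Python check reproduces them under our reading of the table (these configuration details are inferred, not stated explicitly): 3×3 convolutions, 2×2 transposed convolutions, a 1×1 output convolution, a 4-channel input (e.g. four MRI modalities), and multi-head attention with num_heads × key_dim = 256.

```python
def conv2d_params(k, c_in, c_out):
    """Keras Conv2D / Conv2DTranspose: k*k*c_in*c_out weights + c_out biases."""
    return k * k * c_in * c_out + c_out

def mha_params(d_model, total_key_dim):
    """Keras MultiHeadAttention: query/key/value projections plus an output
    projection, each with bias; only num_heads * key_dim enters the count."""
    return 3 * (d_model * total_key_dim + total_key_dim) \
        + total_key_dim * d_model + d_model

computed = {
    "conv2d": conv2d_params(3, 4, 64),                  # 2368
    "conv2d_1": conv2d_params(3, 64, 64),               # 36928
    "conv2d_2": conv2d_params(3, 64, 128),              # 73856
    "conv2d_3": conv2d_params(3, 128, 128),             # 147584
    "conv2d_4": conv2d_params(3, 128, 256),             # 295168
    "conv2d_5": conv2d_params(3, 256, 256),             # 590080
    "multi_head_attention": mha_params(256, 256),       # 263168
    "conv2d_transpose": conv2d_params(2, 256, 256),     # 262400
    "conv2d_6": conv2d_params(3, 512, 256),             # 1179904
    "conv2d_transpose_1": conv2d_params(2, 256, 128),   # 131200
    "conv2d_8": conv2d_params(3, 256, 128),             # 295040
    "conv2d_transpose_2": conv2d_params(2, 128, 64),    # 32832
    "conv2d_10": conv2d_params(3, 128, 64),             # 73792
    "conv2d_12": conv2d_params(1, 64, 4),               # 260
}
for name, n in computed.items():
    print(f"{name}: {n}")
```

Every value matches the table, which supports the inferred kernel sizes and the 4-channel input.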
Proposed methods segmentation performance evaluation
Traditional methods comparison with proposed methods
Method | MSD | BRATS |
---|---|---|
UNET | 45.21 | 90.19 |
Dense UNET | 46.27 | 91.46 |
Att-UNET | 50.08 | 85.24 |
UNET++ | 49.0 | 89.06 |
UNET3+ | 44.9 | 89.30 |
Trans-UNET | 47.22 | 92.34 |
TransU2-UNET | 47.45 | 93.85 |
UNET + CLAHE | 97.12 | 98.36 |
UNET + MBOBHE | 97.42 | 99.50 |
UNET + MPHE | 97.63 | 77.72 |
FE1-UT | 97.96 | 99.49 |
FE2-UT | 97.8 | 91.73 |
FE3-UT | 97.64 | 63.16 |
Latest methods comparison with proposed models
Algorithm | Kappa | DSC | IoU | Accuracy | Balanced Accuracy |
---|---|---|---|---|---|
FE1-UT | 0.669 | 0.670 | 0.571 | 0.806 | 0.799 |
FE2-UT | 0.677 | 0.678 | 0.578 | 0.816 | 0.809 |
FE3-UT | 0.702 | 0.703 | 0.6 | 0.846 | 0.838 |
Study [45] | 0.6021 | 0.603 | 0.5139 | 0.7254 | 0.7191 |
Study [48] | 0.6093 | 0.6102 | 0.5202 | 0.7344 | 0.7281 |
Study [49] | 0.6318 | 0.6327 | 0.54 | 0.7614 | 0.7542 |
Algorithm | Kappa | DSC | IoU | Accuracy | Balanced Accuracy |
---|---|---|---|---|---|
FE1-UT | 0.544 | 0.549 | 0.413 | 0.799 | 0.651 |
FE2-UT | 0.551 | 0.556 | 0.418 | 0.809 | 0.659 |
FE3-UT | 0.571 | 0.577 | 0.433 | 0.838 | 0.683 |
Study [45] | 0.4896 | 0.4941 | 0.3717 | 0.7191 | 0.5859 |
Study [48] | 0.4959 | 0.5004 | 0.3762 | 0.7281 | 0.5931 |
Study [49] | 0.5139 | 0.5193 | 0.3897 | 0.7542 | 0.6147 |
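As a reference for the metrics tabulated above, DSC and IoU for a single binary mask pair can be computed as in the sketch below (for binary masks the two are related by IoU = DSC / (2 − DSC); the exact class-averaging scheme used in each study is not reproduced here):

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice similarity coefficient (DSC) and IoU (Jaccard index) for one
    binary foreground mask; multi-class scores average these per class."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dsc = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / np.logical_or(pred, gt).sum()
    return dsc, iou
```

Accuracy, balanced accuracy, and Cohen's kappa are computed from the same confusion-matrix entries, e.g. via scikit-learn's `accuracy_score`, `balanced_accuracy_score`, and `cohen_kappa_score`.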
Ablation experiments
Model | Filter | Balanced Accuracy | F1 Score | Cohen's Kappa | Precision | Recall | Jaccard Index | ROC AUC |
---|---|---|---|---|---|---|---|---|
UNET | CLAHE | 0.9890 | 0.9869 | 0.9826 | 0.9872 | 0.9867 | 0.9742 | 0.9912 |
FE1-UT | CLAHE | 0.9966 | 0.9964 | 0.9952 | 0.9966 | 0.9962 | 0.9928 | 0.9975 |
UNET | MBOBHE | 0.9967 | 0.9956 | 0.9942 | 0.9956 | 0.9956 | 0.9913 | 0.9971 |
FE2-UT | MBOBHE | 0.9449 | 0.9242 | 0.9006 | 0.9738 | 0.8794 | 0.8591 | 0.9358 |
UNET | MPHE | 0.5544 | 0.2545 | 0.1873 | 1.0 | 0.1458 | 0.1458 | 0.5729 |
FE3-UT | MPHE | 0.5544 | 0.2545 | 0.2038 | 1.0 | 0.1458 | 0.1458 | 0.5729 |
Model | Filter | Balanced Accuracy | F1 Score | Cohen's Kappa | Precision | Recall | Jaccard Index | ROC AUC |
---|---|---|---|---|---|---|---|---|
UNET | CLAHE | 0.9808 | 0.9735 | 0.9646 | 0.9735 | 0.9735 | 0.9483 | 0.9823 |
FE1-UT | CLAHE | 0.9864 | 0.9819 | 0.9758 | 0.9819 | 0.9818 | 0.9644 | 0.9879 |
UNET | MBOBHE | 0.9828 | 0.977 | 0.9693 | 0.977 | 0.977 | 0.955 | 0.9846 |
FE2-UT | MBOBHE | 0.9853 | 0.9798 | 0.9731 | 0.9798 | 0.9798 | 0.9604 | 0.986 |
UNET | MPHE | 0.9842 | 0.978 | 0.9706 | 0.978 | 0.978 | 0.9569 | 0.9853 |
FE3-UT | MPHE | 0.9842 | 0.9785 | 0.9713 | 0.9785 | 0.9785 | 0.9578 | 0.9856 |
Discussion
Practical applications
- Brain Tumor Detection and Segmentation: The primary focus of the study is enhancing the precision of brain tumor MRI image segmentation. This technology can be deployed in clinical settings to assist radiologists and oncologists in accurately delineating tumor boundaries, which is crucial for treatment planning and monitoring disease progression [50‐55].
- Tumor Volume Assessment: Accurate segmentation of tumors allows for precise measurement of tumor volumes over time. This is essential for tracking treatment response, assessing disease progression, and adjusting treatment strategies accordingly.
- Radiotherapy Planning: Medical image segmentation plays a vital role in radiotherapy planning. The technology can help radiation oncologists identify tumor regions and healthy tissues, enabling them to create treatment plans that deliver radiation therapy precisely to the affected area while sparing surrounding healthy tissue.
- Image-Guided Surgery: Surgeons can benefit from accurate image segmentation during brain tumor surgeries. It helps in identifying tumor boundaries and guiding the surgical procedure to maximize tumor removal while minimizing damage to healthy brain tissue.
- Automated Diagnosis: The technology can be integrated into diagnostic systems to assist healthcare providers in making accurate and timely diagnoses. This can be especially valuable in situations where timely intervention is critical, such as stroke diagnosis.
Limitations
- Data Dependency: Deep learning models, including UNETs and Transformers, typically require substantial amounts of labeled data for training. In the medical field, obtaining large and diverse datasets can be challenging, especially for rare conditions or specific patient demographics. Limited data may hinder the model's generalizability and performance in diverse cases.
- Computationally Intensive: Training deep learning models, particularly those with extensive layers and parameters, can be computationally intensive. This may necessitate powerful hardware and longer training times, making it less accessible for smaller healthcare facilities with limited resources.
- Model Interpretability: Deep learning models, such as UNETs and Transformers, are often considered "black boxes." It can be challenging to interpret the decision-making process of these models, which is a critical concern in medical applications where transparency and interpretability are essential.
- Overfitting: Deep learning models are susceptible to overfitting, especially when dealing with small datasets. Overfit models may perform exceedingly well on the training data but generalize poorly to new, unseen cases. Regularization techniques and data augmentation are employed to mitigate this issue, but it remains a concern.
- Imaging Variability: Medical images can exhibit substantial variability due to differences in acquisition equipment, protocols, and conditions. The model's ability to handle such variability may be limited, potentially leading to decreased accuracy in real-world clinical settings.
- Clinical Validation: Although the model demonstrates high accuracy on publicly available datasets, its performance in a real clinical setting might differ due to variations in image quality, patient population, and clinical practices. Clinical validation and integration into healthcare systems are critical steps that must be addressed.
- Ethical and Privacy Concerns: The use of deep learning models in healthcare raises ethical and privacy concerns related to patient data security and consent. Proper data handling and adherence to ethical guidelines are essential when implementing such systems.
- Algorithm Bias: If the training data is not representative of the entire population, the model may exhibit bias, potentially leading to disparities in diagnosis and treatment recommendations.
- Deployment Challenges: Integrating deep learning models into clinical workflows and ensuring their seamless operation can be challenging. Healthcare institutions may require significant infrastructure and expertise for deployment and maintenance.