Open Access 29-08-2024 | Original Article

Enhancing ocular diseases recognition with domain adaptive framework: leveraging domain confusion

Author: Zayn Wang

Published in: International Journal of Machine Learning and Cybernetics | Issue 3/2025


Abstract

The article focuses on the critical need for cost-effective eye disease prevention and diagnosis, emphasizing the global burden of vision impairment. It introduces a domain adaptive framework, ResNet-50 with Maximum Mean Discrepancy (RMMD), to address dataset bias in fundus image recognition. The framework uses MMD to minimize domain gaps, enhancing the accuracy of ocular disease recognition. Experiments on the OIA-ODIR dataset demonstrate the effectiveness of RMMD, achieving significant improvements in F1 and AUC scores. The article also discusses the limitations and future directions for further research, highlighting the potential of the proposed framework to improve ocular disease recognition.

1 Introduction

Maintaining healthy eyes is a critical and often overlooked part of our lives. Recent survey-based research [24] has shown that eye health is strongly related to mental health and quality of life. About 596 million people suffer from distance vision impairment, and 43 million of them are already blind. By 2050, an estimated 895 million people will face vision impairment or blindness. Worse still, 90% of these patients live in low-income and middle-income countries (LMICs) [3]. Accordingly, there is an urgent need for cost-effective ways to prevent and diagnose eye disease in ophthalmology. Early fundus screening is both an economical and an effective way to diagnose eye diseases and intervene earlier [18]. At the same time, computer-aided diagnosis (CAD), which assists with medical imaging and diagnostic radiology, further improves efficiency [8]. Without such research and techniques to help with eye disease diagnosis, diagnosing would become far more expensive and slower, causing more damage to patients.
Recently, Convolutional Neural Networks (CNNs) applied to medical image recognition have developed quickly. Li et al. [19] first classified medical images, specifically lung images, with a CNN and achieved higher accuracy than an unsupervised Restricted Boltzmann Machine (RBM). Shin et al. [25] achieved outstanding performance on mediastinal lymph node detection via transfer learning from nonmedical to medical images, after considering three factors: CNN architecture, dataset scale and spatial image context, and when and why pre-trained ImageNet features are helpful. Researchers made progress not only in CNN detection and classification for medical images, but also in image segmentation, noise filtration, and augmentation. Milletari et al. [22] used CNNs trained end-to-end for medical image segmentation to process MRI volumes. Pizurica et al. [23] proposed a robust wavelet-domain method that can filter various, and some unknown, types of medical image noise. Frid-Adar et al. [9] applied a GAN to synthetic liver lesion generation to augment liver images, and then gained higher accuracy after combining these with a CNN to classify images.
Unfortunately, despite the rapid progress of medical image recognition using CNNs, domain-shift or dataset-bias problems still exist. More specifically, dataset bias, a problem highlighted by Torralba and Efros [27], means that even the same object can look different across datasets for several reasons: different devices took the images, different regions have different environments, or different compression techniques were used while transferring files. In other words, since different institutions use different detection equipment, and people in different regions differ in race and genes, datasets of even the same disease collected in different places can have different overall characteristics. This bias can reduce the accuracy of a pre-trained or fine-tuned model. Thus, domain adaptation has been studied, with progress in other, nonmedical fields. Hsu et al. [17] introduced an intermediate domain and developed a weighted loss for it, achieving higher accuracy with a CNN. Tzeng et al. [29] proposed using Maximum Mean Discrepancy (MMD), an efficient method for reducing domain bias, through an adaptation layer along with a domain confusion loss.
This paper proposes a domain adaptive framework, ResNet-50 with Maximum Mean Discrepancy (RMMD), for early fundus screening image recognition. Based on ResNet-50, an efficient CNN using deep residual learning [15], we insert a new adaptation module that computes, alongside the classifier loss, the Maximum Mean Discrepancy (MMD) between the source domain (dataset) and the target domain; we then minimize the distance between domains while training strong classifiers. In addition, we adopt several enhancements to optimize the network, including data transforms, DropBlock regularization [12], and Focal Loss [20]. Applying the network to Ophthalmic Image Analysis-Ocular Disease Intelligent Recognition (OIA-ODIR), we reach state-of-the-art accuracy: from the training set to the off-site test set, 40.5145% (F1) and 81.0622% (AUC, Area Under the Receiver Operating Characteristic Curve); from the on-site test set to the off-site test set, 29.3243% (F1) and 76.6677% (AUC); and from the off-site test set to the on-site test set, 22.9688% (F1) and 67.3581% (AUC).
Our contributions can be summarized as follows:
1. We identify the challenge that fundus image datasets are too small to train a network thoroughly, so our architecture inserts an MMD layer into ResNet-50 and addresses the domain bias problem when recognizing fundus diseases across domains, which effectively enlarges the usable data by crossing domains.
2. Through the MMD layer, DropBlock, Focal Loss, and other enhancements, we reach state-of-the-art accuracy.
3. The results of RMMD on the OIA-ODIR dataset demonstrate the efficiency of our architecture in diminishing domain bias.
The rest of this paper is arranged as follows: Sect. 2 presents work related to RMMD and the OIA-ODIR dataset. Section 3 describes how RMMD is constructed. Section 4 reports the results of applying RMMD to OIA-ODIR. Section 5 concludes the paper and demonstrates the efficiency of RMMD.

2 Related work

2.1 Domain adaptation

As CNNs develop quickly with various architectures, researchers chase ever more accurate and faster algorithms. However, Torralba and Efros [27] suggested that we should look back at the original purpose of datasets instead of only the algorithms, and they exposed the bias between different datasets by comparison. Dataset bias, or domain bias, is the difficulty that arises when applying a network trained in one domain to a different domain, because of differences between urban and rural scenes, professional photographs and amateur snapshots from the Internet, entire scenes and single objects, and so on [27].
In 2014, Tzeng et al. [29] tackled this problem with MMD. Using MMD in AlexNet, their framework reached higher accuracy than regular AlexNet when applying networks trained on the Amazon data to the Webcam data in the Office-31 dataset. Then, in 2015, with the rise of unsupervised networks, Ganin and Lempitsky [10] focused on the same problem for unsupervised CNNs and addressed it with deep domain adaptation. In 2016, considering the cost of collecting data and generalization between different datasets, Bousmalis et al. [1] created an architecture that can extract domain-invariant features, and in 2017, Bousmalis et al. [2] also turned to unsupervised domain adaptation and proposed a pixel-level Generative Adversarial Network (GAN).
After that, tackling domain bias became a hot topic, with many researchers contributing to the subject. To make the classifier not only invariant but also discriminative across domains, Ganin et al. [11] inserted domain adaptation into the deep learning training process, creating Domain-Adversarial Neural Networks (DANN). At the same time, choosing a generator, a loss function, or whether to share weights across domains is a big challenge for each algorithm when adapting domains. To test these choices conveniently, Tzeng et al. [30] proposed Adversarial Discriminative Domain Adaptation (ADDA), a method for unsupervised adversarial adaptation.
Zhu et al. [33] also proposed a solution to this problem, but from a different angle. They used Cycle-Consistent Adversarial Networks for unpaired image-to-image translation to extract characteristic object features, translate them to images in other domains, and thereby reduce the domain bias. Similarly, Chen et al. [6] adapted domains by dividing adaptation into two levels, image-level and instance-level, and built a domain adaptive Faster R-CNN. Hoffman et al. [16] constructed Cycle-Consistent Adversarial Domain Adaptation (CyCADA) after realizing that image-space methods might fail to incorporate high-level semantic knowledge. To reduce domain gaps, Tsai et al. [28] proposed a GAN-based CNN adversarial learning method. Beyond working through the network itself, Wang [32] also reduces domain gaps by manually specifying the relevant directions of variation, such as differences across individuals and times.
More concretely, Guan and Liu [14] surveyed domain adaptation for medical image analysis. They clarified the concept and the problems brought by domain shift in medical image datasets, and discussed several families of methods, including supervised, semi-supervised, and unsupervised shallow or deep domain adaptation.

2.2 Ocular diseases recognition

It is highly efficient and popular to diagnose and analyze illnesses with the help of computers and to build medical datasets. As early as 1987, Chan et al. [5] proposed a means of diagnosing microcalcifications by analyzing features from digital mammograms with computer-aided diagnosis (CAD) [4]. In this way, computers can help people locate suspicious microcalcifications. Later, in 1990, they brought the method to radiologists and conducted a receiver operating characteristic (ROC) study to improve accuracy.
With the development of CAD, deep learning applied to medical images, including image segmentation, augmentation, detection, classification, and so on, has also progressed at a fast pace. To support this development, several datasets have been constructed for different uses. For disease analysis, there is the Autism Brain Imaging Data Exchange (ABIDE) for autism brain image studies [7] and the Open Access Series of Imaging Studies (OASIS) magnetic resonance imaging (MRI) dataset [21]. For image noise reduction, BrainWeb enables researchers to obtain brain tomographic images with different noise levels [31]. For image segmentation, Gu et al. [13] used the DRIVE dataset to test their Context Encoder Network (CE-Net) and confirm the effect of their architecture.
However, for ophthalmologists, the popular way to diagnose is through optical coherence tomography (OCT) image recognition; about 30 million OCT images are scanned and analyzed each year [26]. OCT scanning takes a long time to finish, however, and its high cost burdens low-income and middle-income families. Thus, we turn to a more economical way to diagnose: early fundus screening images. This brings another challenge: most early fundus screening image datasets are too small for deep learning, and most focus on only one disease. Fortunately, Li et al. [18] created the Ophthalmic Image Analysis-Ocular Disease Intelligent Recognition (OIA-ODIR) dataset, which contains 5000 pairs of binocular images, or 10,000 images in total, covering 8 kinds of diseases or conditions, allowing us to evaluate the architecture we create.

3 Methodology

Our proposed architecture is shown in Fig. 1. To bridge and reduce the domain gap, we propose a two-branch ResNet-50 framework with an MMD loss. First, two fundus images are transformed and input into two respective ResNet-50 branches, marked with dark orange and light orange: one image comes from the source domain with a label, and the other comes from the target domain and needs no label, since it is used only to prevent the network from learning source-domain-specific features, i.e., to confuse the network about such features. Then, the two layer3 outputs of ResNet-50 are fed into the MMD calculation module (MCM), which outputs the MMD. At the same time, the layer3 output of the labeled image continues through the rest of ResNet-50 and produces the original classification loss with its label. Finally, we compute the final loss by adding the original loss to the MMD multiplied by \(\lambda \), and backpropagate this final loss to train RMMD.
Fig. 1
An overview of ResNet-50 with maximum mean discrepancy (RMMD). The architecture is built on a base network, ResNet-50 (marked with orange), which has two branches marked with dark orange and light orange. We insert the Maximum Mean Discrepancy (MMD) calculation module (MCM), marked with green, into the network. Finally, we combine the classification loss with the MMD to form the final loss for backpropagation

3.1 Backbone network

Our backbone network is ResNet-50, marked with orange in Fig. 1, which is trained from scratch. The main goal of ResNet is to make deeper neural networks easier to train [15]. To this end, ResNet introduced residual learning, which decomposes complex functions into simpler ones that the network can learn. ResNet-50 is a specific ResNet containing 50 layers organized into 4 main stages with 3, 4, 6, and 3 residual bottleneck blocks, respectively. This structure is popular for testing the effect of a specific module, and would not delay the treatment of patients [18].
We make several modifications to ResNet-50 to better fit our target. We change the final fully connected layer from a 1000-class output to an eight-class output. Besides, instead of using CrossEntropy (CE) Loss with softmax, we calculate the loss using BinaryCrossEntropy (BCE) Loss with sigmoid, because ours is a multi-label binary classification task: CE Loss forces the predicted probabilities to sum to 100%, which cannot meet the requirements of a multi-label task, as the formula below shows,
$$\begin{aligned} \hat{s_i} = \frac{e^{\hat{x_i}}}{\sum _{j \in N}{e^{\hat{x_j}}}}, \quad L_{CE} = -\sum _{i \in N}{s_i \cdot \ln \hat{s_i}} \end{aligned}$$
(1)
where \(\hat{x_i}\) denotes the network output for each image, \(\hat{s_i}\) denotes each predicted value calculated by softmax, \(s_i\) denotes each actual value or label, N denotes all the data, and \(L_{CE}\) denotes the CE Loss.
BCE Loss, in contrast, evaluates every label separately, as the following formula shows.
$$\begin{aligned} \hat{s_i} = \frac{1}{1 + e^{-\hat{x_i}}}, \quad L_{BCE} = -\sum _{i \in N} {\left( s_i \cdot \ln \hat{s_i} + (1 - s_i) \cdot \ln (1 - \hat{s_i})\right) } \end{aligned}$$
(2)
where \(\hat{x_i}\) denotes the network output for each image, \(\hat{s_i}\) denotes each predicted value calculated by sigmoid, \(s_i\) denotes each actual value or label, N denotes all the data, and \(L_{BCE}\) denotes the BCE Loss.
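As a concrete illustration, a minimal PyTorch sketch of this modification is given below, assuming a torchvision ResNet-50 and 224 x 224 inputs (which yield the (14, 14, 1024) layer3 shape used later); the batch contents are placeholders, not the paper's exact pipeline.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# ResNet-50 trained from scratch, with the 1000-class ImageNet head
# replaced by an 8-label head (N, D, G, C, A, H, M, O).
model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 8)

# BCEWithLogitsLoss applies an independent sigmoid per label, so the eight
# probabilities need not sum to 1 -- unlike softmax + CE Loss.
# reduction="sum" mirrors the summation over all data in Eq. (2).
criterion = nn.BCEWithLogitsLoss(reduction="sum")

images = torch.randn(4, 3, 224, 224)           # hypothetical mini-batch
targets = torch.randint(0, 2, (4, 8)).float()  # multi-hot label vectors
loss = criterion(model(images), targets)
```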
At the same time, because the frequency of each ocular disease is extremely unbalanced, the fundus images per label are unbalanced as well [3]; we therefore also try Focal Loss [20] and reach higher accuracy. Focal Loss reduces the problems caused by unbalanced datasets by modifying BCE Loss, as the formula below shows.
$$\begin{aligned} L_{Focal} = -\sum _{i \in N} {\left( s_i \cdot (1 - \hat{s_i})^{\gamma } \cdot \ln \hat{s_i} + (1 - s_i) \cdot \hat{s_i}^{\gamma } \cdot \ln (1 - \hat{s_i})\right) } \end{aligned}$$
(3)
where \(\hat{s_i}\) is the same as in BCE Loss, N denotes all the data, \(L_{Focal}\) denotes the Focal Loss, and \(\gamma \) denotes the tunable focusing parameter reflecting how unbalanced the dataset is; when \(\gamma = 0\), \(L_{Focal} = L_{BCE}\) and there is no mitigation of the problems brought by unbalanced datasets.
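A minimal sketch of Eq. (3) in PyTorch follows; the epsilon term is our addition for numerical stability and is not part of the paper's formula.

```python
import torch

def focal_loss(logits, targets, gamma=2.0, eps=1e-8):
    """Multi-label Focal Loss, a direct transcription of Eq. (3).

    logits:  (batch, 8) raw network outputs x_hat
    targets: (batch, 8) multi-hot labels s
    With gamma = 0 this reduces to the BCE Loss of Eq. (2).
    """
    s_hat = torch.sigmoid(logits)  # predicted probabilities, as in Eq. (2)
    pos = targets * (1 - s_hat).pow(gamma) * torch.log(s_hat + eps)
    neg = (1 - targets) * s_hat.pow(gamma) * torch.log(1 - s_hat + eps)
    return -(pos + neg).sum()
```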

3.2 Domain adaptation module

To achieve domain adaptation, we design, on top of the modified ResNet-50, a two-branch architecture with weight sharing feeding the MCM. Of the two branches, one (the source branch) handles the source domain and the other (the target branch) handles the target domain; both are built from the modified ResNet-50 and share weights.
The source branch connects to the MCM between layer3 and layer4 and also continues through the whole modified ResNet-50 to compute the BCE or Focal Loss, while the target branch likewise connects to the MCM between layer3 and layer4 but stops there, without passing through the rest of ResNet-50.
As for the MCM, the layer3 outputs of the two branches have shape (14, 14, 1024). To fit the MMD calculation, each is first passed through a fully connected layer with 256 outputs. We then obtain two outputs, one from the source domain and one from the target domain, and calculate the MMD with the following formula:
$$\begin{aligned} L_{MMD} = \left( \frac{\sum _{i \in D_s}{x_i}}{|D_{s}|} - \frac{\sum _{j \in D_t}{x_j}}{|D_{t}|}\right) ^2 \end{aligned}$$
(4)
where \(D_s\) and \(D_t\) denote the source-domain and target-domain datasets, \(x_i\) and \(x_j\) denote the outputs for the \(D_s\) and \(D_t\) data after the fully connected layer in the MCM, and \(L_{MMD}\) denotes the MMD, i.e., the output of the MCM.
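A sketch of the MCM under these definitions is shown below; reading the squared term in Eq. (4) as a sum of squared differences over the 256 feature dimensions is our interpretation, and flattening the (14, 14, 1024) activations before the fully connected layer is an assumption about the unstated layout.

```python
import torch
import torch.nn as nn

class MCM(nn.Module):
    """MMD calculation module: project layer3 features to 256 dimensions,
    then return the squared difference of the two domain means (Eq. 4)."""

    def __init__(self, in_features=14 * 14 * 1024, out_features=256):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, feat_src, feat_tgt):
        z_src = self.fc(feat_src.flatten(1))  # (|D_s|, 256)
        z_tgt = self.fc(feat_tgt.flatten(1))  # (|D_t|, 256)
        # Mean over each domain, then squared distance between the means.
        return (z_src.mean(dim=0) - z_tgt.mean(dim=0)).pow(2).sum()
```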

3.3 Final loss calculation

After calculating the BCE or Focal Loss and the MMD, we compute the final loss with the following formula:
$$\begin{aligned} L_{Final} = L_{BCE/Focal} + \lambda \cdot L_{MMD} \end{aligned}$$
(5)
where \(L_{BCE/Focal}\) and \(L_{MMD}\) denote the BCE or Focal Loss and the MMD calculated above, \(L_{Final}\) denotes the final loss that we backpropagate to train the network, and \(\lambda \) denotes the tunable parameter reflecting how large the dataset bias is, or how strongly we want to reduce it; when \(\lambda = 0\), \(L_{Final} = L_{BCE/Focal}\) and there is no dataset-bias reduction or domain adaptation.
In the overall process, two images, one from the source dataset with a label and one from the target dataset without a label, are transformed and input into the two weight-sharing ResNet-50 branches, respectively. After layer3, the two outputs of shape (14, 14, 1024) are input into the MCM, which computes the MMD after its fully connected layer. At the same time, the source image's layer3 output is input into layer4 of ResNet-50 and finishes the whole ResNet-50 process, after which the BCE or Focal Loss is calculated with the label. We then compute the final loss by adding the MMD multiplied by \(\lambda \) to the BCE or Focal Loss. Finally, we backpropagate the loss to train the network.
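Putting Eqs. (2)-(5) together, one training step might look like the sketch below; forward_to_layer3 and forward_from_layer4 are hypothetical helpers that split the modified ResNet-50 at layer3, and are not named in the paper.

```python
def train_step(model, mcm, criterion, optimizer,
               src_imgs, src_labels, tgt_imgs, lam=1e-6):
    # Weight sharing: the same ResNet-50 instance processes both domains.
    feat_src = model.forward_to_layer3(src_imgs)  # (B, 1024, 14, 14)
    feat_tgt = model.forward_to_layer3(tgt_imgs)  # target labels never used

    logits = model.forward_from_layer4(feat_src)  # only the source branch continues
    cls_loss = criterion(logits, src_labels)      # BCE or Focal Loss
    mmd = mcm(feat_src, feat_tgt)                 # Eq. (4)

    final_loss = cls_loss + lam * mmd             # Eq. (5)
    optimizer.zero_grad()
    final_loss.backward()
    optimizer.step()
    return final_loss.item()
```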

4 Experiment

4.1 Experiment dataset

The fundus image dataset we use is OIA-ODIR [18]. Li et al. collected 10,000 fundus images with eight types of annotations, after cleaning and filtering their private clinical fundus databases, which contain 1.6 million images in total from 487 clinical hospitals in 26 provinces across China.
The 10,000 fundus images come from the left and right eyes of 5000 patients. The eight types of annotations are normal (N), diabetic retinopathy (D), glaucoma (G), cataract (C), age-related macular degeneration (A), hypertension (H), myopia (M), and other diseases (O).
The labels were annotated by professional annotation staff and an arbitration team. Every batch of images was annotated by three members independently; when results differed, two or more experts were brought into the process to reach the final result.
After annotating, they split the data into three sets: the training set (TRS), the off-site test set (OFS), and the on-site test set (ONS), which contain 3500, 500, and 1000 pairs of images, respectively. The resulting annotations are summarized in Table 1.
Table 1
OIA-ODIR datasets

| Label | N | D | G | C | A | H | M | O |
|---|---|---|---|---|---|---|---|---|
| Training cases | 1138 | 1130 | 215 | 212 | 164 | 103 | 174 | 982 |
| Off-site testing cases | 162 | 163 | 32 | 31 | 25 | 16 | 23 | 136 |
| On-site testing cases | 324 | 327 | 58 | 65 | 49 | 30 | 46 | 275 |
| All cases | 1624 | 1620 | 305 | 308 | 238 | 149 | 243 | 1393 |

The final label classification is based on pairs of images. The labels of the training set (TRS), the off-site test set (OFS), and the on-site test set (ONS) are shown respectively
However, their annotations are patient-based: every annotation covers one patient's left and right eyes together. To enlarge the dataset and focus on single eyes rather than both eyes at once, we re-annotate on a per-image basis, creating a final dataset that contains 7000, 1000, and 2000 images in the three splits, respectively. Our new annotations are shown in Table 2, and a sketch of the re-annotation follows the table.
Table 2
Our datasets

| Label | N | D | G | C | A | H | M | O |
|---|---|---|---|---|---|---|---|---|
| Training cases | 3098 | 1801 | 326 | 313 | 280 | 193 | 261 | 2197 |
| Off-site testing cases | 430 | 256 | 45 | 47 | 44 | 30 | 39 | 168 |
| On-site testing cases | 819 | 492 | 83 | 95 | 79 | 54 | 57 | 422 |
| All cases | 4347 | 2549 | 454 | 455 | 403 | 277 | 357 | 2787 |
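The per-image re-annotation can be pictured with the minimal sketch below; it assumes each patient record already carries separate 8-dimensional label vectors for the left and right eye, which is our assumption about the intermediate format rather than the published OIA-ODIR schema.

```python
LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]

def split_pair(record):
    """Turn one patient-level record into two single-eye samples."""
    return [
        {"image": record["left_image"],  "labels": record["left_labels"]},
        {"image": record["right_image"], "labels": record["right_labels"]},
    ]

# Hypothetical usage: 3500 patient records become 7000 image-level samples.
# samples = [s for rec in training_records for s in split_pair(rec)]
```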

4.2 Experimental detail

We run our training and evaluation with Python 3.9.16 and PyTorch 1.13.1+cu116. The experiments use a Tesla T4 GPU with a 70 W maximum power draw and 15,360 MiB of memory; NVIDIA-SMI and the driver are version 525.85.12, and the CUDA version is 12.0.
We divided OIA-ODIR into three sets: the 7000-image TRS, the 1000-image OFS, and the 2000-image ONS. We then evaluate our architecture from TRS to OFS, ONS to OFS, and OFS to ONS with 10-fold evaluation.
The \(\lambda \) for the final loss calculation ranges from 1e−5 to 1e−8; the specific experiments with different \(\lambda \) are shown in the following section. Training runs for 40 to 60 epochs. The \(\gamma \) in Focal Loss is 2, following the original paper [20]. The learning rate is 1e−4. For the final accuracy we report F1 and AUC. Because of the effect of DropBlock and the different sizes of the datasets, in the F1 calculation we use 0.1 for ResNet-50 and RMMD and 0.02 for RMMD with DropBlock on TRS to OFS and ONS to OFS, and 0.5 for ResNet-50 and RMMD and 0.02 for RMMD with DropBlock on OFS to ONS; the F1 threshold throughout the whole experiment is 0.2.
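As a hedged sketch of the metric computation: with sigmoid scores of shape (n_images, 8) and multi-hot ground truth, F1 at a fixed threshold and AUC could be computed as below; the averaging scheme (micro) is our assumption, since the paper does not state it.

```python
from sklearn.metrics import f1_score, roc_auc_score

def evaluate(y_true, scores, threshold=0.2):
    """y_true, scores: arrays of shape (n_images, 8)."""
    y_pred = (scores >= threshold).astype(int)
    f1 = f1_score(y_true, y_pred, average="micro")        # threshold-dependent
    auc = roc_auc_score(y_true, scores, average="micro")  # threshold-free
    return f1, auc
```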

4.3 Experimental result

4.3.1 Parameters verification

After careful testing, we obtain the data in Table 3, which are also plotted in Figs. 2 and 3. As the data and figures show, when \(\lambda \) is less than 1e−6, AUC does not improve compared to the network without the MCM. When \(\lambda \) reaches 1e−6, the accuracy peaks above the network without the MCM, and beyond 1e−6 it drops again. At the same time, the reduction of the loss is no longer obvious after 60 epochs. We therefore adopt 60 training epochs and \(\lambda = \) 1e−6. More specific test settings are available at https://github.com/zayn7lie/DSCI-Applications/tree/main/OIA-ODIR/RMMD.
Table 3
AUC with different epochs and \(\lambda \)

| AUC / Loss (%) | \(\lambda \)=0 | \(\lambda \)=1e−8 | \(\lambda \)=1e−7 | \(\lambda \)=1e−6 | \(\lambda \)=1e−5 |
|---|---|---|---|---|---|
| 20 epochs | 50.55 / 16.87 | 52.73 / 14.63 | 56.38 / 19.34 | 54.21 / 20.37 | 50.58 / 16.35 |
| 40 epochs | 55.90 / 0.389 | 52.50 / 0.188 | 54.26 / 0.281 | 57.40 / 0.314 | 52.79 / 6.434 |
| 60 epochs | 54.64 / 0.071 | 52.27 / 0.068 | 54.27 / 0.073 | 56.73 / 0.083 | 52.81 / 0.055 |
| 80 epochs | 54.36 / 0.053 | 51.73 / 0.042 | 54.35 / 0.045 | 56.12 / 0.065 | 52.41 / 0.031 |
Fig. 2
AUC with different epochs and \(\lambda \)
Fig. 3
Loss with different epochs and \(\lambda \)

4.3.2 Cross-domain performance

Before the final accuracy evaluation, we test several enhancements on OFS to ONS and reach higher accuracy, as shown in Table 4 and Fig. 4. Changing from ResNet-50 to RMMD raises AUC by 1.9332%. On top of RMMD, adding only Focal Loss raises AUC by 1.5319%; adding only the data transform raises it by 0.4292%; and adding only DropBlock raises it by 2.3693%. Adding all enhancements to RMMD, forming our architecture, raises AUC by 4.7133%; compared to the ResNet-50 baseline, ours improves by 6.6465%.
Table 4
AUC and F1 with different enhancements

| Metric | ResNet-50 | RMMD | +FocalLoss | +Transform | +DropBlock | Ours |
|---|---|---|---|---|---|---|
| AUC (%) | 60.7116 | 62.6448 | 64.1767 | 63.0740 | 65.0141 | 67.3581 |
| F1 (%) | 16.4640 | 20.4792 | 21.4536 | 20.9690 | 20.9052 | 22.9688 |
Fig. 4
AUC and F1 with different enhancements
In the final accuracy evaluation, we adopt AUC and F1 scores. The results are shown in Table 5 and plotted in Figs. 5 and 6. As the accuracy shows, RMMD genuinely improves the network when the datasets involve domain bias. Inserting the MCM into ResNet-50 to form RMMD improves TRS to OFS by 3.4987% (AUC) and 5.4394% (F1), rising to 7.1796% (AUC) and 9.5254% (F1) with the enhancements; ONS to OFS improves by 5.4223% (AUC) and 8.1993% (F1), rising to 5.7734% (AUC) and 10.0471% (F1) with the enhancements; and OFS to ONS improves by 1.9332% (AUC) and 4.0152% (F1), rising to 6.6465% (AUC) and 6.5048% (F1) with the enhancements.
Table 5
AUC / F1 with different datasets and architectures

| AUC / F1 (%) | TRS to OFS | ONS to OFS | OFS to ONS |
|---|---|---|---|
| ResNet-50 | 73.88 / 30.99 | 70.89 / 19.28 | 60.71 / 16.46 |
| RMMD | 77.38 / 36.43 | 76.32 / 27.48 | 62.64 / 20.48 |
| Ours | 81.06 / 40.51 | 76.67 / 29.32 | 67.36 / 22.97 |
Fig. 5
AUC with different datasets and architectures
Fig. 6
F1 with different datasets and architectures
As the figures show, RMMD improves on ResNet-50 across the board, and with the additional methods it improves further. At the same time, as the dataset size grows, the overall accuracy of ResNet-50, RMMD, and the full method improves as well. Thus, inserting the MCM can reconcile data across different domains and effectively enlarge the usable datasets for fundus recognition.

5 Conclusion

Nowadays, eye diseases are an increasingly severe burden because diagnosis is costly and slow. In this paper, we focus on a cost-efficient diagnostic route, early fundus screening, and on reducing the domain bias that arises when CNNs are applied to fundus images. We reach state-of-the-art accuracy when evaluating our methods on the OIA-ODIR datasets.
Our architecture addresses the problems that arise when datasets exhibit domain bias or when the labeled dataset is too small for training. As the results show, our domain adaptation method, the insertion of the MCM, raises recognition accuracy to a new level, which suggests it helps reduce domain gaps and prevents the network from learning spurious features tied to specific domains. With this method, more architectures can generalize to different datasets and improve when tested across them. In addition, after considering several factors, we add some enhancements to our architecture; these further improve its accuracy, meaning our network can help improve future ocular disease recognition.

5.1 Limitations and future works

Although our architecture has proven its effect, some limitations remain.
1. Dataset size: although the dataset we adopt contains 10,000 images with eight labels, it is still not large enough for thorough training and evaluation.
2. Label annotation: even though there are 8 label types, most samples carry only one label, so the data mix single-label and multi-label samples; this limits our classification, which treats the eight label types jointly.
3. Although the MCM can reduce domain gaps in some respects, it is not good at extracting latent, deep features, since it simply calculates the difference of domain means.
In the future, we will address the challenges identified above.
1. We could cooperate with medical organizations and introduce more off-site datasets for broader evaluation.
2. We will analyze how to overcome the weakness of the MCM caused by its simple mean-difference calculation, and study other mechanisms such as transformers and attention modules to further improve our network.

Declarations

Conflict of interest

The authors have no conflicts of interest that are relevant to the content of this article.

Financial or non-financial interests

The authors have no relevant financial or non-financial interests to disclose. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Literature

4. Chan HP, Doi K, Galhotra S et al (1987) Image feature analysis and computer-aided diagnosis in digital radiography. I. Automated detection of microcalcifications in mammography. Med Phys 14(4):538–548. https://doi.org/10.1118/1.596065
Metadata
Title: Enhancing ocular diseases recognition with domain adaptive framework: leveraging domain confusion
Author: Zayn Wang
Publication date: 29-08-2024
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics, Issue 3/2025
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-024-02358-2