Published in: Complex & Intelligent Systems 1/2024

Open Access 19-07-2023 | Original Article

Retinal disease projection conditioning by biological traits

Authors: Muhammad Hassan, Hao Zhang, Ahmed Ameen Fateh, Shuyue Ma, Wen Liang, Dingqi Shang, Jiaming Deng, Ziheng Zhang, Tsz Kwan Lam, Ming Xu, Qiming Huang, Dongmei Yu, Canyang Zhang, Zhou You, Wei Pang, Chengming Yang, Peiwu Qin


Abstract

Fundus images capture the rear of the eye and have been studied for disease identification, classification, segmentation, generation, and biological trait association using handcrafted, conventional, and deep learning methods. In biological trait estimation, most studies have addressed age prediction and gender classification, with convincing results. The current study utilizes cutting-edge deep learning (DL) algorithms to estimate biological traits in terms of age and gender and to associate those traits with retinal visuals. For trait association, we embed aging as label information into the proposed DL model to learn which regions are affected by aging. We propose two DL models, FAG-Net and FGC-Net, which respectively estimate biological traits (age and gender) and generate fundus images. FGC-Net can generate multiple variants of an input fundus image given a list of ages as conditions. In this study, we analyze fundus images and their association with aging and gender. Our proposed models outperform randomly selected state-of-the-art DL models.

Introduction

The retina is the organ that enables humans to capture visuals from the real world. It is a window to the whole body, sharing physiological, embryological, and anatomical characteristics with major organs, including the brain, the heart, and the kidneys. The retina is a vital source for assessing distinct pathological processes and neurological complications associated with mortality risk. The retina refers to the inner surface of the eyeball opposite the lens, including the optic disc, optic cup, macula, fovea, and blood vessels [1, 2]. Fundus images are projections of the fundus captured by a monocular camera on a 2D plane [3]. They play an important role in monitoring the health status of the human eye and multiple organs [4]. Analyzing fundus images and their association with biological traits can help prevent eye diseases and enable early diagnosis, and the retina is widely regarded as a gateway for examining neurological complications. The retina allows us to visualize both vascular and neural tissues in a non-invasive way. Its strong association with physiology and vitality may extend to deeper associations with biological traits such as age and gender. Biological traits can be determined by genes, environmental factors, or a combination of both, and can be either qualitative (such as gender or skin color) or quantitative (such as age or blood pressure) [5]. Biological traits are relevant to a variety of systemic and ocular diseases [6]; for instance, females are expected to have longer life expectancies than males in similar living environments [7–10]. With increasing age, women with reduced estrogen production are predisposed to degenerative eye diseases, including cataracts and age-related macular degeneration [11–13]. In contrast, males are more likely to suffer from pigment dispersion glaucoma [14], open-angle glaucoma [15], and diabetic retinopathy [16]. Associating biological traits with fundus images is challenging in clinical practice, where even experts cannot distinguish male from female fundus images or read aging information from them. This study utilizes deep learning (DL) algorithms to estimate biological traits and their association with generated fundus images.
Fundus (retinal) images have been studied for classification, disease identification, and analysis using methods ranging from conventional machine learning (ML) to recent DL [17, 18]. However, much of this work has focused on feature engineering, which involves computing explicit features specified by experts. In contrast, DL is characterized by multiple computational layers that allow an algorithm to learn appropriate predictive features from examples. DL algorithms are continually optimized and reformulated with enhanced features and improvements to expand to a wider range of problems [19–21]. They have been utilized for the classification and detection of different eye diseases, such as diabetic retinopathy and melanoma, with human-comparable results. In conventional ML approaches, the relationship between retinal morphology and systemic health has been extrapolated using multivariable regression; however, such methods show limited ability on large and complex datasets [22, 23]. DL algorithms avoid manual feature engineering and tuning and make it possible to extract hidden features previously unexplored by conventional methods. DL models have shown significant results on previously challenging tasks, and harnessing their power has innovatively associated retinal structure with pathophysiology. DL models can extract independent features unknown to clinicians; however, they face challenges of explainability and interpretability, which existing work has attempted to address [24]. DL approaches to fundus image analysis are gaining popularity for their easy implementation and high efficiency [25]. It has been shown that DL models can capture subtle pixel-level information in terms of luminance and contrast that humans may not differentiate. These findings underscore the promising ability of DL models to exploit features hidden to humans, and such models can be employed in medical imaging with high efficacy in clinical practice [26].
In clinical studies, experts cannot discriminate subjects based on their fundus images, which emphasizes the importance of employing DL models. The cause and effect of demographic information in fundus images are not readily apparent to domain experts. In contrast, DL models may enable data-driven discovery of novel disease biomarkers and biological trait associations. The ophthalmoscope has therefore been deeply associated with systemic indices of biological traits (such as aging and gender) and diseases. In previous studies, age has been estimated from distinct clinical images, such as brain MRI, facial images, and neuroimaging, using machine learning and deep learning [27–30]. For instance, brain MRI and facial images have been used for age prediction, emphasizing the potential of trait estimation from fundus images [27–29, 31]. The excellent performance in age prediction implies that fast, safe, cost-effective, and user-friendly deep learning models are feasible in larger populations. In addition to the aging association, fundus images have also been associated with sex by applying logistic regression to several features [32], including the papillomacular angle, retinal vessel angles, and retinal artery trajectory. Various studies have shown retinal morphology differences between the sexes, including retinal and choroidal thickness [33, 34]. The study [26] reported the fovea as an important region for gender classification. Gender prediction thus became possible, a task inconvenient even for ophthalmologists who have spent entire careers examining retinas [35]. Results for age and gender estimation may assist in investigating physiological variations in fundus images corresponding to biological traits [17]. Age estimation and gender classification may not be clinically indispensable, but studying age progression based on learned biological traits hints at the potential of DL for discovering novel associations between traits and fundus images. DL models uncover additional features from fundus images, resulting in better biological trait association [36].
The successful estimation of age and gender motivates studying age progression effects and evaluating aging status via fundus images. In the study of [17], aging effects were investigated while associating cardiovascular risk factors with fundus images. Similarly, large DL models were used to classify fundus images and associate them with physiological traits dependent on patients' health [17]. Existing algorithms mainly consider the optic disc's features for gender prediction, consistent with the observations of Poplin [17]. In Poplin's work, large deep learning models were used to classify sex and other physiological and behavioral traits associated with patient health based on fundus images. Similarly, fundus (retinal) images were closely related to age and gender traits, allowing the definition of a 'retinal age gap', a potential biomarker for aging and mortality risk [37].
The variational effects of age progression can be visualized in distinct ways, including saliency maps or heat maps over fundus images, revealing changes that are difficult for ophthalmologists to observe. Differential visualization of fundus images can also be used to distinguish male and female subjects. Following the successful classification of the gender trait from fundus images [38], our proposed model (FAG-Net) emphasizes the optic disc area and learns features associated with aging during training. The optic disc was also considered the main structure for training our deep learning approaches. The second proposed model (FGC-Net) utilizes such knowledge to generate different fundus images given a single input fundus image with a list of ages as labels (conditions). The details of the proposed modalities are illustrated in the methodology section.
In the current study, we first trained and evaluated a DL model (FAG-Net) for trait estimation in terms of age and gender. We then proposed a second DL model (FGC-Net) to learn aging effects and embed them for generation. FGC-Net evaluates different age values given a single input fundus image, and the corresponding generated versions are subtracted accordingly to demonstrate the learned effects of age progression. The detailed architecture of both models is illustrated in the methodology section. The rest of the paper is organized as follows: “Literature study” outlines the existing works, “Methodology” demonstrates the methods, “Results” illustrates and analyzes the results, and “Conclusion and future directions” concludes the study with future directions.

Literature study

In previous studies, age and gender have been estimated from distinct imaging modalities, such as age prediction from brain MRI, facial images, and neuroimaging, using machine learning and deep learning [27–30]. Brain MRI and facial images have been used for age prediction, emphasizing the potential of trait estimation from fundus images [27–29, 31]. In the work by Poplin [17], large deep learning models were used to classify gender and other physiological and behavioral traits associated with patient health based on retinal fundus images. There are a number of studies in which fundus images have been used for age prediction and gender classification using machine learning [26–29, 31–34]. Most of them have estimated the age and gender of either healthy or unhealthy subjects, whereas the current study examines age and gender associations with fundus images for both healthy and unhealthy subjects. For age and gender prediction, algorithms ranging from conventional to recent deep learning approaches have been employed [17, 25, 26, 39]. To our knowledge, none of them has addressed age progression effects beyond age prediction and gender classification.
Clinicians are currently unaware of the distinct retinal features that vary between males and females, highlighting the importance of deep learning and model explainability. Automated machine learning (AutoML) may enable clinician-driven automated discovery of novel insights and disease biomarkers. Gender was classified in the study of [26], in which the code-free deep learning model achieved an area under the receiver operating characteristic curve of 0.93. The study [40] estimated biological age on a dataset collected for age-related macular degeneration (AMD) [41] with MAE = 3.67 and cumulative score = 0.39. Subjects in AMD prevalence studies are expected to be over 50 years old as an inclusion criterion, which cannot cover subjects of all age ranges. The study [42] developed CNN age and sex prediction models from normal participants and those with underlying vascular conditions, such as hypertension, diabetes mellitus (DM), or any smoking history, with convincing results for age prediction (R\(^2\) = 0.92; MAE = 3.06 years for normal subjects, 3.46 years for hypertension, 3.55 years for DM, and 2.56 years for smokers); however, R\(^2\) = 0.74 is relatively low for hypertension. The proposed model (FAG-Net) achieves better results on the majority of the evaluation metrics compared to existing models for both healthy and unhealthy subjects, as tabulated in Tables 2 and 3.
ML algorithms are widely applied to analyzing biological traits from different imaging modalities, such as MRI, facial visuals, and footprints [43]. In conventional biological trait estimation, the study [44] proposed trait-tissue association mapping for human biological traits and diseases. The study [45] estimated subjects' ages from MRI using PCA [46] for dimension reduction and a relevance vector machine [47], with a significant score. The study in [48] applied a new automated machine learning approach to brain MRI to predict age with MAE = 4.612 years. Similarly, Valizadeh et al. [49] used a neural network [50] and a support vector machine [51] to analyze five anatomical features, resulting in high prediction accuracy. Martina et al. [52] estimated brain age in the PNC (Philadelphia Neurodevelopmental Cohort; n = 1126, age range 8–22 years) using a cross-validation [53] framework with MAE = 2.93 years. Similarly, the study [54] used partial least squares regression [55] to classify gender based on MRI with an accuracy of 97.8%. According to the study of [17], machine learning has been leveraged for many years for a variety of classification tasks, including the automated classification of eye disease. However, much of this work has focused on feature engineering, which involves computing explicit features specified by experts.
The relationship between retinal morphology and systemic health has been extrapolated using multivariable regression in conventional approaches. However, such methods show limited ability on large and complex datasets [22, 23]. Advances in DL avoid manual feature engineering and make it possible to extract hidden features that were previously unexplored. DL models have shown significant results on previously challenging tasks, and harnessing their power has innovatively associated retinal structure with pathophysiology. DL models extract independent features unknown to clinicians; however, they face challenges of explainability and interpretability, which a neuro-symbolic learning study has attempted to address [24]. Deep learning is a family of machine learning characterized by multiple deep levels of computation optimized for images. It has been applied in different domains, specifically in disease diagnosis, such as melanoma and diabetic retinopathy, achieving accuracy comparable to that of human experts [56]. The RCMNet model, composed of ResNet18 with a self-awareness mechanism, achieved a decent performance of 83.36% accuracy on the CAR-T cell dataset.
Deep learning approaches to automated retinal image analysis are gaining popularity for their relative ease of implementation and high efficacy [25]. It has been reported that DL models capture subtle pixel-level luminance variations that are likely indistinguishable to humans. Such findings underscore the promising ability of deep neural networks to utilize salient features in medical imaging that may remain hidden to domain experts [26]. Deep learning has shown great strength in medical image analysis. The study [57] developed a hyperdimensional computing-based algorithm [58] to classify gender from resting-state and task fMRI from the publicly available Human Connectome Project with an accuracy of 87%. Similarly, Jonsson [30] presented a novel deep learning approach using residual convolutional neural networks [59] to predict brain age from T1-weighted MRI with MAE = 3.39 and \(R^{2} = 0.87\); however, that study lacks a generative capability given age as a condition to evaluate the desired projection.
Most importantly, biological traits such as age and gender have been successfully predicted from fundus images, with an area under the curve (AUC) score of 0.97 [26]. Yamashita performed logistic regression on several features identified as associated with sex [32], including the papillomacular angle, retinal vessel angles, and retinal artery trajectory. Various studies have shown retinal morphology differences between the sexes, including retinal and choroidal thickness [33, 34]. In previous studies, age has been estimated from clinical images via machine learning and deep learning [27–30]. The excellent performance in age prediction implies that fast, safe, cost-effective, and user-friendly deep learning models are feasible in larger populations. Motivated by recent DL concepts such as convolutional neural networks and attention mechanisms, we employ these components in the proposed models to associate biological traits with retinal visuals. State-of-the-art (SOTA) models are limited to learning trait factors from the fed visuals, whereas the proposed model learns both the aging factor and a generative capability to accomplish the desired projection. In contrast to SOTA works, our research seeks to demonstrate the continuous effect of aging in addition to age estimation and gender classification. By incorporating both control and healthy group subjects, specialists can include age as a condition in the model and retrieve the retinal visuals of a healthy subject. This will not only benefit experts in age estimation, as SOTA does, but also assist in examination and diagnosis decisions. The proposed models are elaborated in the following sections.

Methodology

This section illustrates the proposed deep learning architectures, their parameters, and hyperparameters. We also explain the logic behind each specific structure used to achieve the intended goals.

Biological traits estimation using FAG-Net architecture

For age prediction and gender classification, we borrowed the concept of biological trait estimation from the ShoeNet model [43], which has been used for age estimation and gender classification from pairwise shoeprints. However, fundus image datasets are rarely available pairwise (left- and right-eye images), so the model requires special attention to be adapted for biological trait estimation. We therefore propose a model for fundus-image-based age and gender estimation (FAG-Net) (Fig. 1). The model is composed of six blocks, where blocks 1, 2, and 6 contain a spatial attention block (SAB), while the remaining blocks omit SAB. The first block receives input fundus images with dimensions of 512\(\times \)512\(\times \)3 (width\(\times \)height\(\times \)channels). The three-channel input fundus image first passes through a stack of convolutional layers with a given number of filters (32) and kernel size (3). The SAB is added to focus on salient spatial regions.
Attention mechanisms have recently received considerable interest due to their significant performance in the literature [60]. In practice, both channel-wise (CA) and spatial-wise (SA) attention are often employed in channel-first order; however, we apply only SA, which focuses along the spatial dimension. In SA, average pooling and max pooling are applied to the input in parallel and the results are concatenated. A 2D attention map is generated over each pixel for all spatial locations using a large filter size (i.e., \(K = 5\)), and the convolutional output is normalized by the non-linear sigmoid function. Finally, the normalized map and the direct connection are merged with elementwise multiplication to produce the attention-based output. Both average and max pooling are used in SA to balance the selection of salient features (max pooling) and global statistics (average pooling). Embedding the attention mechanism in FAG-Net focuses the model on regions of interest vulnerable to aging effects. The output from the SAB passes through batch normalization (BN) and rectified linear unit (ReLU) functions; each block therefore ends with BN and ReLU.
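To make the SA operation concrete, below is a minimal PyTorch sketch of such a spatial attention block (channel-wise average and max pooling in parallel, a 5\(\times \)5 convolution, sigmoid normalization, and elementwise gating). The module and argument names are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention block (SAB): gates feature maps with a 2D attention map."""
    def __init__(self, kernel_size: int = 5):
        super().__init__()
        # 2 input channels: the stacked average-pooled and max-pooled maps.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)      # global statistics
        max_map, _ = x.max(dim=1, keepdim=True)    # salient features
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                            # merge via elementwise multiplication
```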
The input of block 2, received from block 1, passes through a stack of convolutions and an SAB and ends with a maxpool layer. The output of block 1 also passes through a direct connection to a convolution-maxpool (CMP) block. The convolution layer in CMP applies to the output of block 1 with the same number of filters (64) as block 2, but with a 1\(\times \)1 kernel to produce the same number of feature maps (64), followed by a maxpool operation to bring the output to the same dimensions. The outputs of block 2 and the CMP block are concatenated along the third dimension and forwarded as input to block 3 and a direct connection.
The purpose of the CMP block is to carry spatial features in high-dimensional space related to age progression to deeper levels. At the abstract level, this dense structure passes salient features together with those extracted by the block sequence. The number of feature maps increases and the spatial dimensions decrease as the network goes deeper. The accumulated output from all blocks passes through a normal convolution layer with 8\(\times \)8 feature maps and 1024 filters. The final convolution layer passes its output to fully connected layers, each followed by dropout (with ratios of 0.9, 0.8, and 0.5, respectively) to avoid overfitting. The final output is a single neuron for age prediction or two neurons for gender classification. For age prediction, a linear activation function produces a regression value; for gender classification, a softmax layer outputs weighted scores for male and female.
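As a sketch of the two task-specific heads described above (the hidden-layer widths are illustrative assumptions; the dropout ratios are read as 0.9, 0.8, and 0.5), the final layers might look like:

```python
import torch.nn as nn

def fag_net_head(in_features: int, task: str = "age") -> nn.Sequential:
    """Fully connected head: a single linear unit for age regression,
    or two softmax-weighted outputs for gender classification."""
    layers = [
        nn.Flatten(),
        nn.Linear(in_features, 512), nn.ReLU(), nn.Dropout(0.9),
        nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.8),
        nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.5),
    ]
    if task == "age":
        layers.append(nn.Linear(128, 1))                   # linear activation: regression
    else:
        layers += [nn.Linear(128, 2), nn.Softmax(dim=1)]   # weighted male/female outputs
    return nn.Sequential(*layers)
```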

Objective function for FAG-Net

The objective function used for training FAG-Net is composed of three loss terms: \(L_1\), \(L_2\), and a regression-specific custom loss function (CLF). The accumulative loss function (ALF) is the mean of the weighted loss terms, formulated as follows:
$$\begin{aligned} ALF = (\psi \cdot L_1+\psi \cdot L_2+\psi \cdot CLF)/3, \end{aligned}$$
(1)
where \(\psi \) is the corresponding weight used to balance the loss terms. \(L_1\) and \(L_2\) can be formulated as follows:
$$\begin{aligned} L_1 = \sum _{i = 0}^{n-1} \left| A_{age}^i-P_{age}^i\right| , \end{aligned}$$
(2)
$$\begin{aligned} L_2 = \sum _{i = 0}^{n-1}\left( A_{age}^i-P_{age}^i\right) ^2, \end{aligned}$$
(3)
where n is the number of samples and \(A_{age}\) and \(P_{age}\) denote the actual and predicted ages.
Furthermore, age prediction is a regression problem, so a single output value is expected. Thus, a specialized custom loss function based on the mean square error (MSE) is proposed to optimize the network during training [43]. The optimizer (Adam) fine-tunes the weights of the convolution filters to minimize the loss value. To produce regression-specific results, CLF penalizes out-of-range values more, minimizing the distance between the actual and predicted age in a target-oriented way. The formulation of CLF is given in the following equation:
$$\begin{aligned} \text {CLF} = \frac{\sum _{i = 1}^{n} E_i}{n};~E_i = {\left\{ \begin{array}{ll} d_i*\varphi ,&{} \text {if } d_i\le J\\ d_i^{3}+\varphi ,&{} \text {if } d_i>J \end{array}\right. } \end{aligned}$$
(4)
CLF is the mean of the differences (E) over n samples, where n = (total samples)/(input size). \(\varphi \) is a small value (0.0001 to 0.3) used to prevent the network from attaining zero difference and to sustain the learning process. Similarly, \(d_i = \Vert y-\bar{y}\Vert \) is the absolute error between the actual age (y) and predicted age \((\bar{y})\). Furthermore, J is a natural number derived from MCS-J for predictable age ranges. Under the second condition (\(d_i>J\)), errors larger than J penalize the weights more strongly, with the computed loss value growing cubically (power 3). The penalization influences the optimization of the network weights and biases, directing the optimizer to tune these parameters to minimize the difference between actual and predicted age. The CLF values change abruptly for \(J = 2\) and \(J = 3\), which demonstrates a high penalty, following which MCS-J is counted only in the given range of J. CLF thus not only considers the absolute error but also penalizes values adjacent to J in MCS-J more. By directing more penalization, the optimizer fine-tunes the learning weights to obtain a persuasive estimation score. Adam is used as the optimizer with an L\(_2\) regularizer to tune hyperparameters.
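A minimal NumPy sketch of CLF as defined in Eq. 4 follows; \(\varphi \) and J are hyperparameters, and the values used here are illustrative only.

```python
import numpy as np

def clf_loss(actual: np.ndarray, predicted: np.ndarray,
             phi: float = 0.0001, J: float = 3.0) -> float:
    """Custom loss (Eq. 4): small linear penalty inside the match window J,
    cubic penalty for out-of-range errors."""
    d = np.abs(actual - predicted)               # absolute age error d_i
    E = np.where(d <= J, d * phi, d ** 3 + phi)  # piecewise penalty E_i
    return float(E.mean())

# An error of 5 years is penalized far more heavily than an error of 2 years:
print(clf_loss(np.array([60.0]), np.array([55.0])))  # ~125.0001
print(clf_loss(np.array([60.0]), np.array([58.0])))  # 0.0002
```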

Evaluation metrics for FAG-Net

Besides MAE and MSE as evaluation metrics for age prediction, we apply the cumulative score (CS) and mean cumulative score (MCS) to accommodate the nature of the problem. CS and MCS imitate existing studies and are used to assess accuracy over a range of age groups. CS (or CS\(_j\)) and MCS (or MCS-J) give more weight to smaller match windows. The ranges depend on the values of j and J, the thresholds on the absolute difference between actual and estimated age scores [43], formulated as follows:
$$\begin{aligned} \text {MCS-J}= & {} \frac{\sum _{j = 0}^{J} CS_j}{J+1}\nonumber \\ \text {CS}_j= & {} \frac{\sum _{i = 1}^{n} \delta _i}{n}*100 \end{aligned}$$
(5)
where
$$\begin{aligned} \delta _i = {\left\{ \begin{array}{ll} 1,&{} \text {if } |y_i-\bar{y}_i|\le j\\ 0,&{} \text {if } |y_i-\bar{y}_i|>j\\ \end{array}\right. } \end{aligned}$$
CS\(_j\) is the percentage mean of \(\delta _i\), where \(|y_i-\bar{y}_i|\) is the absolute difference between the actual (\(y_i\)) and predicted (\(\bar{y}_i\)) scores; \(\delta _i\) counts as 1 for \(|y_i-\bar{y}_i|\le j\). A value of \(\delta _i = 0\) implies that the distance \(|y_i-\bar{y}_i|\) is greater than the threshold j. The MCS score evaluates prediction over various matching thresholds rather than a single one, giving a more comprehensive assessment of the challenging problem of retina-based age prediction by covering all values of \(|y_i-\bar{y}_i|\le j\) for each threshold j. This also allows us to assign different penalties with varying thresholds in the objective function of the deep learning model.
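The following sketch computes CS\(_j\) and MCS-J exactly as in Eq. 5:

```python
import numpy as np

def cs(actual: np.ndarray, predicted: np.ndarray, j: int) -> float:
    """Cumulative score CS_j: percentage of samples with |error| <= j years."""
    return float((np.abs(actual - predicted) <= j).mean() * 100)

def mcs(actual: np.ndarray, predicted: np.ndarray, J: int) -> float:
    """Mean cumulative score MCS-J: mean of CS_j over match windows j = 0..J."""
    return sum(cs(actual, predicted, j) for j in range(J + 1)) / (J + 1)
```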

Fundus images generation given age as condition

After proposing a sophisticated DL model (FAG-Net) for age prediction and gender classification, we introduce a novel network model to predict futuristic variations in fundus images. The model generates fundus images given age as a condition (FGC-Net) (Fig. 2). FGC-Net consists of an encoder, a conditioned bottleneck, a decoder with skip connections, and a discriminator, described in the following subsections.

Encoding

The encoding phase of FGC-Net first receives the input fundus images (\(X^i \in \mathbb {R}^{N\times H\times W\times C}\)) for biological trait association and learning (Fig. 3). The dimensions \(N\times H\times W\times C\) denote the batch size, height, width, and feature maps (number of channels: 1 for grayscale and 3 for color images), respectively. The encoder automatically extracts lower-dimensional features from the input data and feeds them into the latent space. The \(i^{th}\) convolutional layer (\(NC_i\)) acts as a feature extractor by encoding the salient features from \(X_i\). Considering the input structure (e.g., \(X^h = H\), \(X^w = W\), \(X^c = C\), where \(X^h\), \(X^w\), \(X^c\) are the output height h, width w, and dimension c, respectively), the encoder (e) contains six encoding blocks (EB-1 to EB-6) to sufficiently extract low-level features in the spatial dimension (e.g., \(X^h = \frac{1}{n}\times H\), \(X^w = \frac{1}{n}\times W\), \(X^c = n\times C\), where n is the number of downsampling and deeper levels), followed by the bottleneck layer (\(Z\in \mathbb {R}^k\), where k is the spatial dimension of Z). The spatial size halves in each subsequent deep layer (EB-1 to EB-6), and the resulting loss of information is compensated for by doubling the number of filters (channels).
In the encoding layer, the received image passes through an input block (IB), which is designed to extract a variety of features by employing distinct kernel sizes, such as 1\(\times \)1, 3\(\times \)3, 5\(\times \)5, and 7\(\times \)7, after a normal convolution (512\(\times \)512, 24, 3, corresponding to dimensions, number of filters, and kernel size). The outputs of the different kernel sizes are merged by elementwise summation before proceeding to the deeper block. The output of IB is forwarded to EB1, which contains a strided convolution (to avoid losing information useful for generation) and a normal convolution (dimensions, number of features, kernel size, and stride) for feature extraction, followed by BN and ReLU functions. The remaining blocks (EB2 to EB6) have the same structure up to the bottleneck layer. Each EB compresses the input spatially and extends it channel-wise. The compression process at the \(l^{th}\) encoding block \(EB\text{- }l\), where \(l = 6\), is formulated as follows:
$$\begin{aligned} EB\text{- }l = En\Big (\left[ NC[S_t(X^{l-1})];\{op_b, op_r\} \right] ;\phi \Big ), \end{aligned}$$
(6)
where \(S_t\) and NC denote the strided (s = 2) and normal convolutions in block \(l\) over the data sample (\(X^{l-1}\)) obtained from the previous block (\(l-1\)). The output from the strided convolution \(S_t\) and normal convolution NC is forwarded to the BN (\(op_b\)) and ReLU (\(op_r\)) functions. The stack of strided convolution (\(S_t\)) and normal convolution (NC) avoids the loss of useful information. In addition to reducing computational operations [59], \(S_t\) enables the model to learn while downsampling [61] and to retain and pass features into subsequent layers heading into the latent space, which the decoder uses to generate images back with age-embedded effects.
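A hedged PyTorch sketch of one encoding block under these definitions (a strided convolution \(S_t\) with s = 2, a normal convolution NC, then BN and ReLU; kernel sizes are assumptions):

```python
import torch.nn as nn

def encoding_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Encoding block (EB, Eq. 6): the strided convolution halves the spatial
    size while channels grow, followed by a normal convolution, BN, and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),   # S_t (s = 2)
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),  # NC
        nn.BatchNorm2d(out_ch),  # op_b
        nn.ReLU(inplace=True),   # op_r
    )
```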
Besides encoding the input fundus image, the corresponding label information is also embedded into the latent space. After a number of experiments, we found that embedding the label as a condition in the latent space most effectively influences the generation process. The embedding of age as a condition with the output of the encoding layer is carried out in the latent space. The encoder part (En) passes the label (age \(L_g\)) information as a condition (\(Ec(L_g, \xi )\)), where \(\xi \) denotes the parameters learned by the encoder, into the latent space of the VAE.

Bottleneck and conditioning

The bottleneck layer is an unsupervised method of modeling complex, higher-dimensional data at a deeper level. The encoder part (\(En(X^i, \phi )\)) compresses the input from the higher-dimensional space (\(X_m^H,X_m^W\)) through the network parameters (\(\phi \)) and generates a probabilistic distribution over the latent space (Z) with the lowest possible dimension (\(\frac{1}{n}\times X^H,\frac{1}{n}\times X^W\)). Similarly, \(Ec(L_g, \xi )\) passes \(L_g\) through fully connected layers with learning parameters \(\xi \), similar to the fully connected layer of \(En(X^i, \phi )\). The decoder utilizes the embedded and compressed form (latent variables Z) and generates it back into the high-dimensional generated space (Y). Minimizing the gap between X and Y enables the model to learn and tune the parameter values. The latent space enables the model to learn from the mutual distributions of X and \(L_g\). The outputs of EB6 and \(En(L_g, \xi )\) are passed to a fully connected layer modeling the complex dimensional structure into a latent representation, then flattened to 64 neurons. From the flattened 64 neurons, both the mean (\(\mu \)) and standard deviation (\(\sigma \)) are computed.
The encoder part (\(En\{X_{m};~\phi \}\)) generates the posterior over latent space (\(z^i\), where i denotes the sample number) and samples from the latent space (\(P^i\)) which can be used for the decoding (generation) as \(De\{En\uplus L_g \oplus S_f;~\vartheta \}\). The latent space is obtained as follows:
$$\begin{aligned} z_i\sim \Re _i \left[ (z_0/x^i)\parallel (z_1/x^i) \right] , \end{aligned}$$
(7)
where \(\Re ()\) is the distribution over \(z_0\) and \(z_1\) given input \(x^i\). The sampling \(z_i\) from the distribution \({\mathcal {N}}(;)\) can be rewritten for the conditional input as follows:
$$\begin{aligned} z_i\sim {\mathcal {P}}_i(z/X)&= {\mathcal {N}}\big (\mu (X; \phi _0), \sigma (X; \phi _0)\big ) \parallel {\mathcal {N}}\big (\mu (X; \phi _1), \sigma (X; \phi _1)\big )\\ z_i\sim {\mathcal {P}}_i(z/X)&= {\mathcal {N}}\big ( \left[ \mu (X; \phi _0)+\mu (X; \phi _1)\right] ; \left[ \sigma (X; \phi _0)\cdot \sigma (X; \phi _1) \right] \big )\\ z_i\sim {\mathcal {P}}_i(z/X)&= {\mathcal {N}}\big ( \left[ \mu (X; \phi _l)\right] ; \left[ \sigma (X; \phi _m) \right] \big ), \quad \text {where } \phi _l = \phi _0+\phi _1,\ \phi _m = \phi _0\cdot \phi _1 \end{aligned}$$
(8)
The drawn sample (\(z_i\)), conditioned with \(X_m\) from the distribution (see Eq. 8), is mapped into the same dimensions as the decoder (\(Dec(z_i, \theta )\)) for the generative process with the learned network parameters (\(\theta \)). The latent distribution must be regularized by the Kullback–Leibler (KL) divergence (see the loss function) so that the posterior (P(z/x)) closely approximates the prior (P(z)) distribution. The regularization (i.e., via the Gaussian prior) holds in the latent space between the distributions in terms of \(\mu \) and \(\sigma \), which further contributes to the latent activations utilized by the decoder to produce the new retinal image. The latent distributions are centered at \(\mu \) and spread over the area \(\sigma \) to project the possible fundus as desired (DSp). Usually, the distance between the learned distribution \({\mathcal {N}}(\mu ,~\sigma ^2)\) and the standard normal distribution \({\mathcal {N}}(0,1)\) is quantified by the KL divergence. However, instead of the Gaussian normal distribution with the usual mean (\(\mu \)) and standard deviation (\(\sigma \)), we utilize the sum of the \(\mu \) values and the product of the \(\sigma \) values; the detailed formulation is shown in Eqs. 8 and 10. The latent distribution and regularization are expected to have the properties of continuity and completeness. In the case of continuity, sampling from the latent distribution given X yields a nearby data point that feeds into the decoder to generate fundus images with a similar structure plus additional information, as desired. The decoder must generate target-oriented fundus images in a controlled fashion.

Decoding

FGC-Net generates a random sample (\(z_i\), for \(i = 1, 2,\ldots , n\)), conditioned by \(L_g\), drawn from the probabilistic distribution \(P_i(z_i/X)\), which the decoding blocks (DB1 to DB7) on the decoding side project to \(Y_i\):
$$\begin{aligned} Y_i = Dec\big \{[z_i\odot R_i]\oplus S_f(X); \theta \big \}, \end{aligned}$$
(9)
where \(Y_i\) is the generated fundus image corresponding to \(z_i\), with adjustable weights (\(\odot R_i\)) regularized by the objective function and merged with the contextual skipped features (\(\oplus S_f\)) using the network learning parameters (\(\theta \)).
In the decoding process, z is computed from the sum of \(\mu \) and \(\sigma ^2\) multiplied by a noise term (\(\varepsilon \)). The values of \(\mu \) and \(\sigma \) are computed in Eq. 8. The \(\varepsilon \) value is drawn from a normal distribution with mean \(\mu (L_g)\) and standard deviation \(\sigma (L_g)\), based on the fed scalar age values:
$$\begin{aligned} z\sim {\mathcal {N}}(\mu ,\sigma ^2): \quad z \leftarrow \mu +\sigma ^2\cdot \epsilon , \quad \epsilon \sim {\mathcal {N}}\big (\mu (L_g), \sigma (L_g)\big ). \end{aligned}$$
(10)
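A minimal sketch of this conditioned reparameterization (Eq. 10), assuming \(\mu \), \(\sigma \), \(\mu (L_g)\), and \(\sigma (L_g)\) are tensors of matching shape produced by the two encoder branches:

```python
import torch

def sample_latent(mu: torch.Tensor, sigma: torch.Tensor,
                  mu_age: torch.Tensor, sigma_age: torch.Tensor) -> torch.Tensor:
    """Reparameterization (Eq. 10): z = mu + sigma^2 * eps, where eps is drawn
    from N(mu(L_g), sigma(L_g)) so the age condition perturbs the sample."""
    eps = mu_age + sigma_age * torch.randn_like(mu)  # eps ~ N(mu(L_g), sigma(L_g))
    return mu + sigma.pow(2) * eps
```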
The dimension of z is reshaped and upsampled to match the dimension of the corresponding encoding layer (EB6) and merged (\(\oplus S_f\)) as an elementwise sum with the skip layer from EB6. Each decoding block (DB) receives its input and upscales the dimensions via transposed (strided) convolution, followed by BN and ReLU functions. The output of DB1 is concatenated with the skip connection from the IB.

Skip layer

The deeper the network, the greater the chance of losing key features due to downsampling operations and the vanishing gradient problem [62]. To avoid the loss of contextual information [63], we adopted skip connections between the encoding (\(Enc_k\{X;~\phi \}\)) and decoding (\(Dec_k\{z\oslash S_f;~ \theta \}\)) parts at particular layers (k) to transfer spatial features and global information about the input image structure.
The skip layers integrate features learned at early levels, avoid degrading shallow stacked networks, and overcome gradient information loss by retaining key features during training. These connections also improve end-to-end training and play an effective role in a deeper network. The sole purpose of the adopted skip connections is to help the decoder maintain the existing input structure while generating, on the decoding side, synthetic information that reflects age progression. The dimensions and merging positions with the corresponding layers, at both the bottleneck and decoder layers, are shown in Fig. 3.
After generating z given P(z/X) from the encoder (see Eq. 6), the decoder merges the data sample information from the latent space, the conditioning information (\(L_g\)), and the skip connection at a particular layer (k), formulated as follows:
$$\begin{aligned} DB_k = Dec\big (NC[S_t(Y^{k+1});\{op_b, op_r\}] \oplus S_f(X);\theta \big ), \end{aligned}$$
(11)
where \(Y^{k+1}\), \(S_f(X)\), and \(\oplus \) denote the previous tensor, the skipped features, and the elementwise-sum merging operation, respectively. Additionally, \(op_b\) and \(op_r\) denote the BN and nonlinear ReLU activation operations, respectively. In addition to the completeness and continuity properties of the VAE, the skip connections borrowed from U-Net control the generation process.
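A sketch of one decoding block per Eq. 11 (transposed convolution for upsampling, BN, ReLU, and an elementwise-sum merge with the skipped encoder features; kernel sizes are assumptions):

```python
import torch
import torch.nn as nn

class DecodingBlock(nn.Module):
    """Decoding block (DB, Eq. 11): upsample, normalize, activate, then merge
    the skipped features S_f(X) by elementwise sum."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4,
                                     stride=2, padding=1)  # doubles H and W
        self.bn = nn.BatchNorm2d(out_ch)   # op_b
        self.relu = nn.ReLU(inplace=True)  # op_r

    def forward(self, y: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        y = self.relu(self.bn(self.up(y)))
        return y + skip  # elementwise sum with features from the matching EB
```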

Discriminator

The discriminative part, borrowed from the generative adversarial network (GAN) [64], is appended at the end of FGC-Net, bringing sharpness and better quality to the generated images [65]. Adversarial learning plays a min-max game to distinguish original from fake (generated or synthetic) images. FGC-Net brings the inferencing features to reason in the latent space and generates fundus images as desired [66]. However, instead of training in a min-max fashion, we utilize the discriminative part solely to predict a scalar (regression) value analogous to the subjects' ages. Six blocks receive both the input and the generated fundus images. Each discriminative block (DsB) is composed of strided convolution, BN, and ReLU functions. The stacked DsBs end with three fully connected layers containing 512, 256, and 128 neurons, followed by dropout layers with ratios of 0.8, 0.7, and 0.6, respectively. Finally, the output of the fully connected layers passes through a linear activation function to a single neuron for age estimation. In the objective function, both outputs as single values (from the input and generated fundus images) are compared with a mean square error (MSE), or \(L_2\), loss term.
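Following the block sizes stated above (six DsBs, then fully connected layers of 512/256/128 neurons with dropout ratios 0.8/0.7/0.6 and a single linear output), a hedged sketch of the discriminator might be (base channel width and pooling are assumptions):

```python
import torch.nn as nn

def discriminator(in_ch: int = 3, base: int = 32) -> nn.Sequential:
    """Discriminator: six strided-conv blocks (DsB) followed by FC layers
    of 512/256/128 neurons with dropout 0.8/0.7/0.6, ending in one linear unit."""
    blocks, ch = [], in_ch
    for i in range(6):
        nxt = base * (2 ** i)
        blocks += [nn.Conv2d(ch, nxt, 3, stride=2, padding=1),
                   nn.BatchNorm2d(nxt), nn.ReLU(inplace=True)]
        ch = nxt
    return nn.Sequential(
        *blocks, nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(ch, 512), nn.ReLU(inplace=True), nn.Dropout(0.8),
        nn.Linear(512, 256), nn.ReLU(inplace=True), nn.Dropout(0.7),
        nn.Linear(256, 128), nn.ReLU(inplace=True), nn.Dropout(0.6),
        nn.Linear(128, 1),  # linear activation: scalar, age-like score
    )
```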
In our case, the generator maps \(X_i\), the input to the encoder of FGC-Net (Fig. 3), to \(Y_i^j\), the output from the decoder of FGC-Net (Fig. 3). The model generates a fundus image for each age value \(L_g\), where \(j = L_g\). The “discriminator” part discriminates the actual \(X_i\) and the generated version \(Y_i^j\) as real or fake. The min-max learning game of the GAN [64] can be formulated as follows:
$$\begin{aligned} V(D,G) = \underset{G}{min}~\underset{D}{max}(D_{XY}, G_X), \end{aligned}$$
(12)
Similarly, the generative (\(G_X\)) and discriminative (\(D_{XY}\)) operations can be expressed mathematically as follows:
$$\begin{aligned} G_{X}&= G\left\{ \underbrace{En(X_i;~\phi )\rightarrow Y_i\sim Dec(Z_{i};~\theta )}_{Generative~Unit} \rightarrow \underbrace{Disc(\left[ X_{i}, Y_i\right] ;~\Phi )}_{Discriminative~Unit}\right\} \\ G_{X}&= G(X_{i},~Y_{i};~\omega ),~\text {where}~\omega = \{\phi ,\theta , \Phi \} \end{aligned}$$
(13)
The discriminator plays a vital role in the abstract reconstruction error when a VAE is infused into the network model. The discriminator measures sample similarity [66] at both the element and feature levels. In addition, the discriminator is made stronger at distinguishing real from fake images by including the \(L_2\) loss term.

Objective function for FGC-Net

The objective loss function for FGC-Net is composed of the reconstruction loss (\(L_2\)) (Eq. 3) and the KL divergence loss [67].
The probabilistic distribution in the VAE inference model (\(q_\phi (z/x)\)) approximates the posterior (true) distribution (\(p_\theta (z/x)\)) in terms of the KL divergence, minimizing the gap as follows [68]:
$$\begin{aligned} \textit{KL}_{d}(q_\phi (z/x)||p_\theta (z/x)) = \mathbb {E}_{q_\phi }\left[ \log \frac{q_\phi (z/x)}{p_\theta (z/x)} \right] , \end{aligned}$$
(14)
In our case, the KL divergence between the inference model's distribution \({\mathcal {N}}(\mu _i, \sigma _i)\), with mean \(\mu _i\) and variance \(\sigma _i\), and the conditioning distribution \({\mathcal {N}}(\mu (L_g), \sigma (L_g))\) (Eq. 10) can be formulated, after the Bayesian inference simplification [69], as follows:
$$\begin{aligned} \textit{KL}_{d}\big ({\mathcal {N}}(\mu , \sigma )||{\mathcal {N}}(\mu (L_g), \sigma (L_g))\big ) = \frac{1}{2}\sum _{i = 1}^{l}\big (\sigma _i^2+\mu _i^2-1-\log (\sigma _i^2)\big ) \end{aligned}$$
(15)
Thus, the total loss function for FGC-Net (TLF-FGC) is composed of the following terms:
$$\begin{aligned} TLF{-}FGC = (L_1+L_2+KL_d)/3 \end{aligned}$$
(16)
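A sketch of the KL term (as reformulated in Eq. 15, using the standard closed form) and the total loss of Eq. 16; tensor shapes and reductions are assumptions:

```python
import torch

def kl_divergence(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """KL term (Eq. 15) between the inferred latent distribution and the prior."""
    return 0.5 * torch.sum(sigma.pow(2) + mu.pow(2) - 1.0 - torch.log(sigma.pow(2)))

def tlf_fgc(x: torch.Tensor, y: torch.Tensor,
            mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Total loss (Eq. 16): mean of the L1, L2 (reconstruction), and KL terms."""
    l1 = torch.sum(torch.abs(x - y))  # L1 reconstruction term
    l2 = torch.sum((x - y).pow(2))    # L2 reconstruction term
    return (l1 + l2 + kl_divergence(mu, sigma)) / 3.0
```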

Dataset preparation

To train, evaluate, and test the proposed models for biological trait estimation and trait-based futuristic analysis, we used the Ocular Disease Intelligent Recognition dataset (ODIR-5K) [70], PAPILA [71], and a longitudinal population based on 10-year progression collections (10Y-PC) [72]. There are 12,005 samples in total, of which 80% (9604 of 12,005) form the training split and 20% (2401 of 12,005) the testing split. All three datasets contain age and gender as label information. Age is fed to FGC-Net as a label during training and can be used as a condition in the testing phase. Samples missing label information, such as age or gender, were discarded. The subjects ranged in age from 10 to 80 years. To build a generalized model for biological trait estimation, we utilized both cross-sectional and longitudinal populations. Furthermore, both healthy and unhealthy subjects were included so that the underlying DL model learns features invariant to abnormalities. Similarly, a variety of cameras and environments were used to capture images of different quality, supporting the modeling of robust DL networks.

Network training

Both FAG-Net and FGC-Net were trained with Adam to optimize the network parameters, using an initial learning rate of 0.001, \(\beta _1\) = 0.9, and \(\beta _2\) = 0.999, where the learning rate was decreased by a factor of \(\frac{1}{10}\) every 50 epochs. The batch size was 16 samples, according to the available GPU memory. The models run for up to 500 epochs and stop dynamically when performance degrades from one epoch to the next.
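A sketch of this training configuration in PyTorch (the step-decay schedule mirrors the stated 1/10 reduction every 50 epochs; the loop helpers named in the comments are hypothetical placeholders):

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

def make_optimizer(model: torch.nn.Module):
    """Adam with lr = 0.001 and betas (0.9, 0.999); lr drops 10x every 50 epochs."""
    optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    scheduler = StepLR(optimizer, step_size=50, gamma=0.1)
    return optimizer, scheduler

# Sketch of the loop: batch size 16, up to 500 epochs with early stopping.
# for epoch in range(500):
#     train_one_epoch(model, loader, optimizer)  # hypothetical helper
#     scheduler.step()
#     if validation_worsens():                   # hypothetical early-stop check
#         break
```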

Results

Biological traits estimation

The proposed model FAG-Net and the state-of-the-art (SOTA) models were trained with 5-fold cross-validation (FCV). The evaluation metrics MAE, MSE, MCS-2, and MCS-3 were used for testing. The MCS metrics help better assess model performance for age prediction, where age can only be predicted within a range of values rather than as a single classified value. The MSE metric may produce very large values when outliers create a large difference between actual and predicted values, so MSE alone may not be a reliable option in such scenarios. The details of the five-fold cross-validation are shown in Table 1. Table 2 shows the results of all the underlying modalities for each evaluation metric.
Table 1
FAG-Net scores for 5-fold cross-validation (FCV): MAE, MSE, \(CS_0\) to \(CS_5\), and MCS

| Network | MAE | MSE | \(CS_0\) | \(CS_1\) | \(CS_2\) | \(CS_3\) | \(CS_4\) | \(CS_5\) | MCS-2 | MCS-3 | MCS-4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FCV-1 | 2.269 | 22.026 | 32.736 | 69.263 | 80.133 | 84.132 | 85.756 | 87.172 | 60.710 | 66.566 | 71.174 |
| FCV-2 | 2.286 | 24.734 | 33.944 | 71.470 | 80.258 | 83.632 | 86.006 | 88.172 | 61.890 | 67.326 | 71.062 |
| FCV-3 | 2.286 | 25.573 | 79.921 | 81.842 | 83.239 | 84.941 | 86.905 | 88.256 | 81.667 | 82.485 | 83.369 |
| FCV-4 | 1.401 | 3.324 | 18.280 | 62.354 | 88.147 | 95.159 | 97.663 | 99.249 | 56.260 | 65.984 | 72.320 |
| FCV-5 | 0.517 | 3.961 | 83.282 | 93.147 | 94.718 | 95.258 | 96.377 | 97.760 | 90.382 | 91.608 | 92.562 |
| Average | 1.634 | 15.151 | 49.705 | 75.767 | 85.475 | 88.814 | 90.729 | 92.168 | 70.315 | 74.940 | 78.098 |

The five folds together with their mean values are shown. The values of MCS-2, MCS-3, and MCS-4 are based on \(CS_0\) to \(CS_5\) and formulated in Eq. 5
Table 2
Comparative evaluation scores of FAG-Net and SOTA models in terms of MAE, MSE, MCS, and R\(^2\). For the values of \(CS_0\) to \(CS_5\), see the formulation in Eq. 5

| Network | MAE | MSE | \(CS_0\) | \(CS_1\) | \(CS_2\) | \(CS_3\) | \(CS_4\) | \(CS_5\) | MCS-2 | MCS-3 | MCS-4 | R\(^2\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AlexNet [73] | 2.788 | 24.803 | 16.935 | 49.498 | 68.921 | 79.746 | 84.722 | 87.865 | 45.118 | 53.775 | 59.965 | 0.827 |
| VGG-Net 16 [74] | 5.430 | 65.727 | 5.805 | 19.030 | 33.260 | 44.172 | 55.608 | 63.771 | 19.365 | 25.567 | 31.575 | 0.543 |
| VGG-Net 19 [74] | 104.830 | 2142.82 | 0.261 | 0.829 | 1.309 | 1.920 | 2.880 | 3.710 | 0.800 | 1.080 | 1.440 | \(-\)147.795 |
| ShoeNet [43] | 4.754 | 54.343 | 8.031 | 23.439 | 38.716 | 51.942 | 62.199 | 71.322 | 23.395 | 30.532 | 36.866 | 0.622 |
| Pixel RNN [75] | 34.080 | 1878.363 | 1.789 | 5.718 | 10.999 | 40.840 | 17.634 | 20.470 | 6.169 | 8.336 | 10.196 | \(-\)12.407 |
| Highway Net [76] | 8.921 | 142.779 | 2.790 | 8.954 | 18.783 | 25.0321 | 34.860 | 41.857 | 10.176 | 13.890 | 18.084 | 0.008 |
| Residual Net [59] | 8.558 | 130.949 | 3.415 | 11.578 | 18.700 | 26.197 | 34.443 | 41.607 | 11.231 | 14.972 | 18.867 | 0.090 |
| Google Inception V4 [77] | 2.740 | 29.056 | 21.169 | 56.612 | 74.508 | 82.191 | 85.202 | 87.254 | 50.763 | 58.620 | 63.937 | 0.798 |
| ResNeXt [78] | 7.233 | 97.577 | 5.761 | 15.539 | 24.268 | 33.566 | 42.034 | 49.367 | 15.189 | 19.783 | 24.233 | 0.322 |
| FAG-Net | 1.634 | 15.151 | 49.705 | 75.767 | 85.475 | 88.814 | 90.729 | 92.168 | 70.315 | 74.940 | 78.098 | 0.889 |
Table 3
Comparative gender classification scores of FAG-Net and SOTA models based on confusion-matrix metrics

| Network | TP | FP | TN | FN | Specificity | Sensitivity | PPV | NPV | F\(_1\) | Accuracy % |
|---|---|---|---|---|---|---|---|---|---|---|
| AlexNet [73] | 662 | 552 | 778 | 408 | 0.584 | 0.618 | 0.545 | 0.655 | 0.603 | 60.32 |
| VGG-Net 16 [74] | 1071 | 144 | 1017 | 169 | 0.8759 | 0.8637 | 0.8814 | 0.8575 | 0.8698 | 86.96 |
| VGG-Net 19 [74] | 1067 | 148 | 1024 | 162 | 0.8737 | 0.8681 | 0.8781 | 0.8634 | 0.8709 | 87.08 |
| ShoeNet [43] | 1063 | 152 | 1009 | 177 | 0.8890 | 0.8572 | 0.8748 | 0.8507 | 0.8631 | 86.29 |
| Pixel RNN [75] | 1060 | 155 | 996 | 190 | 0.8653 | 0.8480 | 0.903 | 0.8724 | 0.8297 | 85.63 |
| Highway Net [76] | 673 | 541 | 703 | 483 | 0.565 | 0.582 | 0.554 | 0.592 | 0.573 | 57.33 |
| Residual Net [59] | 1063 | 152 | 963 | 223 | 0.86 | 0.82 | 0.870 | 0.811 | 0.845 | 84.38 |
| FAG-Net | 1141 | 74 | 1065 | 121 | 0.935 | 0.904 | 0.939 | 0.897 | 0.919 | 91.87 |
Positive Predictive Value (PPV), Negative Predictive Value (NPV)

Gender classification

For biological trait estimation, we also trained the proposed model (FAG-Net) and a few SOTA models; all classification results are shown in Table 3. After the successful classification of the gender trait from fundus images, the proposed model (FGC-Net) emphasized the optic disc area and the learned features while training for the aging association. In the gender classification study [38], the optic disc was likewise considered the main structure by the deep learning approaches.
To evaluate the performance of our proposed model for gender classification, we randomly chose a few SOTA models and trained and tested them on the same dataset with the same parameters (Table 3). We used confusion-matrix metrics to evaluate the results: true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), the F\(_1\) score, and accuracy. The derivation of these metrics is illustrated in the following equations.
$$\begin{aligned}&Sensitivity = \frac{TP}{TP+FN},\\&Specificity = \frac{TN}{TN+FP},\\&PPV = \frac{TP}{TP+FP},\\&NPV = \frac{TN}{TN+FN},\\&F_1 = 2\times \frac{PPV\times Sensitivity}{PPV+Sensitivity},\\&Accuracy = \frac{TP+TN}{TP+TN+FP+FN}. \end{aligned}$$
(17)
$$\begin{aligned}&R_2 = 1 - \frac{SSR}{TSS},\\&SSR = \sum _{i = 1}^{j}(X_i-Y_i)^2,\\&TSS = \sum _{i = 1}^{j}\big (\bar{Y}-Y_i\big )^2 \end{aligned}$$
(18)
where SSR is the sum of squared residuals and TSS is the total sum of squares; \(X_i\) and \(Y_i\) denote the predicted and actual values, and \(\bar{Y}\) is the mean of the actual values.
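These metrics can be checked directly against the confusion counts in Table 3; the following sketch reproduces the FAG-Net row:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics from Eq. 17 (F1 from PPV and sensitivity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = 100 * (tp + tn) / (tp + tn + fp + fn)
    return dict(sensitivity=sensitivity, specificity=specificity,
                ppv=ppv, npv=npv, f1=f1, accuracy=accuracy)

# FAG-Net row of Table 3: TP=1141, FP=74, TN=1065, FN=121 -> accuracy ~91.87%
print(classification_metrics(1141, 74, 1065, 121))
```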
From the accumulated results, our proposed model FAG-Net outperforms the competing SOTA models, with VGG-Net 19 achieving the second-highest accuracy (87.08%). These convincing results encouraged us to proceed with age prediction from fundus images and with learning the corresponding effects.

Age progression effects in fundus images

In this study, FAG-Net was utilized to estimate biological traits from fundus images. After the successful estimation of age (MAE 1.634) and classification of gender (accuracy 91.87%), we proposed FGC-Net, a generative model conditioned by subjects' age. To extrapolate the effects of the fed condition, we built different versions of FGC-Net to verify the changes made to fundus images with age progression. After training FGC-Net (Fig. 3) together with all its versions, we randomly chose samples and fed them to the models to retrieve fundus images at different age stages (Fig. 4). There are seven versions of FGC-Net in total, shown together with their outputs (Fig. 4). The randomly chosen sample is fed to each model with different conditions (labels) in the range of 10–80 years. The output images are subtracted from the original (fed) fundus image, and the differences are displayed in Fig. 4 (2nd to 9th columns).
Variations were observed based on subjective evaluation. From the visualized results, three key anatomical properties were observed to vary across ages: the optic disc (OD), the area near the OD, and the size (volume). The OD region, an approximately circular, bright yellow object, varied from early to late aging in the fundus images generated by all model variants. The embedded age condition mostly influences the optic disc with age progression, which can be observed with the naked eye from the 5th to 7th rows (Fig. 4) for the corresponding models. Similarly, the thick vessels and regions near the OD also varied with aging, and the size of the fundus images varied with age progression; such variations are apparent for the FGC-Net-6 model (Fig. 4, last row). We employed the attention mechanism in all the proposed models to highlight regions of interest while embedding and estimating biological traits. The attention mechanism also highlights pixels in the input image according to their contributions to the final evaluation; the affected regions can therefore be observed in the images generated by the underlying modalities. The learning process from the embedded age condition occurs at an abstract level. In other words, the learning becomes generalized by utilizing fundus images from both healthy and unhealthy subjects, avoiding disease-specific bias. Thus, the study innovatively learns biological traits and their effects on fundus images using cutting-edge deep learning technology.
The ability of neural networks to use greater abstraction and tighter integration comes at the cost of lower interpretability. Saliency maps, also called heat maps or attention maps, are common model-explanation tools used to visualize model reasoning by indicating areas of local morphological change within fundus photographs that carry more weight in modifying network predictions. Algorithms mainly used the features of the optic disc for gender prediction, which is consistent with the observations made by Poplin [17]. Deep learning models trained on images from the UK Biobank and EyePACS datasets primarily highlighted the optic disc, retinal vessels, and macula when soft attention heat maps were applied, although there appeared to be a weak signal distributed throughout the retina [17].

Conclusion and future directions

In this study, we investigated biological traits from the fundus images of both healthy and unhealthy subjects and extrapolated the variational effects of age progression on fundus images. We proposed two DL models, named FAG-Net and FGC-Net. FAG-Net estimates age and classifies gender from fundus images, utilizing a dense network architecture together with attention mechanisms at distinct levels. The proposed models generalize the learning process to avoid the variations in the anatomical structure of fundus images caused by retinal disease. The study successfully carried out age prediction and gender classification with significant accuracy. Similarly, the attention mechanism highlighted regions of interest that are vulnerable to aging. Furthermore, the model shows similar salient regions in ungradable input images as in gradable ones (Fig. 2), suggesting that the model is sensitive to signals in poor-quality images from subtle pixel-level luminance variations, which are likely indistinguishable to humans. This finding underscores the promising ability of deep neural networks to utilize salient features in medical imaging that may remain hidden to human experts. In future studies, more sophisticated deep learning models with attention mechanisms can be proposed for healthy and unhealthy subjects, both in isolated and joint form.

Acknowledgements

We thank the support from the National Natural Science Foundation of China 31970752; Science, Technology, Innovation Commission of Shenzhen Municipality JCYJ20190809180003689, JSGG20200225150707332, JCYJ20220530143014032, KCXFZ20211020163813019, ZDSYS20200820165400003, WDZC20200820173710001, WDZC20200821150704001, JSGG20191129110812708; Shenzhen Bay Laboratory Open Funding, SZBL2020090501004; Department of Chemical Engineering-iBHE special cooperation joint fund project, DCE-iBHE-2022-3; Tsinghua Shenzhen International Graduate School Cross-disciplinary Research and Innovation Fund Research Plan, JC2022009; and Bureau of Planning, Land and Resources of Shenzhen Municipality (2022) 207.

Declarations

Conflict of interest

As the corresponding author on behalf of all the authors, I declare that all the authors are aware of the submission and have no conflict of interest. The submitted paper contains original, unpublished results, and is not currently under consideration elsewhere.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literature
2. Navab N, Hornegger J, Wells WM, Frangi A (2015) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Springer, vol 9351
3. Li T, Bo W, Hu C, Kang H, Liu H, Wang K, Fu H (2021) Applications of deep learning in fundus images: a review. Med Image Anal 69:101971
5. Mackay TF, Stone EA, Ayroles JF (2009) The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10(8):565–577
6. Betzler BK, Yang HHS, Thakur S, Yu M, Quek TC, Soh ZD, Lee G, Tham Y-C, Wong TY, Rim TH et al (2021) Gender prediction for a multiethnic population via deep learning across different retinal fundus photograph fields: retrospective cross-sectional study. JMIR Med Inform 9(8):e25165
7. Austad SN (2006) Why women live longer than men: sex differences in longevity. Gend Med 3(2):79–92
8. Zarulli V, Barthold Jones JA, Oksuzyan A, Lindahl-Jacobsen R, Christensen K, Vaupel JW (2018) Women live longer than men even during severe famines and epidemics. Proc Natl Acad Sci 115(4):E832–E840
9. Baum F, Musolino C, Gesesew HA, Popay J (2021) New perspective on why women live longer than men: an exploration of power, gender, social determinants, and capitals. Int J Environ Res Public Health 18(2):661
10. Yoo TK, Kim SH, Kwak J, Kim HK, Rim TH (2018) Association between osteoporosis and age-related macular degeneration: the Korea National Health and Nutrition Examination Survey. Invest Ophthalmol Vis Sci 59(4):AMD132–AMD142
11. Klein BE, Klein R, Linton KL (1992) Prevalence of age-related lens opacities in a population: the Beaver Dam Eye Study. Ophthalmology 99(4):546–552
12. Rudnicka AR, Jarrar Z, Wormald R, Cook DG, Fletcher A, Owen CG (2012) Age and gender variations in age-related macular degeneration prevalence in populations of European ancestry: a meta-analysis. Ophthalmology 119(3):571–580
13. Rim THT, Kim M-H, Kim WC, Kim T-I, Kim EK (2014) Cataract subtype risk factors identified from the Korea National Health and Nutrition Examination Survey 2008–2010. BMC Ophthalmol 14(1):1–15
14.
15. Rudnicka AR, Mt-Isa S, Owen CG, Cook DG, Ashby D (2006) Variations in primary open-angle glaucoma prevalence by age, gender, and race: a Bayesian meta-analysis. Invest Ophthalmol Vis Sci 47(10):4254–4261
16. Zhang X, Saaddine JB, Chou C-F, Cotch MF, Cheng YJ, Geiss LS, Gregg EW, Albright AL, Klein BE, Klein R (2010) Prevalence of diabetic retinopathy in the United States, 2005–2008. JAMA 304(6):649–656
17. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, Peng L, Webster DR (2018) Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng 2(3):158–164
18. Githinji B, Shao L, An L, Zhang H, Li F, Dong L, Ma L, Dong Y, Zhang Y, Wei WB et al (2022) Multidimensional hypergraph on delineated retinal features for pathological myopia task. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp 550–559
19. Wei T, Li X, Stojanovic V (2021) Input-to-state stability of impulsive reaction-diffusion neural networks with infinite distributed delays. Nonlinear Dyn 103:1733–1755
20. Xu Z, Li X, Stojanovic V (2021) Exponential stability of nonlinear state-dependent delayed impulsive systems with applications. Nonlinear Anal Hybrid Syst 42:101088
21. Song X, Wu N, Song S, Stojanovic V (2023) Switching-like event-triggered state estimation for reaction–diffusion neural networks against DoS attacks. Neural Process Lett, pp 1–22
22. Mutlu U, Colijn JM, Ikram MA, Bonnemaijer PW, Licher S, Wolters FJ, Tiemeier H, Koudstaal PJ, Klaver CC, Ikram MK (2018) Association of retinal neurodegeneration on optical coherence tomography with dementia: a population-based study. JAMA Neurol 75(10):1256–1263
23. Owen CG, Rudnicka AR, Welikala RA, Fraz MM, Barman SA, Luben R, Hayat SA, Khaw K-T, Strachan DP, Whincup PH et al (2019) Retinal vasculometry associations with cardiometabolic risk factors in the European Prospective Investigation of Cancer–Norfolk study. Ophthalmology 126(1):96–106
24. Hassan M, Guan H, Melliou A, Wang Y, Sun Q, Zeng S, Liang W, Zhang Y, Zhang Z, Hu Q et al (2022) Neuro-symbolic learning: principles and applications in ophthalmology. arXiv preprint arXiv:2208.00374
25. Normando EM, Davis BM, De Groef L, Nizari S, Turner LA, Ravindran N, Pahlitzsch M, Brenton J, Malaguarnera G, Guo L et al (2016) The retina as an early biomarker of neurodegeneration in a rotenone-induced model of Parkinson's disease: evidence for a neuroprotective effect of rosiglitazone in the eye and brain. Acta Neuropathol Commun 4(1):1–15
26. Korot E, Pontikos N, Liu X, Wagner SK, Faes L, Huemer J, Balaskas K, Denniston AK, Khawaja A, Keane PA (2021) Predicting sex from retinal fundus photographs using automated deep learning. Sci Rep 11(1):1–8
27. Wang J, Knol MJ, Tiulpin A, Dubost F, de Bruijne M, Vernooij MW, Adams HH, Ikram MA, Niessen WJ, Roshchupkin GV (2019) Gray matter age prediction as a biomarker for risk of dementia. Proc Natl Acad Sci 116(42):21213–21218
28. Xia X, Chen X, Wu G, Li F, Wang Y, Chen Y, Chen M, Wang X, Chen W, Xian B et al (2020) Three-dimensional facial-image analysis to predict heterogeneity of the human ageing rate and the impact of lifestyle. Nat Metab 2(9):946–957
29. Cole JH, Franke K (2017) Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends Neurosci 40(12):681–690
30. Jónsson BA, Bjornsdottir G, Thorgeirsson T, Ellingsen LM, Walters GB, Gudbjartsson D, Stefansson H, Stefansson K, Ulfarsson M (2019) Brain age prediction using deep learning uncovers associated sequence variants. Nat Commun 10(1):1–10
31. Cole JH, Ritchie SJ, Bastin ME, Hernández V, Muñoz Maniega S, Royle N, Corley J, Pattie A, Harris SE, Zhang Q et al (2018) Brain age predicts mortality. Mol Psychiatry 23(5):1385–1392
32. Yamashita T, Asaoka R, Terasaki H, Murata H, Tanaka M, Nakao K, Sakamoto T (2020) Factors in color fundus photographs that can be used by humans to determine sex of individuals. Transl Vis Sci Technol 9(2):4
33. Ooto S, Hangai M, Yoshimura N (2015) Effects of sex and age on the normal retinal and choroidal structures on optical coherence tomography. Curr Eye Res 40(2):213–225
34. Lamparter J, Schmidtmann I, Schuster AK, Siouli A, Wasielica-Poslednik J, Mirshahi A, Höhn R, Unterrainer J, Wild PS, Binder H et al (2018) Association of ocular, cardiovascular, morphometric and lifestyle parameters with retinal nerve fibre layer thickness. PLoS ONE 13(5):e0197682
35. Ting DSW, Wong TY (2018) Eyeing cardiovascular risk factors. Nat Biomed Eng 2(3):140–141
36. Khan NC, Perera C, Dow ER, Chen KM, Mahajan VB, Mruthyunjaya P, Do DV, Leng T, Myung D (2022) Predicting systemic health features from retinal fundus images using transfer-learning-based artificial intelligence models. Diagnostics 12(7):1714
37. Zhu Z, Shi D, Guankai P, Tan Z, Shang X, Hu W, Liao H, Zhang X, Huang Y, Yu H et al (2022) Retinal age gap as a predictive biomarker for mortality risk. Br J Ophthalmol
38. Betzler BK, Yang HHS, Thakur S, Yu M, Da Soh Z, Lee G, Tham Y-C, Wong TY, Rim TH, Cheng C-Y et al (2021) Gender prediction for a multiethnic population via deep learning across different retinal fundus photograph fields: retrospective cross-sectional study. JMIR Med Inform 9(8):e25165
39. Zhang L, Lei Z, Du Z, Hassan M, Yuan X, Jiang C, Gul I, Zhai S, Zhong X, Xu L et al. AI-boosted CRISPR-Cas13a and total internal reflection fluorescence microscopy system for SARS-CoV-2 detection. Frontiers in Sensors, p 35
40. Liu C, Wang W, Li Z, Jiang Y, Han X, Ha J, Meng W, He M (2019) Biological age estimated from retinal imaging: a novel biomarker of aging. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp 138–146
41. Jin G, Ding X, Xiao W, Xu X, Wang L, Han X, Xiao O, Liu R, Wang W, Yan W et al (2018) Prevalence of age-related macular degeneration in rural southern China: the Yangxi Eye Study. Br J Ophthalmol 102(5):625–630
42. Kim YD, Noh KJ, Byun SJ, Lee S, Kim T, Sunwoo L, Lee KJ, Kang S-H, Park KH, Park SJ et al (2020) Effects of hypertension, diabetes, and smoking on age and sex prediction from retinal fundus images. Sci Rep 10(1)
43. Hassan M, Wang Y, Wang D, Li D, Liang Y, Zhou Y, Xu D (2021) Deep learning analysis and age prediction from shoeprints. Forensic Sci Int 327:110987
44. Jia P, Dai Y, Hu R, Pei G, Manuel AM, Zhao Z (2020) TSEA-DB: a trait-tissue association map for human complex traits and diseases. Nucleic Acids Res 48(D1):D1022–D1030
45. Franke K, Ziegler G, Klöppel S, Gaser C, Alzheimer's Disease Neuroimaging Initiative et al (2010) Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: exploring the influence of various parameters. Neuroimage 50(3):883–892
46. Abdi H, Williams LJ (2010) Principal component analysis. Wiley Interdiscip Rev Comput Stat 2(4):433–459
47. Tipping M (1999) The relevance vector machine. Adv Neural Inf Process Syst 12
48. Dafflon J, Pinaya WHL, Turkheimer F, Cole JH, Leech R, Harris MA, Cox SR, Whalley HC, Mcintosh AM, Hellyer PJ et al (2020) An automated machine learning approach to predict brain age from cortical anatomical measures. Hum Brain Mapp 41(13):3555–3566
49. Valizadeh S, Hänggi J, Mérillat S, Jäncke L (2017) Age prediction on the basis of brain anatomical measures. Hum Brain Mapp 38(2):997–1008
50. Gupta N et al (2013) Artificial neural network. Network and Complex Systems 3(1):24–28
51. Noble WS (2006) What is a support vector machine? Nat Biotechnol 24(12):1565–1567
52. Lund MJ, Alnæs D, de Lange A-MG, Andreassen OA, Westlye LT, Kaufmann T (2022) Brain age prediction using fMRI network coupling in youths and associations with psychiatric symptoms. NeuroImage Clin 33:102921
54. Chen C, Cao X, Tian L (2019) Partial least squares regression performs well in MRI-based individualized estimations. Front Neurosci 13:1282
55. Geladi P, Kowalski BR (1986) Partial least-squares regression: a tutorial. Anal Chim Acta 185:1–17
56. Zhang R, Han X, Lei Z, Jiang C, Gul I, Hu Q, Zhai S, Liu H, Lian L, Liu Y et al (2022) RCMNet: a deep learning model assists CAR-T therapy for leukemia. Comput Biol Med 150:106084
57. Billmeyer R, Parhi KK (2021) Biological gender classification from fMRI via hyperdimensional computing
58. Thomas A, Dasgupta S, Rosing T (2021) Theoretical foundations of hyperdimensional computing. J Artif Intell Res 72:215–249
59. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
60. Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B et al (2018) Attention U-Net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999
61. Ayachi R, Afif M, Said Y, Atri M (2018) Strided convolution instead of max pooling for memory efficiency of convolutional neural networks. In: International Conference on the Sciences of Electronics, Technologies of Information and Telecommunications. Springer, pp 234–243
62. Drozdzal M, Vorontsov E, Chartrand G, Kadoury S, Pal C (2016) The importance of skip connections in biomedical image segmentation. In: Deep Learning and Data Labeling for Medical Applications. Springer, pp 179–187
63. Cai L, Gao H, Ji S (2019) Multi-stage variational auto-encoders for coarse-to-fine image generation. In: Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, pp 630–638
64. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2020) Generative adversarial networks. Commun ACM 63(11):139–144
65. Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434
66. Larsen ABL, Sønderby SK, Larochelle H, Winther O (2016) Autoencoding beyond pixels using a learned similarity metric. In: International Conference on Machine Learning. PMLR, pp 1558–1566
67. Kullback S (1997) Information theory and statistics. Courier Corporation
68. Kingma DP, Welling M et al (2019) An introduction to variational autoencoders. Foundations and Trends in Machine Learning 12(4):307–392
69. Duchi J (2007) Derivations for linear algebra and optimization. Berkeley, California 3(1):2325–5870
71. Kovalyk O, Morales-Sánchez J, Verdú-Monedero R, Sellés-Navarro I, Palazón-Cabanes A, Sancho-Gómez J-L (2022) PAPILA: dataset with fundus images and clinical data of both eyes of the same patient for glaucoma assessment. Sci Data 9(1):1–12
72. Yan YN, Wang YX, Yang Y, Xu L, Xu J, Wang Q, Yang JY, Yang X, Zhou WJ, Ohno-Matsui K et al (2018) Ten-year progression of myopic maculopathy: the Beijing Eye Study 2001–2011. Ophthalmology 125(8):1253–1263
73. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
74. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
75. Van Den Oord A, Kalchbrenner N, Kavukcuoglu K (2016) Pixel recurrent neural networks. In: International Conference on Machine Learning. PMLR, pp 1747–1756
77. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence 31(1)
78. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1492–1500