Microarray medical data classification using kernel ridge regression and modified cat swarm optimization based gene selection system
Introduction
Microarray analysis and classification are essential for the early diagnosis and treatment of dreaded diseases such as cancer. Cancer shows the highest rate of morbidity and mortality in economically developed countries and stands second in developing countries [1]. Human beings suffer from about 200 types of cancer, and microarray technology is adopted to keep records of them [2]. The GLOBOCAN database, the World Health Organization, the Global Health Observatory and the United Nations World Population Prospects report that the four most common cancers occurring worldwide are lung, female breast, bowel and prostate cancer [3]. Cancer causes abnormal and uncontrolled cell growth; it is related to the genome and caused by oncogenes. Molecular analysis reveals that different cancer types have different gene expression profiles [4], [5], which may then be utilized to diagnose different cancers. High-density DNA microarrays measure the activities of several thousand genes in parallel. This approach helps in giving better therapeutic measures to cancer patients by diagnosing cancer types with improved accuracy [5]. Early detection of any type of cancer increases the victim's chance of survival. This detection is often formulated as a classification problem [6].
Microarray technology produces large datasets with gene expression values for thousands of genes (6,000–60,000) in a cell mixture [7]. Hence, it becomes economically prohibitive to obtain a large sample size. This phenomenon is called the curse of dimensionality, where the number of samples (n) ≪ the number of features (p) [8]. To overcome this problem, microarray medical datasets need dimension reduction [8]. Dimensionality reduction methods are broadly classified into two types, i.e. feature extraction [6], [7] and feature selection [8], [9], [10], [11]. During feature extraction, the features are projected into a new feature space of low dimensionality, where the new features are generated as combinations of the original features. Widely used feature extraction techniques are principal component analysis (PCA) [12], [13], [14], kernel principal component analysis (KPCA) [14], linear discriminant analysis (LDA) [12], [13] and canonical correlation analysis (CCA) [15]. On the other hand, feature selection methods select a subset of highly discriminating features from the original feature set without any transformation. Hence, feature selection is superior to feature extraction in terms of readability and interpretability [11].
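As a quick illustration of the feature extraction idea, the following sketch (toy random data, not the paper's datasets) projects an n ≪ p sample matrix onto its top principal components via SVD; note how each new feature mixes all original genes, which is why interpretability suffers:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "microarray" matrix: 20 samples, 500 genes, so n << p.
X = rng.normal(size=(20, 500))

# PCA via SVD of the centered data; each new feature is a linear
# combination of all original genes (hence reduced interpretability).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
Z = Xc @ Vt[:k].T   # reduced representation: 20 samples x 5 features
print(Z.shape)      # (20, 5)
```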
Feature selection algorithms are classified into supervised, unsupervised and semi-supervised, depending on the presence or absence of class labels [9]. Supervised feature selection methods include filter, wrapper and embedded models. Filter models do not use any classifier [9]. This technique evaluates the significance of features by looking at the intrinsic properties of the data: all the features are scored and ranked based on certain statistical criteria, the highest-ranking features are selected, and the low-scoring features are removed. Compared to other feature selection methods, filter methods are faster, but they have three major limitations: (1) they ignore the interaction with the classifier; (2) each feature is considered independently, thus ignoring feature dependencies; and (3) it is very difficult to determine the threshold point for ranking the features.
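To make the filter idea concrete, here is a minimal sketch on synthetic data (the t-like statistic is just one of the many possible statistical criteria): genes are scored and ranked without consulting any classifier.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 200
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 2.0   # make the first 5 genes informative

# Filter scoring: a t-like statistic per gene; no classifier involved.
m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
se = np.sqrt(X[y == 0].var(axis=0) / 20 + X[y == 1].var(axis=0) / 20)
score = np.abs(m0 - m1) / (se + 1e-12)

# Keep the 10 highest-ranked genes; the informative ones dominate.
top10 = np.argsort(score)[::-1][:10]
print(sorted(int(g) for g in top10 if g < 5))
```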
The wrapper model uses the predictive accuracy of a predetermined learning algorithm to determine the quality of selected features. This method is computationally expensive for large datasets with many features. The embedded model bridges the gap between these two models by combining the advantages of both techniques [9]. Feature selection methods proposed in the literature include fast correlation based filter (FCBF) [16], the relief algorithm [17], support vector machine recursive feature elimination [18], sequential forward selection (SFS) [19] and sequential backward elimination (SBE) [19]. Among these, SFS and SBE are extensively used due to their simplicity and low computational overhead, but they have their own limitations. The major drawback of sequential search methods is the nesting effect: in backward search, a deleted feature cannot be reselected, and in forward search, a selected feature cannot be deleted [20]. That is why stochastic search strategies are adopted, where some randomness is introduced into the search process, making feature selection less sensitive to the particular dataset. The most popular stochastic methods of feature selection are the genetic algorithm [21], simulated annealing [22], ant colony optimization [23], particle swarm optimization [24], [25], [26], differential evolution [27], [28], bacterial foraging optimization [29], harmony search [30], cuckoo search [31], firefly [32], the bat algorithm [33] and cat swarm optimization [34]. In summary, the major advantages of feature selection are selection without transformation, better readability and reduced computational overhead [6].
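A minimal wrapper-style SFS sketch (toy data; leave-one-out 1-NN accuracy stands in for the predetermined learning algorithm) illustrates both the greedy search and the nesting effect:

```python
import numpy as np

def loo_1nn_acc(X, y):
    """Leave-one-out 1-NN accuracy: the wrapper's fitness function."""
    hits = 0
    for i in range(len(y)):
        d = ((X - X[i]) ** 2).sum(axis=1)
        d[i] = np.inf
        hits += int(y[int(np.argmin(d))] == y[i])
    return hits / len(y)

def sfs(X, y, k):
    """Sequential forward selection: once a feature is added it is
    never removed again -- exactly the nesting effect."""
    selected = []
    while len(selected) < k:
        best = max((f for f in range(X.shape[1]) if f not in selected),
                   key=lambda f: loo_1nn_acc(X[:, selected + [f]], y))
        selected.append(best)
    return selected

rng = np.random.default_rng(3)
y = np.array([0] * 15 + [1] * 15)
X = rng.normal(size=(30, 10))
X[y == 1, 0] += 5.0   # only gene 0 separates the two classes
print(sfs(X, y, 2))   # gene 0 is picked first
```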
Dimensionality reduction improves the accuracy of microarray medical data classification. The important role of a medical data classifier is to provide explanation and justification for the accurate prediction of the disease [6]. Many traditional classifiers like KNN [35], naïve Bayes (NB) [36], decision tree [37], random forest [38], ID3 [39], C4.5 [40] and various neural network based classifiers like the multilayer perceptron (MLP) [41], RBFNN [42], FLANN [43] and SVM [44], [45], [46], [47] are found in the literature. Among all the classifiers, the ANN and its variants are extensively used by researchers to classify medical datasets [48]. The success of an ANN based classifier depends mostly on the number of hidden layers, the number of nodes in each hidden layer, the values of the weights between the input and hidden layers and between the hidden and output layers, and the learning algorithm. The literature shows that when an ANN is trained with a gradient descent learning algorithm, training becomes time consuming and the computational overhead increases [49]. Besides this, due to the initial random choice of parameters, the convergence rate of the gradient descent learning algorithm becomes very slow, and it often gets trapped in local minima. To avoid these limitations, pseudo-inverse based neural networks [50], [51], [52], [53], [54], [55] have been proposed by researchers such as Schmidt [54], Pao [50], and Broomhead and Lowe [51]. The pseudo-inverse based neural network has recently been renamed the extreme learning machine (ELM) [56], with the bias in ELM set to zero. This paper explores the possibility of using kernel ridge regression (KRR) [57], [58], recently renamed kernel ELM [59], for microarray data classification.
The architecture of ridge regression has some similarity with the RVFL [52] and the pseudo-inverse based neural network, as it uses randomly assigned weights between the input layer and hidden layer, while the weights between the hidden layer and output layer are learnt using a pseudo-inverse formulation. However, ridge regression produces a large variation in classification accuracy across trials with the same number of hidden nodes. The kernel function addresses this problem by replacing the hidden layer of ridge regression. The main advantages of kernel ridge regression are that the kernel function does not need to satisfy Mercer's theorem and that no randomness is needed in assigning the connection weights between the input and hidden layers. The literature suggests that kernel ridge regression is very similar to the kernel pseudo-inverse based neural network (KPINN) [58]. It exploits quadratic programming algorithms for convex optimization from mathematical programming, borrows the idea of kernel representations from mathematical analysis, and adopts the objective of finding a maximum margin classifier from machine learning theory [60].
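The determinism described above can be sketched as follows (toy data; the RBF kernel, regularization value and label construction are illustrative choices, not the paper's settings): KRR training is a single linear solve, with no random input-to-hidden weights to vary across trials.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = np.where(X[:, 0] + 0.1 > 0, 1.0, -1.0)   # labels in {-1, +1}

# KRR training: alpha = (K + lam*I)^(-1) y -- one deterministic solve,
# unlike plain ridge regression whose random hidden weights differ per trial.
lam = 0.1
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

pred = np.sign(rbf_kernel(X, X) @ alpha)     # training predictions
train_acc = float((pred == y).mean())
print(train_acc)
```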
This paper proposes a modified cat swarm optimization (MCSO) technique to select the optimal features from microarray medical datasets and kernel ridge regression (WKRR and RKRR) to classify the features obtained from the MCSO algorithm. The literature in this domain shows that CSO performs better than PSO, though its computational complexity is higher [61]. In addition, both PSO and DE [62] sometimes suffer from premature convergence and stagnation problems [63], which CSO avoids. The MCSO based feature selection method improves search efficiency within the entire problem space and is used to obtain the best candidate features from the high dimensional microarray medical datasets. The proposed feature selection method employs the k-nearest neighbor algorithm as the classifier and uses five-fold cross validation to determine the classification accuracy.
The paper is organized as follows: Sections 2 and 3 describe the process model and the benchmark microarray medical datasets, respectively. Section 4 deals with the modified cat swarm optimization based feature selection method (MCSO). All the classifiers used in this study, i.e. RR, OSRR, KRR, SVM and random forest, are discussed in Section 5. Performance evaluation measures are presented in Section 6. Simulation results and their analysis appear in Sections 7 and 8. Finally, the conclusion is drawn in Section 9.
The process model for the classification of microarray datasets
All the microarray medical datasets are normalized using the max–min normalization method, as shown in Eq. (1). The modified cat swarm optimization algorithm (MCSO) is then used to select the optimal feature subsets from these normalized datasets. For each dataset, MCSO derives 10 subsets consisting of 10–100 genes in intervals of 10. To get the optimal candidate features, the k-nearest neighbor (KNN) classifier is used to find the classification accuracy. The subset with the lowest classification error is chosen as the final feature subset.
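The max–min normalization step can be sketched as follows (a small numeric example; the subsequent MCSO/KNN subset search is omitted here):

```python
import numpy as np

def min_max_normalize(X):
    """Max-min normalization: rescale each gene (column) to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Guard against constant genes, where hi == lo.
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

X = np.array([[2.0, 10.0],
              [4.0, 30.0],
              [6.0, 20.0]])
Xn = min_max_normalize(X)
print(Xn)   # column 0 -> [0, 0.5, 1]; column 1 -> [0, 1, 0.5]
```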
Datasets
This section introduces eight benchmark microarray datasets [64], [65], [66], [67], [68], [69], [70], [71] downloaded from http://www.gems-system.org [64] and http://datam.i2r.a-star.edu.sg/datasets/krbd/ [65]. Out of the eight datasets, four are binary: breast cancer, prostate cancer, colon tumor and leukemia. The other four are multi-class: leukemia1, leukemia2, brain tumor1 and SRBCT. Each dataset is divided into two data files, i.e. training and testing. The output is 0 or 1 for binary classification.
Feature selection
Feature selection is in itself one of the important research areas in the domain of machine learning. Its main advantage is obtaining the optimal candidate features, which helps improve classification accuracy while reducing computational overhead, resource demand and storage space requirements. In the process, the most influential features are selected so that the user can interpret the relation between the features and classes [8], [25].
RR, OSRR, KRR, SVM and random forest classifiers
This section discusses all five classifiers – RR, OSRR, KRR, SVM, Random Forest – used to classify both binary and multi-class microarray medical datasets.
Performance evaluation measures
The performance of all the classifiers is evaluated using different measures such as training accuracy, testing accuracy, the confusion matrix, the receiver operating characteristic (ROC) curve, sensitivity, specificity, Gmean and F-score [76], [77], [78], [79].
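For reference, the scalar measures listed above can be computed directly from a 2x2 confusion matrix (the counts below are made up for illustration):

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Scalar measures derived from a binary confusion matrix."""
    sens = tp / (tp + fn)   # sensitivity (true positive rate)
    spec = tn / (tn + fp)   # specificity (true negative rate)
    prec = tp / (tp + fp)   # precision
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sens,
        "specificity": spec,
        "gmean": math.sqrt(sens * spec),
        "f_score": 2 * prec * sens / (prec + sens),
    }

m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(m["accuracy"])   # 0.85
```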
Simulation results
This section discusses the results obtained from all seven models – RR, OSRR, WKRR, RKRR, SVMRBF, SVMPoly and random forest – for both binary and multi-class microarray medical datasets. Due to space constraints, a few results are omitted. All the classifiers are implemented in the following environment: Operating System: Windows XP Professional; CPU: Intel Core i3-370M (2.4 GHz); Memory: 4 GB RAM.
The number of hidden nodes 'L' in RR varies from 2 to 22 in increments of 2 neurons.
Result analysis
In this paper, binary and multi-class microarray medical datasets are classified using the RR, OSRR, WKRR, RKRR, SVMRBF, SVMPoly and random forest models. The meta-heuristic algorithms CSO and MCSO are used to select the optimal candidate subsets. The number of genes selected and the classification accuracy obtained from KNN in the feature selection step are analyzed and compared. Table 13 clearly establishes that MCSO yields better results than CSO.
Testing accuracy obtained from KRR
Conclusion
Microarray data analysis and classification are essential for the effective diagnosis of cancer. However, microarray medical datasets always suffer from the curse of dimensionality. To select the most relevant features, MCSO has been proposed and compared with CSO, and the proposed technique proves superior. The selected features have been classified by applying two variations of KRR, namely WKRR and RKRR. Other models like RR, OSRR, SVM (both SVMRBF and SVMPoly) and random forest have also been implemented for comparison.
References (80)
Cancer incidence and mortality patterns in Europe: estimates for 40 countries in 2012, Eur. J. Cancer (2013)
A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets, Artif. Intell. Med. (2011)
A review of microarray datasets and applied feature selection methods, Inf. Sci. (2014)
EEG signal classification using PCA, ICA, LDA and support vector machines, Expert Syst. Appl. (2010)
SVM-based CAD system for early detection of the Alzheimer's disease using kernel PCA and LDA, Neurosci. Lett. (2009)
A GA-based feature selection and parameters optimization for support vector machines, Expert Syst. Appl. (2006)
Parameter determination of support vector machine and feature selection using simulated annealing approach, Appl. Soft Comput. (2008)
An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system, Appl. Math. Comput. (2008)
A discrete particle swarm optimization method for feature selection in binary classification problems, Eur. J. Oper. Res. (2010)
An improved particle swarm optimization for feature selection, J. Bionic Eng.
Feature subset selection using differential evolution and a wheel based search strategy, Swarm Evolut. Comput.
Efficient training and improved performance of multilayer perceptron in pattern classification, Neurocomputing
Breast mass classification based on cytological patterns using RBFNN and SVM, Expert Syst. Appl.
Predicting breast cancer survivability: a comparison of three data mining methods, Artif. Intell. Med.
Evolutionary generalized radial basis function neural networks for improving prediction accuracy in gene classification using feature selection, Appl. Soft Comput.
Learning and generalization characteristics of the random vector functional-link net, Neurocomputing
A novel artificial neural network method for biomedical prediction based on matrix pseudo-inversion, J. Biomed. Inf.
IIR system identification using cat swarm optimization, Expert Syst. Appl.
Cat swarm optimization algorithm for optimal linear phase FIR filter design, ISA Trans.
GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data, Int. J. Med. Inf.
The improvement of breast cancer prognosis accuracy from integrated gene expression and clinical data, Expert Syst. Appl.
An experimental comparison of performance measures for classification, Pattern Recognit. Lett.
The rising burden of cancer in the developing world, Ann. Oncol.
Cancer statistics, CA Cancer J. Clin.
Classification of multiple cancer types by multicategory support vector machines using gene expression data, Bioinformatics
An Introduction to Feature Extraction, in: Feature Extraction
A review of feature selection techniques in bioinformatics, Bioinformatics
Comparative Study of Kernel Based Classification and Feature Selection Methods with Gene Expression Data
Graph-driven feature extraction from microarray data using diffusion kernels and kernel CCA, Adv. Neural Inf. Process. Syst.
Gene selection algorithm by combining reliefF and mRMR, BMC Genom.
Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Trans. Nanobiosci.
Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell.
Solutions to instability problems with sequential wrapper-based approaches to feature selection, J. Mach. Learn. Res.
Selecting optimal feature set in high-dimensional data by swarm search, J. Appl. Math.
Face recognition using bacteria foraging optimization-based selected features, Int. J. Adv. Comput. Sci. Appl.