Article

Feature Selection Using Artificial Gorilla Troop Optimization for Biomedical Data: A Case Analysis with COVID-19 Data

by Jayashree Piri 1, Puspanjali Mohapatra 2, Biswaranjan Acharya 3,*, Farhad Soleimanian Gharehchopogh 4, Vassilis C. Gerogiannis 5,*, Andreas Kanavos 6,* and Stella Manika 7
1 Department of CSE, GITAM Institute of Technology (Deemed to be University), Visakhapatnam 530045, India
2 Department of CSE, International Institute of Information Technology, Bhubaneswar 751029, India
3 Department of Computer Engineering-AI, Marwadi University, Rajkot 360003, India
4 Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia 5756151818, Iran
5 Department of Digital Systems, University of Thessaly, Geopolis Campus, 45100 Larissa, Greece
6 Department of Digital Media and Communication, Ionian University, 28100 Kefalonia, Greece
7 Department of Planning and Regional Development, University of Thessaly, 38334 Volos, Greece
* Authors to whom correspondence should be addressed.
Mathematics 2022, 10(15), 2742; https://doi.org/10.3390/math10152742
Submission received: 9 July 2022 / Revised: 28 July 2022 / Accepted: 29 July 2022 / Published: 3 August 2022
(This article belongs to the Special Issue Advanced Optimization Methods and Applications)

Abstract: Feature selection (FS) is commonly thought of as a pre-processing strategy for determining the best subset of characteristics from a given collection of features. Here, a novel discrete artificial gorilla troop optimization (DAGTO) technique is introduced for the first time to handle FS tasks in the healthcare sector. Depending on the number and type of objective functions, four variants of the proposed method are implemented in this article, namely: (1) single-objective (SO-DAGTO), (2) bi-objective (wrapper) (MO-DAGTO1), (3) bi-objective (filter wrapper hybrid) (MO-DAGTO2), and (4) tri-objective (filter wrapper hybrid) (MO-DAGTO3) for identifying relevant features in diagnosing a particular disease. We provide an outstanding gorilla initialization strategy based on the label mutual information (MI) with the aim of increasing population variety and accelerating convergence. To verify the performance of the presented methods, ten medical datasets of variable dimensions are taken into consideration. A comparison is also implemented between the best of the four suggested approaches (MO-DAGTO2) and four established multi-objective FS strategies, and it is statistically proven to be the superior one. Finally, a case study with COVID-19 samples is performed to extract the critical factors related to it and to demonstrate how this method is fruitful in real-world applications.

1. Introduction

Good health is the hallmark of life, but disease has affected humanity in various forms, forcing humans to struggle and compelling researchers to uncover its secrets. Machine learning (ML) has established feature selection as a way of identifying the features that underlie the development of a disease in humans. A medical diagnosis constitutes a difficult procedure that necessitates clinical expertise. The demand for precise judgments, on the other hand, must be tempered with an understanding of the uncertainty that exists in many clinical circumstances; rather than assuming diagnostic certainty, complicated presentations sometimes necessitate probabilistic reasoning. People can produce and store data at an unbelievable rate in the digital realm, and this explosion of data available for further analysis can be seen in medicine just as much as in other fields. Various artificial intelligence technologies have been used to solve a variety of medical challenges with the goal of automating time-consuming and frequently subjective manual procedures carried out by physicians in a variety of disciplines. However, it is difficult to translate AI research into clinically verified and adequately regulated systems that can benefit everyone in a safe and timely manner. Clinical assessment is critical, with measurements that are understandable to physicians and that ideally go beyond technical correctness to encompass quality of treatment and patient outcomes.
A vast number of illness indicators are frequently found in medical databases. Concretely, some illness indicators are not helpful in clinical data processing and they can even be harmful. As a result, feature selection is crucial since it can exclude illness signs that are not significant. It also improves the efficacy of the medical decision support systems by reducing their learning time and improving data understanding. FS has been particularly successful in clinical uses, as it may not only shrink dimensions but also aid in the understanding of illness aetiology.
FS techniques are mainly distinguished into three groups, namely wrapper, filter, and embedded procedures [1]. Wrappers search for quasi-optimal substrings of attributes and evaluate them with a classifier. More to the point, wrappers offer better results than filtering methods owing to the employment of a prediction system, but these tactics take longer to run because the classification system must be trained repeatedly [2]. An alternative approach is to use statistical concepts and information theory to identify a subset of features that has the highest connection with a certain outcome while simultaneously minimising any internal correlations [1]. Embedded approaches, in turn, seek to incorporate the FS process into the classification training phase [3].
Evolutionary techniques have been presented as a response to the challenges outlined above. Owing to their population-based, global search capability, these designs are able to find better solutions than greedy techniques [4,5,6,7,8,9]. Few studies have attempted to integrate filter and wrapper models by using evolutionary computing (EC) techniques, as most existing EC algorithms follow one of these two models, filter or wrapper, and the majority also treats FS as a single-objective task.
The artificial gorilla troop optimization (AGTO) is an advanced metaheuristic approach presented for the resolution of optimisation issues [10]. In previous studies, this method was found to achieve minimal feature evaluation, high speed, and great global and local search capabilities [11,12]. To our knowledge, the full potential of this strategy for addressing the FS task has yet to be explored.
In this paper, our effort is to find the relevant aspects related to a particular disease by employing a novel discrete artificial gorilla troop optimization algorithm with various combinations of objective functions. Four variants of the proposed method are implemented here, based on the number and type of objective functions used for feature selection in the medical domain: (1) single-objective (SO-DAGTO), (2) bi-objective (wrapper) (MO-DAGTO1), (3) bi-objective (filter wrapper hybrid) (MO-DAGTO2), and (4) tri-objective (filter wrapper hybrid) (MO-DAGTO3).
In this study, we have looked into the following objectives in particular:
  • To learn about the latest metaheuristic FS assignments as well as their benefits and drawbacks;
  • To propose a discrete version of the AGTO, entitled DAGTO, for handling FS work in the biomedical domain;
  • To introduce a DAGTO with various combinations of objective functions to discover Pareto fronts for the FS work by simultaneously optimizing filter and wrapper conditions for the first time;
  • To boost the diversity of the population and speed up its convergence, we present an efficient and effective gorilla initialization technique based on label mutual information (MI);
  • To offer a comprehensive assessment report on the achievement of DAGTO in FS task using clinical information by executing four distinct variations of DAGTO according to the objective functions used;
  • To compare the introduced SO-DAGTO strategy to three standard single-objective mechanisms and the MO-DAGTO approaches to four popular multi-criteria frameworks, and to verify whether the offered strategies outcompete the benchmark approaches in minimizing feature subset size and increasing accuracy;
  • To use the “knee point” concept for selecting the best one from the external repository, in the case of MO-DAGTOs; and
  • To validate the efficiency of the provided technique by testing with a real-world COVID-19 dataset.
The following is the paper's structure. The background material is introduced in Section 2. The proposed approach is presented in Section 3, and the experimental setups and findings are discussed in Section 4 and Section 5, respectively. The strong points of the proposed approaches are listed in Section 6, whereas an application of the proposed method to real-world COVID-19 data is presented in Section 7. Finally, Section 8 brings the paper to a close.

2. Background

2.1. Artificial Gorilla Troop Optimizer (AGTO)

AGTO is a novel metaheuristic approach based on the group behaviours of gorillas. Five distinct operators illustrated in Figure 1 are employed in the AGTO method regarding exploitation and exploration operations.
The optimization arena of the AGTO method has three types of solutions: P represents the location of a gorilla; G represents the location of the candidate gorilla formed in every step, which is adopted if it outperforms the existing one; and, in each repetition, the best alternative is designated the "silverback".

2.1.1. Exploration Phase

Regarding the exploration process, three separate strategies are used: migration to an unseen site, migration towards a recognised position, and movement towards other gorillas. The technique of migration to an unknown place is selected by using a parameter called p: when rand < p, the first mechanism is chosen. Otherwise, if rand is greater than or equal to 0.5, the gorilla-to-gorilla movement mechanism is chosen, whereas if rand is less than 0.5, the migration strategy to a known site is chosen.
Mathematically, these can be written as following:
$$G(It+1) = \begin{cases} (ub - lb) \cdot rnd_1 + lb, & rand < p \\ (rnd_2 - C) \cdot P_r(It) + L \cdot H, & rand \geq 0.5 \\ P(It) - L \cdot \big( L \cdot (P(It) - G_r(It)) + rnd_3 \cdot (P(It) - G_r(It)) \big), & rand < 0.5, \end{cases} \qquad (1)$$
where
  • $P(It)$ is the gorilla's present location;
  • $G(It+1)$ is the candidate gorilla location in the following iteration $It+1$;
  • $rnd_1$, $rnd_2$, $rnd_3$, and $rand$ are random numbers between 0 and 1;
  • $p$ is a parameter in the range $[0, 1]$ that must be set prior to the optimization procedure;
  • $ub$ and $lb$ are the upper and lower bounds of the variables, respectively;
  • $P_r$ is a randomly chosen gorilla; and
  • $G_r$ is a randomly chosen candidate gorilla:
$$C = F \cdot \left(1 - \frac{It}{maxIt}\right) \qquad (2)$$
$$F = \cos(2 \cdot rnd_4) + 1 \qquad (3)$$
$$L = C \cdot l \qquad (4)$$
where $It$ is the current iteration, $maxIt$ is the maximum number of iterations to conduct, and $l$ is a random number in the range $[-1, 1]$. The factor $H$ in Equation (1) is calculated as follows:
$$H = Z \cdot P(It) \qquad (5)$$
$$Z = [-C, C]. \qquad (6)$$
A group formation activity is performed after the exploration phase: all candidate solutions $G(It)$ are evaluated, and $G(It)$ replaces $P(It)$ if its cost is lower, i.e., if $G(It) < P(It)$. As a result, the best solution found during this phase is referred to as the "silverback".
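For concreteness, the following minimal Python sketch implements the exploration update of Equations (1)–(6) for a continuous search space. It assumes a NumPy population matrix P of shape (N, dim) and, as a simplification, draws the random candidate gorilla G_r from the current population; both are illustrative assumptions rather than the reference implementation.

```python
import numpy as np

def exploration_step(P, i, It, maxIt, p, lb, ub):
    """Propose a candidate position G for gorilla i via Equation (1)."""
    N, dim = P.shape
    F = np.cos(2 * np.random.rand()) + 1                  # Equation (3)
    C = F * (1 - It / maxIt)                              # Equation (2)
    L = C * np.random.uniform(-1, 1)                      # Equation (4), l in [-1, 1]
    rand = np.random.rand()
    if rand < p:                                          # migration to an unknown place
        return (ub - lb) * np.random.rand(dim) + lb
    if rand >= 0.5:                                       # movement towards other gorillas
        Pr = P[np.random.randint(N)]                      # randomly chosen gorilla
        Z = np.random.uniform(-C, C, dim)                 # Equation (6)
        H = Z * P[i]                                      # Equation (5)
        return (np.random.rand() - C) * Pr + L * H
    Gr = P[np.random.randint(N)]                          # stand-in for a random candidate
    return P[i] - L * (L * (P[i] - Gr) + np.random.rand() * (P[i] - Gr))
```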

2.1.2. Exploitation Phase

In this phase, the value of $C$ from Equation (2) is used to choose between two mechanisms: following the silverback (if $C \geq W$) or competing for adult females (if $C < W$), where $W$ is a pre-specified parameter.
  • Follow the Silverback: The silverback is a young and fit gorilla, and the other males in the troop, likewise young, observe him closely. They also obey all of the silverback's commands, travel to diverse locations in search of food supplies, and stay with him. This behaviour is simulated by using Equation (7):
    $$G(It+1) = L \cdot M \cdot \big(P(It) - P_{Silverback}\big) + P(It) \qquad (7)$$
    $$M = \left( \left| \frac{1}{N} \sum_{j=1}^{N} G_j(It) \right|^{g} \right)^{1/g} \qquad (8)$$
    where
    $$g = 2^{L}. \qquad (9)$$
  • Competition for Adult Females: When juvenile gorillas enter adolescence, they engage in risky competition with other males in order to pick grown-up females and to expand their troop. These brawls can extend for days and include several individuals. This process is simulated using Equation (10).
$$G(It+1) = P_{Silverback} - \big(P_{Silverback} \cdot Q - P(It) \cdot Q\big) \cdot A \qquad (10)$$
$$Q = 2 \cdot rnd_5 - 1 \qquad (11)$$
$$A = \beta \cdot E \qquad (12)$$
$$E = \begin{cases} N_1, & rand \geq 0.5 \\ N_2, & rand < 0.5 \end{cases} \qquad (13)$$
where Q simulates the impact force, A is the coefficient vector to assess the level of violence in a dispute, β is a preset parameter, and E replicates the impact of violence on solution dimensions.
A group formation activity is likewise performed after the exploitation phase: all candidate solutions $G(It)$ are evaluated, and $G(It)$ replaces $P(It)$ if its cost is lower, i.e., if $G(It) < P(It)$. The optimal solution found during this step is again referred to as the "silverback".
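A matching sketch of the exploitation update of Equations (7)–(13). Here the mean in Equation (8) is taken over the current population as a stand-in for the candidate solutions, and W and beta are the preset parameters mentioned above; these choices are illustrative assumptions.

```python
import numpy as np

def exploitation_step(P, i, silverback, C, L, W, beta):
    """Propose a candidate position for gorilla i via Equation (7) or (10)."""
    N, dim = P.shape
    if C >= W:                                            # follow the silverback
        g = 2.0 ** L                                      # Equation (9)
        mean = np.abs(P.mean(axis=0)) + 1e-12             # population mean (assumption)
        M = (mean ** g) ** (1.0 / g)                      # Equation (8)
        return L * M * (P[i] - silverback) + P[i]         # Equation (7)
    Q = 2 * np.random.rand() - 1                          # Equation (11): impact force
    E = (np.random.randn(dim) if np.random.rand() >= 0.5  # Equation (13): vector N1 ...
         else np.random.randn())                          # ... or scalar N2
    A = beta * E                                          # Equation (12)
    return silverback - (silverback * Q - P[i] * Q) * A   # Equation (10)
```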

2.2. Single-Objective vs. Multi-Objective Optimization

The main purpose of single-objective optimization (SOP) is to identify the optimal solution, i.e., the minimum or maximum value of a single objective function that combines all the multiple objectives into one. This type of optimization is useful as a tool for providing planners with information about the problem at hand, but it rarely provides a set of potential solutions that trade off distinct objectives.
On the other hand, in multi-objective optimization (MOP) with competing objectives, there is no single best solution. The interplay of several objectives results in a collection of compromise solutions, which are variously referred to as trade-off, non-dominated, non-inferior, or Pareto-optimal options. In SOP, a fitness comparison is used to establish a candidate's superiority over other alternatives; in MOP, by contrast, the idea of dominance is used to assess the merit of a potential solution. A solution $A_1$ in the feasible region of a $C$-objective problem dominates another solution $A_2$ if the following two requirements hold:
  • For all objectives $c = 1, \ldots, C$: $A_1$ is no worse than $A_2$ in the $c$th objective;
  • There exists at least one objective $c$ in which $A_1$ is strictly better than $A_2$.
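These two conditions translate directly into code. A short sketch, assuming (without loss of generality) that every objective is to be minimized:

```python
def dominates(a1, a2):
    """True if solution a1 dominates a2; objective vectors, all minimized."""
    no_worse = all(x <= y for x, y in zip(a1, a2))         # condition 1
    strictly_better = any(x < y for x, y in zip(a1, a2))   # condition 2
    return no_worse and strictly_better

# e.g., with objectives (feature count, error rate): (5, 0.10) dominates (9, 0.12),
# while (5, 0.12) and (9, 0.10) are mutually non-dominated.
```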

2.3. Related Work

FS techniques can be divided into three categories: embedded, filter, and wrapper. Here, for the first time, we have explored AGTO in the domain of feature selection for medical data, and we have also considered both filter and wrapper characteristics during optimization. Therefore, the following subsections briefly describe the existing work on both filter-based and wrapper-based FS techniques.

2.3.1. Filter-Based FS Techniques

Focus [13] and Relief [14] are two non-metaheuristic-based filter approaches for the FS task. The Relief approach assigns a weight to each characteristic based on how important it is; the fundamental disadvantage of this strategy is that it does not take feature redundancy into account. On the other hand, one of the most well-known filter methods is the Focus algorithm, which performs an exhaustive search over the whole space of potential feature subsets, which is computationally intensive and often infeasible. Furthermore, employing information theory ideas, filter approaches such as mRmR [15] and MIFS [16] attempt to improve the efficiency of the FS algorithm.
Starting with metaheuristic-based filter techniques for FS problems, the authors of [17] used NSGAII to develop two filter techniques—NSGAIIMI and NSGAIIE—by using MI and entropy as the assessment criteria, respectively. Recently, a text feature selection technique based on a filter-based multi-objective algorithm was proposed in [18]: a text feature's significance is determined by using the relative discriminative criterion (RDC), whereas redundancy is determined by using the correlation measure. In [19], the authors employed rough set theory and MOBPSO to implement filter-based FS. Two multi-objective filter FS methods were proposed in [20], both of which employed BPSO, modified MI, and entropy to perform superior classification. Three multi-objective ABC techniques (MOABC) were developed in [21], focusing on information theory and incorporating three filter objectives.
The authors of [22] have provided two new filter FS approaches for classification issues based on binary PSO and information theory. The first approach utilizes BPSO and the MI between each pair of attributes to assess the subset’s significance and duplication, whereas the second approach examines the relevance and duplication of the chosen feature subset by using BPSO and the entropy of each feature group. To control duplicate and undesired aspects in a dataset, the work in [23] introduced a filter technique employing an elitism-based MODE for FS, entitled FAEMODE. The uniqueness lies in this algorithm’s objective preparation, which takes into account linear as well as non-linear interdependence among feature sets. Two alternative multi-objective filter-based FS architectures built on the boolean cuckoo optimization technique, utilising the concept of non-dominated sorting GAs, NSGAIII (BCNSG3), and NSGAII (BCNSG2), have been proposed in [24]. To this end, four different multi-objective filter-based FS techniques were developed, each using MI and gain ratio-based entropy as filter assessment measurements.

2.3.2. Wrapper-Based FS Techniques

Wrapper approaches search for quasi-optimal substrings of attributes by using a classification model in the evaluation process. According to the searching process, they can be divided into two types: metaheuristic-based and non-metaheuristic-based. Branch & Bound [25], SFS [26], and SBS [27] are some of the most well-known non-metaheuristic-based FS methods. Despite their simplicity of design, these strategies have issues such as convergence to a local optimum and considerable computational complexity on big datasets. Both the SFS and SBS approaches contain a structural flaw: already appended (or discarded) characteristics cannot be eliminated (or inserted) in subsequent phases [1]. SFFS and SBFS have been presented as solutions to this problem [28]. However, these algorithmic advancements have not been able to overcome the local optima convergence issue [13].
Researchers have applied metaheuristic algorithms to tackle the challenges outlined above and to utilize improved search procedures. These techniques develop and rate several alternatives at the same time and provide a more comprehensive global search than conventional techniques because they are population-based. Furthermore, single-objective wrapper approaches often pursue the goals of lowering feature subset size, maximising classification efficiency, or a combination of these goals. Some of the most popular evolutionary methods used for single-objective FS are: GA [29], PSO [30], WOA [31], GWO [32], FPA [33], ABC [34], ACO [35], GP [36], and FOA [37].
Due to the concurrent examination of numerous, frequently competing demands and the delivery of a sequence of non-dominated (ND) options, multi-objective FS strategies have been the subject of major research in recent years. An innovative technique, called MOGWO, was presented in [38], wherein a reservoir is used to keep the ND options. Recently, in another work, a MOQBHHO method for identifying the aspects affecting different diseases has been introduced [39]; the authors additionally demonstrated the efficacy of the suggested strategy by matching its findings to those of deep-based AE and TSFS. In [40], a bi-objective FOA was proposed for handling the FS challenge. Several of the latest published articles [41,42,43] have focused on fixing the FS problem and improving the classifier's variables at the same time. For more on multi-objective FS approaches, one can refer to [20,44,45,46,47,48,49,50].
To tackle the FS challenge, many variants of genetic algorithms (GA) have been suggested. Chromosomes are binary in primitive form; when a feature is chosen, the associated gene value is 1; otherwise, it is 0 [1]. In addition, a hybrid wrapper-embedded strategy to handle the FS problem is proposed in [29], wherein the proposed algorithm aims to carry out feature selection and create the prediction model by using the novel chromosomal expression technique at the same time. A hybrid technique combining the PSO algorithm and local search is presented in [51]. Local search is used in this research to choose the fewest and most differentiating criteria while also directing the PSO search. Finally, the authors of [52] developed a new strategy for particle initialization and updating to improve the performance of PSO in FS.

2.3.3. Hybrid Filter-Wrapper FS Techniques

Studies in the last few years have shown that merging the filter with the wrapper technique can produce outstanding results, as in [53], where two filter and one wrapper criteria are handled by multi-objective GA. With mutual information as filter fitness, a new multi-objective GWO for FS is proposed in [54], and the generated solutions are enhanced toward higher classification results by the use of wrapper fitness. In another similar work [55], a hybrid bat algorithm (BA) based on MI and naive Bayes, called BAMI, is introduced. A strategy based on filter-GA for FS, known as the GAFFS technique, has been presented in [56]. Information gain, gain ratio, ReliefF, chi-square, and correlation feature selector were chosen for selecting the most promising attributes from real-world datasets. To pick the most relevant features, GA is then applied with chromosomal fitness measured by using the KNN classifier’s classification accuracy. By using the whale optimization technique (WOA), a new hybrid filter-wrapper FS solution is suggested in [57]. This technique is a multi-objective one that optimizes both filter and wrapper fitness in a concurrent way. The effectiveness of this approach is proved on twelve standard datasets by a thorough evaluation with seven popular algorithms.
Although many researchers have described their approaches as multi-objective, implying that the optimization of several criteria takes place simultaneously, the FS in these works remains effectively a single-objective task, since the objective functions are optimized sequentially during the filter and wrapper stages, respectively.

3. Proposed Techniques

As the study in the previous section shows, a discrete form of AGTO for Boolean optimization jobs like FS has not been established so far. Moreover, there has been no proposal to use AGTO as a SOP or MOP to address FS. This section introduces a discrete AGTO that addresses the FS challenge in medical data mining by taking into account both the SOP and MOP aspects of the problem. Specifically, the four proposed variants of DAGTO differ in the number and types of objective functions used. Therefore, for clear understanding, we have divided the proposed techniques into two main categories: single-objective DAGTO (SO-DAGTO) and multi-objective DAGTOs (MO-DAGTO1, MO-DAGTO2, and MO-DAGTO3). The original AGTO was employed for solving continuous optimization tasks [10]. However, FS is treated as a discrete optimization problem, and therefore the following modifications to the various steps are required. The details of all the proposed variants are given in the following subsections.

3.1. Single-Objective DAGTO (SO-DAGTO)

  • Steps of Single-objective DAGTO: The step-by-step procedure for single-objective DAGTO is given below:
    (a)
    Step 1: Gorilla initialization based on MI: The goal of FS is to get rid of features that are not needed. During the initialization of the gorillas, insignificant features should have fewer chances to participate in the optimization process, thereby reducing the gorillas' search space. The MI, which is sensitive to non-linear dependency as well, is used in this paper to quantify the amount of information shared between two variables (e.g., a feature and the class attribute). It is expressed as
    $$MI(f_i, class) = H(f_i) - H(f_i \mid class), \qquad (14)$$
    where $H(f_i)$ is the entropy of $f_i$ and $H(f_i \mid class)$ is the conditional entropy of $f_i$ given $class$. The greater the MI value of a feature $f_i$, the more important it is, and the more likely it is to be picked up as an initial selection. Based on this concept, we define a probability to determine the likelihood of a feature being picked up by an initial gorilla. It is defined as
    $$prob_i = \frac{MI(f_i, class)}{\max_k \big( MI(f_k, class) \big)}, \quad \text{where } k = 1, 2, \ldots, L. \qquad (15)$$
    Furthermore, the greater the MI value of a feature, the higher the feature's likelihood. A gorilla is created based on the likelihood $prob_i$ by selecting its elements one by one from the entire feature set. As an example, the components of the $i$th gorilla, i.e., $P_i = (p_{i1}, p_{i2}, \ldots, p_{iL})$, are chosen in the following manner. Specifically, a feature will be selected for the gorilla,
    $$p_{ik} = \begin{cases} 1, & rnd < prob_k \\ 0, & \text{otherwise}, \end{cases} \qquad (16)$$
    where $rnd$ is a random number between 0 and 1, and $prob_k$ is the probability of the $k$th feature. We have used this initialization technique for 70% of the gorilla population; the positions of the remaining gorillas are randomly initialized to enhance the diversity of the population. (A condensed sketch of Steps 1–3 is given at the end of this subsection.)
    (b)
    Step 2: Fitness Assessment: To assess the individual solution in wrapper FS, a fitness/objective function is necessary. Feature selection’s main purpose is to improve prediction accuracy while reducing the number of characteristics. More to the point, the objective function (OF), which includes both criteria, is used in this variant of the proposed work and is described as [58],
    $$OF(P) = \alpha \cdot classification\_error + (1 - \alpha) \cdot \frac{L_S}{L}, \qquad (17)$$
    where $classification\_error$ is the error rate of the learning algorithm, $L_S$ is the size of the feature substring, $L$ is the original dimension, and $\alpha$ (here $\alpha = 0.99$) is a control parameter balancing the effect of classification performance against feature size.
    (c)
    Step 3: Gorilla Location Update: Each gorilla position is initially updated by using Equation (1), and the update is denoted as $\Delta p_i^L(It+1)$. As FS is a discrete optimization issue, a sigmoid transfer function is applied here to transmute the original AGTO to DAGTO, i.e., to compute a probability value by using Equation (18),
    $$T\big(\Delta p_i^L(It+1)\big) = sigmoid\big(\Delta p_i^L(It+1)\big) = \frac{1}{1 + e^{-2 \Delta p_i^L(It+1)}}. \qquad (18)$$
    Then, each candidate gorilla location is calculated in the discrete domain by using the following Equation (19),
    $$p_i^L(It+1) = \begin{cases} 1, & rnd < T\big(\Delta p_i^L(It+1)\big) \\ 0, & \text{otherwise}, \end{cases} \qquad (19)$$
    where $i$ indexes the bits of the gorilla's position, $L$ is the true dimension, and $rnd$ is a random number in the range $[0, 1]$. Subsequently, each candidate gorilla is evaluated by using the fitness function given in Equation (17), and if a new location is found to be better than the older one, the corresponding replacement occurs. Next, a "silverback" solution, the best option in the updated population, is chosen to continue with the exploitation phase. In this stage, depending on the values of $C$ and $W$ described in Section 2.1.2, Equations (7) and (10) are used to alter the location of individual gorillas in the population, and the continuous location space is again converted to a discrete one by employing Equation (19). If an updated gorilla position is found to be fitter than the existing one, it replaces it.
    (d)
    Step 4: Finding the Silverback: At the completion of every repetition, the fittest alternative (the one having the minimum $OF$ value) is treated as a temporary silverback solution. It is then compared with the older silverback; if it is found to be better, it replaces the existing one, and otherwise it does not.
  • Algorithm for Single-objective DAGTO for FS: The detailed algorithm for the proposed single-objective DAGTO is given in Algorithm 1. There, in line 8, the exploration phase starts, while in line 14, the group is created. Furthermore, in line 18 the exploitation phase is taking place, whereas the corresponding group is created in line 26.
    Algorithm 1 Single-objective DAGTO for FS
    1: input Population size $N$, maximum number of iterations $maxIt$, and parameters $\beta$ and $p$
    2: output The $silverback$ and its $OF$ value
    3: Set the initial gorilla locations $P_i$ $(i = 1, 2, \ldots, N)$ as given in Section 3.1
    4: Compute the $OF$ of each gorilla $P_i$
    5: for $COUNT \gets 1$ to $maxIt$ do
    6:   Update $C$ using Equation (2)
    7:   Update $L$ using Equation (4)
    8:   for all gorillas $P_i$ do
    9:     Update the gorilla location by Equation (1)
    10:    Apply the sigmoid to convert the gorilla location into a probability value
    11:    Compute the candidate gorilla position in the discrete domain using Equation (19)
    12:  end for
    13:  for $i \gets 1$ to $N$ do
    14:    Compute the $OF$ of each candidate gorilla $G_i$
    15:    If $G_i$ is fitter than $P_i$, replace it, where $G$ is the candidate gorilla location
    16:  end for
    17:  Set the best location as the silverback
    18:  for all gorillas $P_i$ do
    19:    if $C \geq W$ then
    20:      Update the gorilla location employing Equations (7) and (19)
    21:    else
    22:      Update the gorilla position applying Equations (10) and (19)
    23:    end if
    24:  end for
    25:  for $i \gets 1$ to $N$ do
    26:    Compute the $OF$ of each candidate gorilla $G_i$
    27:    If $G_i$ is fitter than $P_i$, replace it, where $G$ is the candidate gorilla location
    28:  end for
    29:  Set the best location as the silverback
    30: end for
  • Complexity Analysis: The computational complexity of single-objective DAGTO for FS depends on three important steps: initialization, $OF$ evaluation, and update of the gorilla locations. The initialization of the gorillas, as explained in Section 3.1, requires $O(N \cdot L)$ basic operations, and the evaluation of all gorillas needs $N$ computations of the $OF$. The complexity of the position update procedure depends on both the exploration and exploitation stages. In each stage, an update operation is executed on all gorilla solutions and the fittest one is selected, requiring
    $$\big[ O(maxIt \cdot N) + O(maxIt \cdot N \cdot L) \big] \cdot 2.$$
    Thus, the total computational complexity of single-objective DAGTO is
    $$O(N \cdot L) + N \cdot \big[ O(L) + O(Q \cdot L) \big] + \big[ O(maxIt \cdot N) + O(maxIt \cdot N \cdot L) \big] \cdot 2,$$
    where $Q$ is the number of samples in the training dataset and the computational complexity of the KNN model on $Q$ samples is $O(Q \cdot L)$.
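The condensed Python sketch referred to in Steps 1–3 illustrates the three DAGTO-specific ingredients: the MI-based initialization of Equations (14)–(16), the wrapper objective of Equation (17), and the sigmoid discretization of Equations (18) and (19). The 70% seeding fraction, the KNN classifier with k = 5, and 10-fold cross-validation follow the text; the scikit-learn estimators and everything else are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def init_population(X, y, N, mi_fraction=0.7):
    """Equations (14)-(16): seed 70% of the gorillas by label MI, the rest at random."""
    L = X.shape[1]
    mi = mutual_info_classif(X, y)                     # MI(f_i, class), Equation (14)
    prob = mi / (mi.max() or 1.0)                      # Equation (15)
    pop = np.zeros((N, L), dtype=int)
    n_mi = int(mi_fraction * N)
    pop[:n_mi] = np.random.rand(n_mi, L) < prob        # Equation (16)
    pop[n_mi:] = np.random.randint(0, 2, (N - n_mi, L))
    return pop

def objective(P_i, X, y, alpha=0.99):
    """Equation (17): weighted error rate plus relative subset size."""
    if P_i.sum() == 0:
        return 1.0                                     # empty subset: worst fitness
    knn = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(knn, X[:, P_i == 1], y, cv=10).mean()
    return alpha * (1 - acc) + (1 - alpha) * P_i.sum() / len(P_i)

def binarize(delta):
    """Equations (18)-(19): map a continuous update to a bit string."""
    T = 1.0 / (1.0 + np.exp(-2 * delta))               # sigmoid transfer function
    return (np.random.rand(len(delta)) < T).astype(int)
```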

3.2. Multi-Objective DAGTOs (MO-DAGTO1, MO-DAGTO2, and MO-DAGTO3)

  • Steps of Multi-objective DAGTOs: The step-by-step procedure of the proposed MO-DAGTOs is illustrated in Figure 2 and the three different cases are elaborated upon below.
    (a)
    Step 1: Gorilla Initialization based on MI: For all these three variants of the proposed multi-objective DAGTO, the exact same population initialization strategy, based on MI, is followed as described in Section 3.1.
    (b)
    Step 2: Fitness Assessment:
    • MO-DAGTO1: This first variant treats FS as a two-objective discrete optimization task whose intention is to reduce the feature dimension and simultaneously improve the classification efficiency. Therefore, each gorilla in the population is assessed by employing the following two $OF$s [48],
      $$OF_1(P) = \sum_{j=1}^{L} p_j \quad \text{(counting the positions with } p_j = 1\text{)}, \qquad (20)$$
      where $P$ is the location string of a gorilla with length $L$, and
      $$OF_2(P) = Classification\ Accuracy. \qquad (21)$$
    • MO-DAGTO2: This second variant considers FS as a two-criteria hybrid filter wrapper optimization challenge. The sole aim of FS is to shrink the number of attributes along with the classification error. Thus, the first objective function is formulated by using Equation (22) [58],
      $$OF_1(P) = \alpha \cdot classification\_error + (1 - \alpha) \cdot \frac{L_S}{L}, \qquad (22)$$
      where
      $$classification\_error = \frac{\#\,\text{Wrongly Predicted Samples}}{\#\,\text{Total Samples}} \qquad (23)$$
      and $L_S$ is the length of the feature substring, $L$ is the total feature count, and $\alpha$ is a controlling parameter, as already mentioned.
      In order to select the appropriate characteristics, one must look for a group of features that collectively have the most relevance to the target and the least redundancy among themselves. Therefore, maximizing the correlation between the attribute substring and the target attribute while reducing the dependency between the characteristics within the substring is normally emphasised for FS purposes. MI and the Pearson correlation coefficient (PCC) are typical measures of relevance or interdependency. This motivated us to formulate the second objective function by using Equation (24) [23], which we aim to maximize,
      $$OF_2(P) = \left( \frac{1}{L_S} \sum_{i=1}^{L_S} MI(f_i, class) \right) \cdot \left( \frac{1}{L_S} \sum_{i=1}^{L_S} PCC(f_i, class) \right), \qquad (24)$$
      where $f_i$ are the discrete characteristics present in the feature subgroup and $class$ is the class attribute.
    • MO-DAGTO3: When two characteristics are highly linked, removing one does not have a significant impact on the prediction strength of the other. As a result, unnecessary characteristics can be removed by reducing their interdependence. Therefore, this variant treats FS as a tri-objective hybrid filter-wrapper optimization task, wherein the first two $OF$s are the same as in MO-DAGTO2, and the third $OF$ is calculated as
      $$OF_3(P) = \left( \frac{1}{L_S^2} \sum_{i=1}^{L_S} \sum_{j=1}^{L_S} MI(f_i, f_j) \right) \cdot \left( \frac{1}{L_S^2} \sum_{i=1}^{L_S} \sum_{j=1}^{L_S} PCC(f_i, f_j) \right). \qquad (25)$$
      $OF_3$ measures both linear and non-linear dependence between the variables in a feature space in this case. As a result, reducing $OF_3$ may result in a redundancy reduction.
    (c)
    Step 3: Repository Maintenance: After the exploration and exploitation phases of the proposed variants, an external storehouse is needed to keep all the non-dominated (ND) solutions found so far, because any multi-objective approach outputs a set of Pareto solutions rather than a single one. When inserting a new ND solution $NA_{new}$ into the external repository, the following situations may arise and the corresponding actions should be taken (a sketch of this procedure, together with the knee-point screening of Step 5, is given at the end of this subsection).
    • If $NA_{new}$ is dominated by any member of the external repository, then it is discarded.
    • If any existing member of the repository is dominated by the new one, then $NA_{new}$ replaces that solution.
    • Insert $NA_{new}$ into the external repository if $NA_{new}$ and the archive members do not dominate each other, i.e., they are all non-dominated solutions, and the repository capacity is greater than the current repository size.
    • If neither $NA_{new}$ nor the current repository solutions are dominated but the repository overflows, remove a solution from the most crowded region and then push $NA_{new}$ into the archive [48].
    The pictorial representation of repository update is illustrated in Figure 3.
    (d)
    Step 4: Gorilla Location Update: Each gorilla position is first updated by using Equation (1), denoted as $\Delta p_i^L(It+1)$. As FS is a discrete optimization issue, a sigmoid transfer function is applied here to convert the original AGTO to DAGTO, i.e., to compute the probability value by using Equation (18). Then, each candidate gorilla location is calculated in the discrete domain by using Equation (19). After that, each candidate gorilla is evaluated by using the objective functions of the respective variant, and if the older gorilla location is dominated by the new location, the corresponding replacement occurs. As MO-DAGTO1, MO-DAGTO2, and MO-DAGTO3 all fall under the category of multi-objective optimization problems, a solution $A_1$ is considered better than another solution $A_2$ if $A_1$ is not dominated by $A_2$, as explained in Section 2.2. Then, a silverback solution is chosen, which is the best solution of the updated population, to continue with the exploitation phase. During the exploration and exploitation process, the silverback is picked from the top 10% of the reservoir, which is arranged in decreasing order of crowding distance (CD); selecting a solution from the front of the repository thus means selecting the best option among the unique solutions existing in the less populated region. In the exploitation phase, depending on the values of $C$ and $W$ described in Section 2.1.2, Equations (7) and (10) are used to update the location of individual gorillas in the population, and the continuous location space is converted to a discrete one by employing Equation (19). If an updated gorilla position dominates the existing one, it replaces it.
    (e)
    Step 5: Returning the Best Solution by Using the Concept of Knee: After the given number of iterations $maxIt$, the external repository contains all the solutions that are not mutually dominated. As a screening method, the "knee point" concept is utilized here to pick an optimal combination of features from the group of non-dominated ones [59,60] (see the sketch after this list).
  • Complexity Analysis: The initialization and fitness computation of the gorillas, as explained in Section 3.1 for single-objective DAGTO, requires $O(N \cdot L) + N \cdot [O(L) + O(Q \cdot L)]$. During each iteration, the position update, the fitness calculation, the finding of non-dominated solutions, and the selection of the silverback are performed twice, once for the exploration and once for the exploitation phase. Mathematically, this can be calculated as:
    $$O(\text{position update}) + O(\text{fitness calculation}) + O(\text{finding ND solutions}) + O(\text{selecting silverback}) = maxIt \cdot \Big[ 2 \cdot \big[ O(N \cdot L) + N \cdot [O(L) + O(Q \cdot L)] + O(C \cdot N \log N) + O(C \cdot N \log N) \big] \Big].$$
    Here, we have used the idea of a dominance tree for extracting the Pareto solutions, which reduces the number of comparisons, yielding a complexity of $O(C \cdot N \log N)$. To find the silverback, the repository needs to be sorted in decreasing order of the CD values, which requires $O(C \cdot N \log N)$. The final output of multi-objective DAGTO is the best solution of the repository, chosen as the knee point of the Pareto front; the complexity of calculating the knee point (assuming the case when all $N$ solutions are in the repository) is $O(N)$. Finally, the time complexity of the proposed multi-objective techniques can be expressed as
    $$O(\text{Multi-objective DAGTOs for FS}) = O(N \cdot L) + N \cdot \big[ O(L) + O(Q \cdot L) \big] + maxIt \cdot \Big[ 2 \cdot \big[ O(N \cdot L) + N \cdot [O(L) + O(Q \cdot L)] + O(C \cdot N \log N) + O(C \cdot N \log N) \big] \Big] + O(N) \approx O(N \log N).$$
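The sketch below, referred to in Steps 3 and 5, outlines the repository maintenance and the knee-point screening. It reuses the dominates() helper sketched in Section 2.2, and the knee is taken here as the point farthest from the line joining the two extreme solutions of a 2-objective (minimized) front, which is one common geometric definition and an assumption on our part; the capacity bound and the crowding-distance deletion rule follow the description above.

```python
import numpy as np

def crowding_distance(F):
    """CD over an (n, C) array of objective vectors (larger = less crowded)."""
    n, C = F.shape
    cd = np.zeros(n)
    for c in range(C):
        order = np.argsort(F[:, c])
        cd[order[0]] = cd[order[-1]] = np.inf          # keep boundary solutions
        span = float(F[order[-1], c] - F[order[0], c]) or 1.0
        for k in range(1, n - 1):
            cd[order[k]] += (F[order[k + 1], c] - F[order[k - 1], c]) / span
    return cd

def update_archive(archive, new, capacity):
    """Insert an ND solution following the four cases of Step 3."""
    if any(dominates(a, new) for a in archive):
        return archive                                 # case 1: new one is discarded
    archive = [a for a in archive if not dominates(new, a)]   # case 2: replace dominated
    archive.append(new)                                # case 3: plain insertion
    if len(archive) > capacity:                        # case 4: overflow
        cd = crowding_distance(np.array(archive))
        archive.pop(int(np.argmin(cd)))                # drop the most crowded member
    return archive

def knee_point(front):
    """Step 5 (two objectives): the point farthest from the extreme-to-extreme line."""
    F = np.asarray(front, dtype=float)
    p, q = F[np.argmin(F[:, 0])], F[np.argmin(F[:, 1])]
    denom = float(np.linalg.norm(q - p)) or 1.0
    dist = np.abs((q[0] - p[0]) * (F[:, 1] - p[1])
                  - (q[1] - p[1]) * (F[:, 0] - p[0])) / denom
    return F[np.argmax(dist)]
```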

4. Setup for the Experiments

4.1. Datasets

The evaluation of all the DAGTO variants is performed on seven standard medical datasets of varied dimensionality from UCI and three microarray cancer datasets [61]. The details of each dataset are depicted in Table 1. In this study, a KNN classifier with a k value equal to 5 is used to determine the classification accuracy on normalized data.

4.2. Benchmark Methods and Performance Criteria for Comparison

The performance of the proposed single-objective DAGTO is compared with that of three standard methods: HLBDA [62], BSHO [63], and QBHHO [58]. In this study, the mean fitness value, the average accuracy, the average feature size [58], and the average execution time are used as evaluation criteria. Similarly, the performance of the best multi-objective DAGTO is checked against four other benchmark multi-objective FS techniques, namely NSGA-II [64], BMOFOA [40], FW-GPAWOA [57], and BMOChOA [65]. Four very popular multi-objective performance indicators, namely IGD, HV, Spread, and SCC [65], are used to compare the efficiency of the multi-objective FS techniques in solving feature selection jobs on healthcare data. A population size of 20 and 100 iterations are set in all algorithms for a fair assessment. Each dataset was subjected to a total of 20 separate runs of each method, implemented in Python 3.7 on an Intel Core i3-7020U CPU @ 2.30 GHz machine with 4.00 GB RAM. This setup concerns the evaluation of both experiments presented in Section 5 and Section 7.

4.3. Parameter Settings

The user-defined parameter values for implementing all the above-mentioned single-objective and multi-objective approaches are listed in Table 2. The KNN method with k = 5 and 10-fold cross-validation is used to grade the subsets of identified factors. The KNN approach has a lower algorithmic expense, thus resulting in a lower overall overhead of the wrapper technique.

4.4. Design of Experiments

In this section, a list of nine experiments utilized in this research is discussed. All the experiments are conducted for each of the ten aforementioned datasets.
  • Single-objective DAGTO
    • Experiment 1: Performance comparison of the proposed single-objective DAGTO with other benchmark methods, like HLBDA, BSHO, and QBHHO.
    • Experiment 2: Convergence analysis of all four single-objective FS approaches.
    • Experiment 3: Implementation of a Wilcoxon signed rank test to prove the superiority of the proposed approach.
  • Multi-objective DAGTOs
    • Experiment 4: Performance comparison between all the proposed multi-objective DAGTO variants using the multi-objective performance indicators discussed in Section 4.2.
    • Experiment 5: Conduct of a Wilcoxon signed rank test on hyper volume (HV) to verify the efficiency of the best variant out of three.
    • Experiment 6: Performance comparison between the best multi-objective DAGTO variant and four well-known multi-objective FS techniques, using average feature size and average classification accuracy for an equitable and fair comparison.
    • Experiment 7: Conduct of a Wilcoxon signed rank test on HV to verify the significance of the proposed approach with respect to the others (NSGA-II, BMOFOA, BMOChOA, and FW-GPAWOA).
    • Experiment 8: Execution time comparison.
    • Experiment 9: Comparison between the proposed SO-DAGTO and the best of the MO-DAGTOs, which is proven to be MO-DAGTO2.

5. Experimental Results and Discussion

5.1. Single-Objective DAGTO

5.1.1. Experiment 1

This section compares the proposed single-objective DAGTO to three well-known algorithms, namely HLBDA [62], BSHO [63], and QBHHO [58]. Three performance assessment metrics are computed to assess the efficiency of single-objective DAGTO: the mean fitness value, the average classification accuracy, and the average feature substring length. Each approach is executed 20 times due to the stochastic nature of the optimization procedure; the averages of the findings over these 20 separate runs are reported in Table 3. Table 3 shows that the proposed SO-DAGTO obtained the optimal mean fitness values on eight out of ten datasets. SO-DAGTO consistently outperformed the other methods: it found the optimal feature subset outright in four datasets, while in the remaining six datasets its competitors gave only slightly higher average accuracy at the cost of a much larger number of selected features. For example, BSHO produces 0.3% better accuracy at the cost of 7 extra features on the Lymphography dataset. Similarly, for datasets such as Cervical Cancer, Arrhythmia, SRBCT, and Leukemia, the presented single-objective DAGTO is able to choose fewer but more significant components causing the diseases.

5.1.2. Experiment 2

Figure 4 illustrates the convergence curves of the four single-objective FS methods on the 10 datasets. When it comes to determining the best feature subset, the suggested DAGTO outperformed the other approaches thanks to its superior convergence behaviour, owing to the effective gorilla initialization technique based on label MI. For example, DAGTO is capable of converging more quickly and more deeply toward the global optimum on the Lymphography, Diabetic, Cardiotocography, Cervical Cancer, Arrhythmia, Parkinson, and Colon Tumor datasets, because GTO's capacity to explore and exploit the search space is quite exceptional. The results clearly depict the advantages of the suggested DAGTO across all dimensions of the FS challenges.

5.1.3. Experiment 3

The Wilcoxon signed rank test [58] is used in this study to perform pairwise comparisons. In this statistical test, if the p-value is greater than 0.05, the performance of the two approaches is determined to be comparable, i.e., "="; otherwise, the two methods are substantially different (i.e., "+" for positive significance and "−" for negative significance). The Wilcoxon test findings on the mean fitness values of SO-DAGTO against the other approaches are given in Table 4. The suggested SO-DAGTO achieved a much lower mean fitness value than its rivals in the majority of circumstances.
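A short sketch of this test as applied here, using SciPy; the two arrays stand in for the 20 per-run mean fitness values of two methods and are synthetic placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
fit_ours = rng.normal(0.10, 0.01, 20)     # placeholder: 20 runs of SO-DAGTO
fit_rival = rng.normal(0.12, 0.01, 20)    # placeholder: 20 runs of a competitor

stat, p = wilcoxon(fit_ours, fit_rival)   # paired, non-parametric comparison
if p > 0.05:
    verdict = "="                         # comparable performance
else:                                     # lower mean fitness is better here
    verdict = "+" if fit_ours.mean() < fit_rival.mean() else "-"
print(f"p-value = {p:.4f}, verdict: {verdict}")
```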

5.2. Multi-Objective DAGTOs

In each of the 20 separate runs, all the multi-objective FS techniques yield distinct subsets of non-dominated traits for each dataset. For comparison, the feature subsets offered by each technique are unified into one set; the non-dominated solutions are chosen from this group as the best Pareto fronts and are subsequently compared. To present an equitable comparison between the best results of the multi-objective approaches, we have taken the number of features and the corresponding classification accuracy values irrespective of the objective functions used by each.

5.2.1. Experiment 4

In this experiment, the performance of the three proposed MO-DAGTOs is compared and verified in terms of the average accuracy and the average feature size; to assess the quality of the Pareto front, the HV along with the number of Pareto solutions is used. The goal of HV is to measure the portion of the objective plane that is bounded by the Pareto front and a reference point r [66]. Because it can measure both the convergence and the diversity of the solutions, this indicator is commonly used to compare multi-objective optimization techniques. Table 5 lists the average accuracy, average feature size, average HV, and average number of Pareto solutions produced by each of the three MO-DAGTOs over 20 runs for each dataset. For the Lymphography, Diabetic, Cervical Cancer, SRBCT, and Leukemia datasets, MO-DAGTO2 achieves the highest average classification accuracy with a satisfactory number of features as compared to the other two. On the other hand, MO-DAGTO1 performs well in predicting the most relevant features in the cases of the Cardiotocography, Lung Cancer, and Colon Tumor datasets. Regarding the Arrhythmia and Parkinson datasets, MO-DAGTO3 proved its efficiency in the FS task. From Table 5, it can be derived that the HV of the Pareto fronts obtained by MO-DAGTO2 exceeds that of MO-DAGTO1 and MO-DAGTO3 for nine out of ten datasets, indicating higher convergence speed and more diverse solutions. For most of the datasets, MO-DAGTO3 contains more solutions in its Pareto front because it optimizes three criteria at a time, and the number of alternatives grows with the number of objective functions.
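For two objectives, the HV reduces to the area enclosed between the front and the reference point r. A small sketch, assuming both objectives are minimized and that front contains only mutually non-dominated points:

```python
def hypervolume_2d(front, r):
    """Area dominated by a 2-objective ND front, bounded by reference point r."""
    F = sorted(map(tuple, front))             # ascending in f1, so f2 descends
    hv, prev_f2 = 0.0, r[1]
    for f1, f2 in F:
        hv += (r[0] - f1) * (prev_f2 - f2)    # one rectangle per front point
        prev_f2 = f2
    return hv

# Example: hypervolume_2d([(1, 3), (2, 2), (3, 1)], r=(4, 4)) == 6.0
```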

5.2.2. Experiment 5

Experiment 4 reveals that the overall performance of MO-DAGTO2 is better than that of MO-DAGTO1 and MO-DAGTO3. Therefore, in this experiment, we have applied the Wilcoxon signed rank test to statistically prove the superiority of MO-DAGTO2 over the others. Based on the acquired testing Pareto fronts, 20 HVs are computed for each of the three variants across the 20 separate runs, and the Wilcoxon test ($\alpha = 0.05$) is then used to check whether there is a substantial difference between the approaches by examining the hypotheses listed below.
  • Null hypothesis (p-value > α ): performance of MO-DAGTO2 is similar “=” to that of MO-DAGTO1 and MO-DAGTO3.
  • Alternative hypothesis (p-value < α ): performance of MO-DAGTO2 is significantly superior “+” (or inferior “−”) to that of MO-DAGTO1 and MO-DAGTO3.
Table 6 depicts that the HV metrics produced by MO-DAGTO2 are substantially better than those generated by MO-DAGTO3 for nine datasets and markedly worse for one dataset, whereas the improved efficiency of the introduced MO-DAGTO2 method is even clearer when compared to MO-DAGTO1. MO-DAGTO2 produces similar or considerably better outcomes than its competitors in 19 of the 20 p-values (2 methods, 10 datasets), and it obtains significantly poorer performance in just one of the 20 p-values.

5.2.3. Experiment 6

In Experiment 5, we found that MO-DAGTO2 is superior to MO-DAGTO1 and MO-DAGTO3 in producing better Pareto fronts for the majority of the datasets. Therefore, in this experiment, we have compared the performance results of MO-DAGTO2 with those of four other benchmark multi-objective FS strategies, namely NSGA-II, BMOFOA, BMOChOA, and FW-GPAWOA. For each dataset, Table 7 lists the average classification accuracy, the average feature size, and the four multi-objective performance indicators (IGD, HV, Spread, and SCC) described in Section 4.2 for all five multi-objective FS approaches. The entries of Table 7 indicate that the HV values of the Pareto fronts generated by MO-DAGTO2 for eight (Lymphography, Cardiotocography, Cervical Cancer, Lung Cancer, Arrhythmia, Parkinson, Colon Tumor, and Leukemia) out of ten datasets are higher than those of the others. Moreover, the IGD and Spread values are quite satisfactory in the case of MO-DAGTO2, proving its efficiency in producing a Pareto front that is closer to the actual Pareto front and that covers a larger area in the objective plane. This excellence may be due to the efficient gorilla initialization strategy proposed in this research. The high SCC values of MO-DAGTO2 for most of the datasets reveal that the number of common elements between the actual Pareto fronts and the Pareto fronts calculated by MO-DAGTO2 is greater than that of the other four techniques.
The Pareto solutions of NSGA-II, BMOFOA, BMOChOA, FW-GPAWOA, and MO-DAGTO2 are illustrated in Figure 5. The average number of chosen characteristics is shown on the x axis of each plot, and the average classification accuracy on the y axis; the actual dimension of each dataset along with its corresponding classification accuracy is depicted at the top of each plot. According to Figure 5, in nine out of ten datasets the Pareto fronts obtained by MO-DAGTO2 dominate the others. According to the graphical findings for the Cervical Cancer dataset, the Pareto front of MO-DAGTO2 is marginally dominated by the fronts of NSGA-II, BMOFOA, and BMOChOA. We can also observe that the optimum Pareto front created by MO-DAGTO2 in nine datasets contains solutions that pick fewer than half of the entire number of features and improve the classification accuracy above that offered by using all attributes. For example, in the case of Lymphography, by selecting only 22% of the original features on average, our approach was able to boost the accuracy from 79% to 82.5%. Similarly, regarding the Cardiotocography dataset, by taking into consideration a reduced dataset with only 14% of the actual dimension, MO-DAGTO2 produces 5.2% more accuracy. On the high-dimensional datasets, MO-DAGTO2 enhances the classification accuracy from 64% to 90.3% considering only 4% of the actual width in SRBCT, and from 77.9% to 87.2% on the Leukemia dataset by taking only 13% of the features as relevant. In the case of the Lymphography, Diabetic, Cervical Cancer, Arrhythmia, Parkinson, SRBCT, and Leukemia datasets, the Pareto fronts of MO-DAGTO2 and FW-GPAWOA are very close to each other, as both are hybridizations of wrapper and filter approaches, and they exhibit rapid convergence as compared to the others. For the Parkinson samples, when the number of features is between 2 and 94, the optimal front of FW-GPAWOA lies above the front of MO-DAGTO2; however, when the number of selected features increases above 94, MO-DAGTO2 starts performing well in the FS task. The performance of BMOChOA is also quite satisfactory in executing the FS task on the Cervical Cancer and Leukemia datasets.

5.2.4. Experiment 7

Table 8 presents the p-values of the Wilcoxon signed rank test on HV metrics for MO-DAGTO2 against NSGA-II, BMOFOA, BMOChOA, and FW-GPAWOA. For nine datasets, the HV metrics for MO-DAGTO2 are substantially better than those produced by NSGA-II, BMOFOA, and BMOChOA. They are similar for one dataset with NSGA-II (Diabetic) and BMOChOA (Lymphography), equivalent for three datasets with FW-GPAWOA (Diabetic, Lung Cancer, and Leukemia), and significantly worse for one dataset with BMOFOA (Diabetic), and BMOChOA (SRBCT). In general, for the 40 p-values (4 methods and 10 datasets), our proposed algorithm MO-DAGTO2 produced equivalent or considerably better findings in 38 instances, and significantly worse results in just two situations.

5.2.5. Experiment 8

Table 9 displays the average execution duration (in minutes) of NSGA-II, BMOFOA, BMOChOA, FW-GPAWOA, and MO-DAGTO2 over the 20 separate runs. It is important to note that all methods have the same population size as well as the same number of repetitions and were executed on the same machine. Table 9 reports that for the majority of datasets (Lymphography, Diabetic, Cervical Cancer, Lung Cancer, Arrhythmia, Parkinson, Colon Tumor, SRBCT, and Leukemia), our strategy takes longer to execute than the other alternatives. Although the suggested technique picks smaller feature subsets, which should result in cheaper wrapper assessments and thus less computing time, it nevertheless takes longer, which may be explained by the fact that it explores and exploits all of the options available to the population. As a result, the transfer function must repeatedly convert continuous values to discrete ones, which is a rather time-consuming procedure. In addition, when the external archive is full, it employs the crowding distance in the archiving strategy and the deletion process, and this distance incurs a significant computational cost. It also computes the knee point at the end to discover the best solution of the repository. Because two populations are mixed in NSGA-II and separate fronts are determined at each iteration, the average execution times of NSGA-II and MO-DAGTO2 are quite close in most circumstances. However, in BMOChOA, each search agent is assigned to either the exploration or the exploitation phase, depending on the parameter $\mu$; this might be one of the reasons for its faster execution. Although both FW-GPAWOA and the proposed MO-DAGTO2 constitute hybridizations of filter and wrapper techniques, the running time of the latter is longer than that of the former, because FW-GPAWOA calculates only MI in its filter evaluation, whereas MO-DAGTO2 computes both MI and PCC to find its second fitness criterion.

5.2.6. Experiment 9

This experiment compares the efficiency of the proposed SO-DAGTO and MO-DAGTO2 on the FS task of predicting a particular disease. Regarding the classification performance introduced in Table 10, in six out of ten datasets (Lymphography, Diabetic, Cervical Cancer, Arrhythmia, SRBCT, and Leukemia), MO-DAGTO2 outputs a higher accuracy value while considering relatively fewer features. In the case of the Cardiotocography and Lung Cancer datasets, SO-DAGTO is able to achieve 2% and 5% more accuracy at the cost of 5 and 20 additional features, respectively. The performance of SO-DAGTO on the Parkinson and Colon Tumor datasets is very attractive in terms of classification accuracy, at the expense of a larger number of features as compared to MO-DAGTO2. In the end, we have employed the knee point concept [59,60] to filter the best of the optimum solutions present in the external repository; however, for datasets having flat extrema, we have selected it by using the CD measure. According to the entries in the last column of Table 10, for the Cervical Cancer, Lung Cancer, Parkinson, Colon Tumor, SRBCT, and Leukemia datasets, MO-DAGTO2 is capable of extracting the best optimum solution in terms of both the number of features and the classification accuracy. In particular, for high-dimensional datasets like Colon Tumor, SRBCT, and Leukemia, the efficiency of MO-DAGTO2 is quite satisfactory in solving the FS job. Overall, we can state that both SO-DAGTO and MO-DAGTO2 have proven to be the best in solving the FS task on medical datasets of variable dimensions. However, researchers are nowadays concentrating more on multi-objective FS approaches because they help practitioners take vital decisions on the basis of multiple alternatives at hand.

6. Advantages of the Suggested FS Methods

After a careful review of the results in the preceding subsections, it can be inferred that the presented MO-DAGTO2 approach might be an effective candidate for removing extraneous features from health data. The strengths of the introduced MO-DAGTO2 method can be summarized as follows.
  • This is the first attempt to apply MO-DAGTO for solving discrete optimization tasks such as feature selection.
  • MO-DAGTO2 is a multi-objective approach and thus can help medical professionals to make better decisions due to the availability of a large number of optimal solutions.
  • It can simultaneously optimize filter and wrapper criteria, resulting in a stronger set of features that genuinely affect the disease and with which a particular condition can be predicted easily and accurately.
  • When compared against the other standard multi-objective FS techniques, MO-DAGTO2 proves superior at providing the best Pareto fronts, offering fewer selected traits and higher accuracy.
  • The most distinctive feature of the suggested method is its ability to build a reservoir of Pareto-optimal solutions, each of which is kept well separated from the others through the crowding distance (a generic sketch of this measure is given after this list).
  • In our wrapper evaluation, we choose KNN as the classification algorithm because it is a simple and effective supervised classifier with low training complexity.
  • According to the experimental observations across all nine FS algorithms, the suggested MO-DAGTO2 algorithm, with its fusion of filter and wrapper approaches, achieves better classification accuracy while selecting short feature subsets.
  • Thanks to the mutual-information-based gorilla initialization methodology, the rate of convergence of all four suggested DAGTO algorithms is high compared with the others.
  • Because of its greater HV and spread values, the MO-DAGTO2 approach generates Pareto fronts that dominate a large volume of the objective space.
  • Among all the multi-objective FS strategies discussed above, MO-DAGTO2 makes the greatest contribution toward discovering the true Pareto front, as its SCC values are very impressive in most of the datasets; this indicates rapid convergence toward the optimal solution set.
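As a generic illustration of the crowding distance mentioned in the list above, the sketch below follows the standard NSGA-II-style computation on a set of non-dominated solutions; it is a reference implementation under our own conventions rather than the exact archiving code of MO-DAGTO2.

```python
import numpy as np

def crowding_distance(objs):
    """Standard crowding distance for a set of non-dominated solutions.

    `objs` is an (n, m) array of objective values.  Boundary solutions
    receive infinite distance; interior ones accumulate the normalized
    size of the cuboid formed by their nearest neighbours on each
    objective, so larger values indicate less crowded solutions.
    """
    objs = np.asarray(objs, dtype=float)
    n, m = objs.shape
    dist = np.zeros(n)
    for j in range(m):
        order = np.argsort(objs[:, j])
        span = objs[order[-1], j] - objs[order[0], j]
        dist[order[0]] = dist[order[-1]] = np.inf    # keep the extremes
        if span == 0:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (objs[order[k + 1], j]
                               - objs[order[k - 1], j]) / span
    return dist

# Example: four repository members evaluated on two objectives
print(crowding_distance([[2, 0.30], [4, 0.18], [7, 0.15], [12, 0.14]]))
```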

7. Case Study with COVID-19 Dataset

The World Health Organization (WHO) stated in 2020 that Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) had emerged in China and spread quickly around the globe. By August 2020, the disease it causes, COVID-19, had killed more than 600,000 people all over the world [67]. Machine learning (ML) has recently emerged as a technical revolution that can be used to battle COVID-19 through diagnosis, treatment, and identification [68]. Both classification and clustering have been shown to benefit from ML-based techniques. When constructing scalable ML models, the focus is on the features most relevant to each dataset. However, because ML models require a fixed-length feature vector as input, it is difficult to build feature representations that preserve as much information as possible, and scalability itself becomes a problem when the datasets are enormous [69]. Moreover, genomic data from COVID-19 patients have been extensively studied [70,71]; an important issue in this scenario is the conversion of genomic sequences into a fixed-length feature space so that they may be used as inputs for ML classifiers when making predictions. Here, we provide a method for accurately predicting patient mortality based on a wide range of variables. Doctors can use such predictions to prescribe drugs and devise tactics in advance that will assist in saving the majority of the corresponding lives. MO-DAGTO2, the suggested method proven best at FS, is used in this study to predict the health status of COVID-19 patients, applied to the COVID-19 dataset presented in Table 11 along with five well-known classification models [69], namely KNN, Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT).
The dataset used in this case study is the COVID-19 Case Surveillance dataset, available on the website of the United States Centers for Disease Control and Prevention (https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data-with-Ge/n8mc-b4w4/data, accessed on 8 July 2022). The full collection contains 32,806,678 records; after deleting records with missing and blank entries, the dataset used here consists of 101,017 patient records. The attributes are listed in Table 11.
Figure 6 illustrates the accuracy and feature-subset size of the suggested MO-DAGTO2 and the other four multi-objective techniques. MO-DAGTO2 attained an excellent classification accuracy of 94% by using only seven factors: sex, the number of weeks between the earliest date and the date of symptom onset, known exposures, county, process, underlying conditions, and the patient's age group. The classification results before and after FS with the MO-DAGTO2 method are listed in Table 12. Here, the RF classifier outperforms the other classifiers, achieving a classification accuracy of 95% while considering only seven out of 18 features.
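To indicate how the before/after-FS comparison of Table 12 could be reproduced, a minimal sketch is given below. The file name, the column identifiers (which mirror Table 11), the one-hot encoding, and the 80/20 split are all assumptions about the cleaned data rather than the exact experimental protocol of this study.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical pre-cleaned frame holding the 18 predictors plus death_yn.
df = pd.read_csv("covid19_case_surveillance_clean.csv")

# The seven attributes reported as selected by MO-DAGTO2 (names assumed).
selected = ["sex", "case_onset_interval", "exposure_yn", "res_county",
            "process", "underlying_conditions_yn", "age_group"]

def evaluate(features):
    """Train an RF on the given feature subset and report test accuracy."""
    X = pd.get_dummies(df[features])          # one-hot encode categoricals
    y = df["death_yn"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

all_features = [c for c in df.columns if c != "death_yn"]
print("before FS:", evaluate(all_features))   # all 18 attributes
print("after FS: ", evaluate(selected))       # 7 MO-DAGTO2 features
```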
Our strategy enables doctors to allocate limited medical resources to the most vulnerable groups, especially in situations of medical scarcity, and to deliver urgent care. Clinicians may use the risk prediction method to determine which of their patients are most at risk of death and then implement a tailored preventative strategy. A generic clinical decision support system based on our findings might benefit not just the COVID-19 response but also responses to other possible pandemics in the future. What is more, biologists may be able to use the patterns extracted from these data to develop more effective vaccines and vaccination tactics.

8. Conclusions and Future Research

This study is the first to present a discrete artificial gorilla troop optimization (DAGTO) algorithm to handle the FS task in the biomedical area. We introduced four DAGTO versions, depending on the number and type of fitness criteria, to establish the approaches as top candidates for the FS mission. Moreover, an effective gorilla initialization technique based on mutual information is employed for faster convergence. The findings of all four DAGTO variants are evaluated and studied, and the MO-DAGTO2 version, which integrates both filter and wrapper approaches, is confirmed as the best at recognizing the non-dominated (ND) solutions closest to the real Pareto frontiers. The best proven FS technique, MO-DAGTO2, is compared to four well-known multi-objective FS techniques, namely NSGA-II, BMOFOA, BMOChOA, and FW-GPAWOA, to ensure its consistency. Compared with these four prominent approaches, MO-DAGTO2 proves the most effective in terms of obtaining smaller feature dimensions and better recognition accuracy. In most datasets, the proposed MO-DAGTO2 technique yields the highest-quality Pareto fronts, as confirmed by different multi-objective performance assessment criteria. By running a Wilcoxon rank test on the HV metric of the estimated fronts from the five techniques, the validity of the recommended MO-DAGTO2 is statistically confirmed once more. The suggested approach was also tested on a dataset of COVID-19 patients.
Furthermore, we noticed that MO-DAGTO2 takes longer to execute in most circumstances due to the application of the transfer function in both the exploration and the exploitation of all population members. In addition, the fitness evaluation involves the calculation of MI, PCC, and the classification error rate. It is therefore our keen interest to examine additional fitness functions in the future to achieve higher efficiency without increasing the running duration. We are also enthusiastic about combining various evolutionary algorithms [72] with other classification algorithms such as random forest and ANN, and various advanced initialization procedures can likewise be applied to MO-DAGTO2 to boost its efficiency. Only healthcare data are used here to verify the efficacy of the suggested DAGTOs; however, the proposed approaches may be used to address different optimization challenges in the real world.
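To make the cost of this fitness evaluation concrete, the following sketch shows one plausible way of combining MI-based relevance with PCC-based redundancy for a candidate feature mask; it is an illustrative assumption, and the exact combination rule used in MO-DAGTO2 may differ.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def filter_fitness(X, y, mask):
    """Illustrative filter criterion for a binary feature mask.

    Relevance: mean mutual information between the selected features and
    the class label.  Redundancy: mean absolute Pearson correlation among
    the selected features.  Higher relevance with lower redundancy scores
    better; maximizing this value is the filter half of the hybrid search.
    """
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    relevance = mutual_info_classif(X[:, idx], y).mean()
    if idx.size == 1:
        redundancy = 0.0
    else:
        corr = np.corrcoef(X[:, idx], rowvar=False)     # PCC matrix
        off_diag = np.abs(corr).sum() - idx.size        # drop the diagonal
        redundancy = off_diag / (idx.size * (idx.size - 1))
    return relevance - redundancy
```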

Author Contributions

Conceptualization, J.P., P.M., B.A. and F.S.G.; Methodology, J.P., P.M., B.A. and F.S.G.; Writing—original draft, J.P., P.M., B.A., F.S.G., V.C.G. and A.K.; Writing—review & editing, J.P., P.M., B.A., F.S.G., V.C.G., A.K. and S.M.; supervision: B.A., V.C.G. and A.K.; project administration: B.A., V.C.G. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
  2. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  3. Liu, H.; Yu, L. Toward Integrating Feature Selection Algorithms for Classification and Clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502.
  4. Erguzel, T.T.; Tas, C.; Cebi, M. A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders. Comput. Biol. Med. 2015, 64, 127–137.
  5. Huang, H.; Xie, H.; Guo, J.; Chen, H. Ant colony optimization-based feature selection method for surface electromyography signals classification. Comput. Biol. Med. 2012, 42, 30–38.
  6. Sahebi, G.; Movahedi, P.; Ebrahimi, M.; Pahikkala, T.; Plosila, J.; Tenhunen, H. GeFeS: A generalized wrapper feature selection approach for optimizing classification performance. Comput. Biol. Med. 2020, 125, 103974.
  7. Sreejith, S.; Nehemiah, H.K.; Kannan, A. Clinical data classification using an enhanced SMOTE and chaotic evolutionary feature selection. Comput. Biol. Med. 2020, 126, 103991.
  8. Vivekanandan, T.; Iyengar, N.C.S.N. Optimal feature selection using a modified differential evolution algorithm and its effectiveness for prediction of heart disease. Comput. Biol. Med. 2017, 90, 125–136.
  9. Xue, B.; Zhang, M.; Browne, W.N.; Yao, X. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Trans. Evol. Comput. 2016, 20, 606–626.
  10. Abdollahzadeh, B.; Gharehchopogh, F.S.; Mirjalili, S. Artificial gorilla troops optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems. Int. J. Intell. Syst. 2021, 36, 5887–5958.
  11. Ginidi, A.; Ghoneim, S.M.; Elsayed, A.; El-Sehiemy, R.; Shaheen, A.; El-Fergany, A. Gorilla Troops Optimizer for Electrically Based Single and Double-Diode Models of Solar Photovoltaic Systems. Sustainability 2021, 13, 9459.
  12. Sayed, G.I.; Hassanien, A.E. A Novel Chaotic Artificial Gorilla Troops Optimizer and Its Application for Fundus Images Segmentation. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics; Springer: Cham, Switzerland, 2021; pp. 318–329.
  13. Yusta, S.C. Different metaheuristic strategies to solve the feature selection problem. Pattern Recognit. Lett. 2009, 30, 525–534.
  14. Kira, K.; Rendell, L.A. A Practical Approach to Feature Selection. In Proceedings of the 9th International Workshop on Machine Learning (ML), Aberdeen, UK, 1–3 July 1992; Morgan Kaufmann: Burlington, MA, USA, 1992; pp. 249–256.
  15. Peng, H.; Long, F.; Ding, C.H.Q. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
  16. Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 1994, 5, 537–550.
  17. Xue, B.; Cervante, L.; Shang, L.; Browne, W.N.; Zhang, M. Multi-objective Evolutionary Algorithms for filter Based Feature Selection in Classification. Int. J. Artif. Intell. Tools 2013, 22, 1350024.
  18. Labani, M.; Moradi, P.; Jalili, M. A multi-objective genetic algorithm for text feature selection using the relative discriminative criterion. Expert Syst. Appl. 2020, 149, 113276.
  19. Cervante, L.; Xue, B.; Shang, L.; Zhang, M. A Multi-objective Feature Selection Approach Based on Binary PSO and Rough Set Theory. In Proceedings of the 13th European Conference on Evolutionary Computation in Combinatorial Optimization (EvoCOP), Vienna, Austria, 3–5 April 2013; Volume 7832, pp. 25–36.
  20. Xue, B.; Cervante, L.; Shang, L.; Browne, W.N.; Zhang, M. A multi-objective particle swarm optimisation for filter-based feature selection in classification problems. Connect. Sci. 2012, 24, 91–116.
  21. Hancer, E.; Xue, B.; Zhang, M.; Karaboga, D.; Akay, B. A multi-objective artificial bee colony approach to feature selection using fuzzy mutual information. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Sendai, Japan, 25–28 May 2015; pp. 2420–2427.
  22. Cervante, L.; Xue, B.; Zhang, M.; Shang, L. Binary particle swarm optimisation for feature selection: A filter based approach. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Brisbane, Australia, 10–15 June 2012; pp. 1–8.
  23. Nayak, S.K.; Rout, P.K.; Jagadev, A.K.; Swarnkar, T. Elitism based Multi-Objective Differential Evolution for feature selection: A filter approach with an efficient redundancy measure. J. King Saud Univ. Comput. Inf. Sci. 2020, 32, 174–187.
  24. Ali, M.U.; Yusof, U.K.; Naim, S. Filter-Based Multi-Objective Feature Selection Using NSGA III and Cuckoo Optimization Algorithm. IEEE Access 2020, 8, 76333–76356.
  25. Narendra, P.M.; Fukunaga, K. A Branch and Bound Algorithm for Feature Subset Selection. IEEE Trans. Comput. 1977, 26, 917–922.
  26. Whitney, A.W. A Direct Method of Nonparametric Measurement Selection. IEEE Trans. Comput. 1971, 20, 1100–1103.
  27. Marill, T.; Green, D.M. On the effectiveness of receptors in recognition systems. IEEE Trans. Inf. Theory 1963, 9, 11–17.
  28. Pudil, P.; Novovicová, J.; Kittler, J. Floating search methods in feature selection. Pattern Recognit. Lett. 1994, 15, 1119–1125.
  29. Liu, X.; Liang, Y.; Wang, S.; Yang, Z.; Ye, H. A Hybrid Genetic Algorithm With Wrapper-Embedded Approaches for Feature Selection. IEEE Access 2018, 6, 22863–22874.
  30. Chuang, L.; Tsai, S.; Yang, C. Improved binary particle swarm optimization using catfish effect for feature selection. Expert Syst. Appl. 2011, 38, 12699–12707.
  31. Mafarja, M.M.; Mirjalili, S. Hybrid Whale Optimization Algorithm with simulated annealing for feature selection. Neurocomputing 2017, 260, 302–312.
  32. Al-Tashi, Q.; Abdulkadir, S.J.; Rais, H.M.; Mirjalili, S.; Alhussian, H. Binary Optimization Using Hybrid Grey Wolf Optimization for Feature Selection. IEEE Access 2019, 7, 39496–39508.
  33. Sayed, S.A.; Nabil, E.; Badr, A. A binary clonal flower pollination algorithm for feature selection. Pattern Recognit. Lett. 2016, 77, 21–27.
  34. Hancer, E.; Xue, B.; Karaboga, D.; Zhang, M. A binary ABC algorithm based on advanced similarity scheme for feature selection. Appl. Soft Comput. 2015, 36, 334–348.
  35. Kashef, S.; Nezamabadi-pour, H. An advanced ACO algorithm for feature subset selection. Neurocomputing 2015, 147, 271–279.
  36. Muni, D.P.; Pal, N.R.; Das, J. Genetic programming for simultaneous feature selection and classifier design. IEEE Trans. Syst. Man Cybern. Part B 2006, 36, 106–117.
  37. Ghaemi, M.; Feizi-Derakhshi, M. Feature selection using Forest Optimization Algorithm. Pattern Recognit. 2016, 60, 121–129.
  38. Al-Tashi, Q.; Abdulkadir, S.J.; Rais, H.M.; Mirjalili, S.; Alhussian, H.; Ragab, M.G.; Alqushaibi, A. Binary Multi-Objective Grey Wolf Optimizer for Feature Selection in Classification. IEEE Access 2020, 8, 106247–106263.
  39. Piri, J.; Mohapatra, P. An analytical study of modified multi-objective Harris Hawk Optimizer towards medical data feature selection. Comput. Biol. Med. 2021, 135, 104558.
  40. Nouri-Moghaddam, B.; Ghazanfari, M.; Fathian, M. A novel multi-objective forest optimization algorithm for wrapper feature selection. Expert Syst. Appl. 2021, 175, 114737.
  41. Behravan, I.; Dehghantanha, O.; Zahiri, S.H. An optimal SVM with feature selection using multi-objective PSO. In Proceedings of the 1st Conference on Swarm Intelligence and Evolutionary Computation (CSIEC), Bam, Iran, 9–11 March 2016; pp. 76–81.
  42. Bouraoui, A.; Jamoussi, S.; Ayed, Y.B. A multi-objective genetic algorithm for simultaneous model and feature selection for support vector machines. Artif. Intell. Rev. 2018, 50, 261–281.
  43. dos Santos, B.C.; Nobre, C.N.; Zárate, L.E. Multi-Objective Genetic Algorithm for Feature Selection in a Protein Function Prediction Context. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–6.
  44. Emmanouilidis, C.; Hunter, A.; MacIntyre, J.; Cox, C. A multi-objective genetic algorithm approach to feature selection in neural and fuzzy modeling. Evol. Optim. 2001, 3, 1–26.
  45. Huang, B.Q.; Buckley, B.; Kechadi, M.T. Multi-objective feature selection by using NSGA-II for customer churn prediction in telecommunications. Expert Syst. Appl. 2010, 37, 3638–3646.
  46. de Oliveira, L.E.S.; Sabourin, R.; Bortolozzi, F.; Suen, C.Y. Feature Selection Using Multi-Objective Genetic Algorithms for Handwritten Digit Recognition. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR), Quebec City, QC, Canada, 11–15 August 2002; pp. 568–571.
  47. Piri, J.; Mohapatra, P.; Dey, R. Fetal Health Status Classification Using MOGA–CD Based Feature Selection Approach. In Proceedings of the IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), Bangalore, India, 2–4 July 2020; pp. 1–6.
  48. Piri, J.; Mohapatra, P.; Dey, R. Multi-objective Ant Lion Optimization Based Feature Retrieval Methodology for Investigation of Fetal Wellbeing. In Proceedings of the 3rd International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2–4 September 2021; pp. 1732–1737.
  49. Piri, J.; Mohapatra, P.; Singh, D.; Samanta, D.; Singh, D.; Kaur, M.; Lee, H. Mining and Interpretation of Critical Aspects of Infant Health Status Using Multi-Objective Evolutionary Feature Selection Approaches. IEEE Access 2022, 10, 32622–32638.
  50. Xue, B.; Fu, W.; Zhang, M. Differential evolution (DE) for multi-objective feature selection in classification. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), ACM, Vancouver, BC, Canada, 12–16 July 2014; pp. 83–84.
  51. Moradi, P.; Gholampour, M. A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy. Appl. Soft Comput. 2016, 43, 117–130.
  52. Xue, B.; Zhang, M.; Browne, W.N. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl. Soft Comput. 2014, 18, 261–276.
  53. Hammami, M.; Bechikh, S.; Hung, C.; Said, L.B. A Multi-objective hybrid filter-wrapper evolutionary approach for feature selection. Memetic Comput. 2019, 11, 193–208.
  54. Emary, E.; Yamany, W.; Hassanien, A.E.; Snasel, V. Multi-Objective Gray-Wolf Optimization for Attribute Reduction. Procedia Comput. Sci. 2015, 65, 623–632.
  55. Taha, A.M.; Chen, S.; Mustapha, A. Bat Algorithm Based Hybrid Filter-Wrapper Approach. Adv. Oper. Res. 2015, 2015, 961494.
  56. Saxena, A.; Shrivas, M.M. Filter–GA Based Approach to Feature Selection for Classification. Int. J. Future Revolut. Comput. Sci. Commun. Eng. 2017, 3, 202–212.
  57. Got, A.; Moussaoui, A.; Zouache, D. Hybrid filter-wrapper feature selection using whale optimization algorithm: A multi-objective approach. Expert Syst. Appl. 2021, 183, 115312.
  58. Too, J.; Abdullah, A.R.; Saad, N.M. A New Quadratic Binary Harris Hawk Optimization for Feature Selection. Electronics 2019, 8, 1130.
  59. Li, W.; Zhang, G.; Zhang, T.; Huang, S. Knee Point-Guided Multiobjective Optimization Algorithm for Microgrid Dynamic Energy Management. Complexity 2020, 2020, 8877008.
  60. Zhang, X.; Tian, Y.; Jin, Y. A Knee Point-Driven Evolutionary Algorithm for Many-Objective Optimization. IEEE Trans. Evol. Comput. 2015, 19, 761–776.
  61. Zhu, Z.; Ong, Y.; Dash, M. Markov blanket-embedded genetic algorithm for gene selection. Pattern Recognit. 2007, 40, 3236–3248.
  62. Too, J.; Mirjalili, S. A Hyper Learning Binary Dragonfly Algorithm for Feature Selection: A COVID-19 Case Study. Knowl.-Based Syst. 2021, 212, 106553.
  63. Kumar, V.; Kaur, A. Binary spotted hyena optimizer and its application to feature selection. J. Ambient Intell. Humaniz. Comput. 2020, 11, 2625–2645.
  64. Deb, K.; Agrawal, S.; Pratap, A.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
  65. Piri, J.; Mohapatra, P.; Pradhan, M.R.; Acharya, B.; Patra, T.K. A Binary Multi-Objective Chimp Optimizer With Dual Archive for Feature Selection in the Healthcare Domain. IEEE Access 2022, 10, 1756–1774.
  66. Auger, A.; Bader, J.; Brockhoff, D.; Zitzler, E. Theory of the hypervolume indicator: Optimal μ-distributions and the choice of the reference point. In Proceedings of the 10th ACM/SIGEVO Conference on Foundations of Genetic Algorithms (FOGA), ACM, Orlando, FL, USA, 9–11 January 2009; pp. 87–102.
  67. Chen, X.; Tang, Y.; Mo, Y.; Li, S.; Lin, D.; Yang, Z.; Yang, Z.; Sun, H.; Qiu, J.; Liao, Y.; et al. A diagnostic model for coronavirus disease 2019 (COVID-19) based on radiological semantic and clinical features: A multi-center study. Eur. Radiol. 2020, 30, 4893–4902.
  68. Sahlol, A.T.; Yousri, D.; Ewees, A.A.; Al-qaness, M.A.A.; Damasevicius, R.; Elaziz, M.A. COVID-19 image classification using deep features and fractional-order marine predators algorithm. Sci. Rep. 2020, 10, 15364.
  69. Ali, S.; Zhou, Y.; Patterson, M. Efficient Analysis of COVID-19 Clinical Data using Machine Learning Models. Med. Biol. Eng. Comput. 2022, 60, 1881–1896.
  70. Ali, S.; Ali, T.E.; Khan, M.A.; Khan, I.; Patterson, M. Effective and scalable clustering of SARS-CoV-2 sequences. In Proceedings of the ICBDR 2021: 2021 the 5th International Conference on Big Data Research, Tokyo, Japan, 25–27 September 2021.
  71. Kuzmin, K.; Adeniyi, A.E.; DaSouza, A.K.; Lim, D.; Nguyen, H.; Molina, N.R.; Xiong, L.; Weber, I.T.; Harrison, R.W. Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone. Biochem. Biophys. Res. Commun. 2020, 533, 553–558.
  72. Drakopoulos, G.; Stathopoulou, F.; Kanavos, A.; Paraskevas, M.; Tzimas, G.; Mylonas, P.; Iliadis, L. A Genetic Algorithm for Spatiosocial Tensor Clustering. Evol. Syst. 2020, 11, 491–501.
Figure 1. Phases of AGTO [10].
Figure 2. Steps of MO-DAGTOs.
Figure 3. Repository update for MO-DAGTOs.
Figure 4. Convergence curves of single-objective FS methods.
Figure 5. Number of features vs. classification accuracy of multi-objective FS methods.
Figure 6. Number of features vs. classification accuracy on COVID-19 data.
Table 1. Datasets.
SID | Name | #Samples | #Characteristics | #Target Labels
D1 | Lymphography | 148 | 18 | 3
D2 | Diabetic | 1151 | 19 | 2
D3 | Cardiotocography | 2126 | 21 | 3
D4 | Cervical Cancer | 858 | 35 | 2
D5 | Lung Cancer | 32 | 56 | 3
D6 | Arrhythmia | 452 | 279 | 16
D7 | Parkinson | 756 | 754 | 2
D8 | Colon Tumor | 62 | 2000 | 2
D9 | SRBCT | 83 | 2308 | 4
D10 | Leukemia | 72 | 7129 | 2
Table 2. Parameter values.
Approaches | Parameters | Values
Single objective (Population Size = 20, Maximum Number of Iterations = 100)
HLBDA | pl | 0.4
 | dl | 0.7
BSHO | h | [5, 0]
 | M | [0.5, 1]
 | λ | 0.99
 | ω | 0.01
QBHHO | α | 0.99
Single-objective DAGTO | β | 3
 | W | 0.8
 | p | 0.03
Multi objective (Population Size = 20, Maximum Number of Iterations = 100, Repository Size = 50)
NSGA-II | Crossover rate (CR) | 0.8
 | Mutation rate (MR) | 0.01
BMOFOA | Transfer rate | 10%
 | Lifetime | 20
 | LSC | 2
 | GSC | 7
BMOChOA | Chaotic map | Tent
Multi-objective DAGTOs | β | 3
 | W | 0.8
 | p | 0.03
Table 3. Performance comparison of single-objective FS methods.
DatasetsCriteriaSO-DAGTOHLBDABSHOQBHHO
LymphographyMean_fitness0.1840.1970.20.201
Avg_accuracy0.8130.8040.8160.816
Avg_feature_size6.6671313.33
Avg_execution time (min)5.014.785.024.56
DiabeticMean_fitness0.3070.330.3240.318
Avg_accuracy0.6890.6860.70.649
Avg_feature_size779.66
Avg_execution time (min)8.028.137.867.73
CardiotocographyMean_fitness0.0910.1040.0970.101
Avg_accuracy0.9240.9080.9150.91
Avg_feature_size88711
Avg_execution time (min)11.0110.9711.0610.96
Cervical CancerMean_fitness0.0480.0580.0580.059
Avg_accuracy0.9560.9680.9450.953
Avg_feature_size61213.612
Avg_execution time (min)5.124.895.055.02
Lung CancerMean_fitness0.360.3840.3970.354
Avg_accuracy0.650.630.6180.643
Avg_feature_size2242.12813
Avg_execution time (min)2.002.023.192.00
ArrhythmiaMean_fitness0.3440.3660.380.377
Avg_accuracy0.6290.6240.6060.611
Avg_feature_size61131108105
Avg_execution time (min)5.795.435.875.38
ParkinsonMean_fitness0.120.1320.1320.123
Avg_accuracy0.890.8780.880.887
Avg_feature_size255.23251.3248271.34
Avg_execution time (min)10.3211.0110.8710.12
Colon TumorMean_fitness0.1590.1610.1760.17
Avg_accuracy0.840.8550.8380.838
Avg_feature_size9621000.31129850.4
Avg_execution time (min)9.539.689.469.48
SRBCTMean_fitness0.1170.1240.1460.137
Avg_accuracy0.8780.8830.8640.867
Avg_feature_size654754804976
Avg_execution time (min)13.2213.7512.8912.38
LeukemiaMean_fitness0.1390.14240.1250.135
Avg_accuracy0.8620.860.8870.87
Avg_feature_size181132012003.871991.13
Avg_execution time (min)28.0426.8727.0926.49
Table 4. Wilcoxon signed rank test results of SO-DAGTO vs. others.
DatasetsSO-DAGTO vs. HLBDASO-DAGTO vs. BSHOSO-DAGTO vs. QBHHO
Lymphography=++
Diabetic=++
Cardiotocography+=+
Cervical Cancer+++
Lung Cancer++
Arrhythmia+++
Parkinson++=
Colon Tumor=++
SRBCT+++
Leukemia+
Table 5. Performance comparison among MO-DAGTOs.
DatasetsMethodsAvg_AccuracyAvg_Feature_SizeHV#Pareto Solutions
LymphographyMO-DAGTO10.82430.2042
MO-DAGTO20.82540.3845
MO-DAGTO30.815.50.1786
DiabeticMO-DAGTO10.67320.11
MO-DAGTO20.73.50.2414
MO-DAGTO30.694.250.2454
CardiotocographyMO-DAGTO10.9045.140.157
MO-DAGTO20.9023.50.2424
MO-DAGTO30.8466.770.1219
Cervical CancerMO-DAGTO10.96230.2923
MO-DAGTO20.96390.4932
MO-DAGTO30.964.660.3183
Lung CancerMO-DAGTO10.67520.11
MO-DAGTO20.5922.50.3682
MO-DAGTO30.5752.250.2264
ArrhythmiaMO-DAGTO10.6230.2973
MO-DAGTO20.62330.3143
MO-DAGTO30.644.880.1179
ParkinsonMO-DAGTO10.8450.157
MO-DAGTO20.846620.36318
MO-DAGTO30.866.330.1229
Colon TumorMO-DAGTO10.851230.11
MO-DAGTO20.8236.20.3934
MO-DAGTO30.81956.850.1447
SRBCTMO-DAGTO10.69151.140.1777
MO-DAGTO20.90392.330.3053
MO-DAGTO30.671026.220.1229
LeukemiaMO-DAGTO10.81174.250.2754
MO-DAGTO20.8729640.3053
MO-DAGTO30.83941.280.1637
Table 6. Wilcoxon test results for MO-DAGTO1, MO-DAGTO2, and MO-DAGTO3.
MO-DAGTO2 vs. | MO-DAGTO1 (p-Value) | Significance | MO-DAGTO3 (p-Value) | Significance
Lymphography | 3.870 × 10^−3 | + | 2.671 × 10^−2 | +
Diabetic | 2.432 × 10^−4 | + | 4.270 × 10^−2 | −
Cardiotocography | 1.520 × 10^−2 | + | 1.113 × 10^−2 | +
Cervical Cancer | 4.890 × 10^−4 | + | 7.564 × 10^−5 | +
Lung Cancer | 2.980 × 10^−2 | + | 4.867 × 10^−3 | +
Arrhythmia | 0.071 | = | 3.560 × 10^−3 | +
Parkinson | 4.983 × 10^−5 | + | 6.480 × 10^−5 | +
Colon Tumor | 3.591 × 10^−2 | + | 4.238 × 10^−3 | +
SRBCT | 5.390 × 10^−5 | + | 6.123 × 10^−5 | +
Leukemia | 0.068 | = | 4.219 × 10^−3 | +
Table 7. Performance comparison among MO-DAGTO2 with benchmark methods.
DatasetsMethodsAvg_AccAvg_Feature_SizeIGDHVSpreadSCC
LymphographyNSGA-II0.8237.160.0880.1780.6950
BMOFOA0.7953.750.0280.2450.6580
BMOChOA0.8126.660.1020.3340.9610
FW-GPAWOA0.78930.0910.3010.7241
MO-DAGTO20.82540.2640.3840.8453
DiabeticNSGA-II0.6583.750.0730.2540.7860
BMOFOA0.6294.660.1480.3120.8450
BMOChOA0.68821.1790.101
FW-GPAWOA0.6893.330.1770.30.780
MO-DAGTO20.73.50.0840.2410.6673
CardiotocographyNSGA-II0.8647.60.0220.2010.620
BMOFOA0.8755.80.0240.1990.5920
BMOChOA0.8684.60.0150.2040.7340
FW-GPAWOA0.896.250.0910.1340.580
MO-DAGTO20.9023.50.0550.2420.6772
Cervical CancerNSGA-II0.95210.420.2690.1520.5810
BMOFOA0.95140.1890.2010.6420
BMOChOA0.9614.330.0730.3150.7860
FW-GPAWOA0.96220.1590.25501
MO-DAGTO20.96390.2570.4931.2481
Lung CancerNSGA-II0.47319.51.1140.1990.5690
BMOFOA0.46418.61.0390.2380.6390
BMOChOA0.597100.8950.290.5480
FW-GPAWOA0.5452.50.6310.3530.6680
MO-DAGTO20.5922.50.5920.3680.8080
ArrhythmiaNSGA-II0.63126.50.1180.2220.3750
BMOFOA0.664840.1210.2190.3290
BMOChOA0.619122.330.2270.1620.3160
FW-GPAWOA0.6645.1250.2550.1350.5780
MO-DAGTO20.62330.0150.3140.7561
ParkinsonNSGA-II0.811346.40.0360.1110.1950
BMOFOA0.835300.330.3080.260.1990
BMOChOA0.802109.8750.0460.1390.8340
FW-GPAWOA0.8325.40.0230.1140.8690
MO-DAGTO20.8562.550.0550.3630.5850
Colon TumorNSGA-II0.8359660.3070.220.260
BMOFOA0.784438.50.4720.1670.7690
BMOChOA0.839896.330.3960.2650.2121
FW-GPAWOA0.822633.20.1360.2150.7610
MO-DAGTO20.8236.20.060.3931.4274
SRBCTNSGA-II0.7221111.2860.1210.1450.2370
BMOFOA0.7452090.0930.2271.0812
BMOChOA0.864299.50.4510.4080.9470
FW-GPAWOA0.776142.830.0490.2050.9491
MO-DAGTO20.90392.330.270.3050.9043
LeukemiaNSGA-II0.8171371.7140.0480.1620.8932
BMOFOA0.8551316.50.1680.2580.7880
BMOChOA0.8491454.20.1160.2130.9410
FW-GPAWOA0.83475.50.1710.2751.0552
MO-DAGTO20.8729640.280.3050.8322
Table 8. Results of Wilcoxon rank test for MO-DAGTO2 vs. others.
MO-DAGTO2 vs. | NSGA-II (p-Value) | Signf | BMOFOA (p-Value) | Signf | BMOChOA (p-Value) | Signf | FW-GPAWOA (p-Value) | Signf
Lymphography | 2.387 × 10^−2 | + | 3.114 × 10^−2 | + | 0.069 | = | 8.393 × 10^−2 | +
Diabetic | 1.000 | = | 3.762 × 10^−3 | − | 2.918 × 10^−3 | + | 0.0761 | =
Cardiotocography | 9.113 × 10^−2 | + | 7.391 × 10^−2 | + | 9.778 × 10^−2 | + | 5.213 × 10^−2 | +
Cervical Cancer | 2.125 × 10^−3 | + | 3.613 × 10^−3 | + | 9.210 × 10^−2 | + | 7.120 × 10^−2 | +
Lung Cancer | 5.912 × 10^−3 | + | 7.112 × 10^−3 | + | 6.452 × 10^−2 | + | 1.000 | =
Arrhythmia | 2.987 × 10^−2 | + | 2.135 × 10^−2 | + | 6.432 × 10^−3 | + | 3.762 × 10^−3 | +
Parkinson | 3.127 × 10^−3 | + | 9.923 × 10^−2 | + | 8.312 × 10^−3 | + | 5.328 × 10^−3 | +
Colon Tumor | 7.560 × 10^−4 | + | 3.190 × 10^−5 | + | 9.569 × 10^−3 | + | 4.238 × 10^−5 | +
SRBCT | 2.013 × 10^−3 | + | 4.612 × 10^−3 | + | 2.678 × 10^−3 | − | 3.780 × 10^−3 | +
Leukemia | 3.456 × 10^−4 | + | 1.784 × 10^−2 | + | 1.001 × 10^−2 | + | 0.0827 | =
Table 9. Execution time (in minutes) of multi-objective FS methods.
Datasets | NSGA-II | BMOFOA | BMOChOA | FW-GPAWOA | MO-DAGTO2
Lymphography | 5.67 | 5.18 | 4.39 | 4.18 | 6.09
Diabetic | 8.12 | 7.04 | 7.23 | 7.02 | 8.13
Cardiotocography | 18.21 | 10.67 | 8.98 | 7.89 | 13.78
Cervical Cancer | 4.12 | 4.89 | 2.23 | 2.03 | 5.81
Lung Cancer | 4.37 | 5.03 | 3.12 | 3.04 | 6.03
Arrhythmia | 9.01 | 8.03 | 7.61 | 7.01 | 10.35
Parkinson | 14.78 | 13.67 | 14.11 | 14.56 | 17.04
Colon Tumor | 5.89 | 7.02 | 5.19 | 5.17 | 5.96
SRBCT | 12.90 | 10.34 | 12.34 | 12.89 | 14.04
Leukemia | 17.21 | 15.45 | 16.78 | 17.45 | 19.14
Table 10. Comparison between SO-DAGTO and MO-DAGTO2. An optimum solution is given as [number of features, accuracy].
Dataset | Method | Avg_acc | Avg_Feature_Size | Optimum Solution
Lymphography | SO-DAGTO | 0.813 | 6.66 | [5, 0.831]
 | MO-DAGTO2 | 0.825 | 4 | [2, 0.796]
Diabetic | SO-DAGTO | 0.689 | 7 | [7, 0.7]
 | MO-DAGTO2 | 0.7 | 3.5 | [2, 0.688]
Cardiotocography | SO-DAGTO | 0.924 | 8 | [7, 0.93]
 | MO-DAGTO2 | 0.902 | 3.5 | [3, 0.908]
Cervical Cancer | SO-DAGTO | 0.956 | 6 | [3, 0.958]
 | MO-DAGTO2 | 0.963 | 9 | [2, 0.962]
Lung Cancer | SO-DAGTO | 0.65 | 22 | [36, 0.658]
 | MO-DAGTO2 | 0.592 | 2.5 | [3, 0.667]
Arrhythmia | SO-DAGTO | 0.629 | 61 | [61, 0.644]
 | MO-DAGTO2 | 0.623 | 3 | [4, 0.701]
Parkinson | SO-DAGTO | 0.89 | 255.23 | [264, 0.872]
 | MO-DAGTO2 | 0.85 | 62.55 | [162, 0.882]
Colon Tumor | SO-DAGTO | 0.84 | 962 | [970, 0.857]
 | MO-DAGTO2 | 0.8 | 236.2 | [903, 0.871]
SRBCT | SO-DAGTO | 0.878 | 654 | [556, 0.89]
 | MO-DAGTO2 | 0.903 | 92.33 | [164, 0.914]
Leukemia | SO-DAGTO | 0.862 | 1811 | [1674, 0.877]
 | MO-DAGTO2 | 0.872 | 964 | [794, 0.877]
Table 11. Details of COVID-19 dataset.
SNO | Name | Details | Values
1 | case_month | Date received by CDC | March 2020, April 2020, ..., August 2021
2 | res_state | State name of USA | AK, CO, FL, ..., UT, VT
3 | state_fips_code | Federal Information Processing Standards (FIPS) code for states | —
4 | res_county | County name | —
5 | county_fips_code | FIPS code for counties | —
6 | age_group | Patient's age group | 0–17, 18–49, 50–64, and 65+ years
7 | sex | Gender of patient | M, F, other, unknown
8 | race | Race of patient | American Indian/Alaska Native, Asian, Black, Multiple/Other, Native Hawaiian/Other Pacific Islander, White, Unknown
9 | ethnicity | Ethnicity of patient | Hispanic, Non-Hispanic, Unknown
10 | case_positive_specimen_interval | Weeks between the initial positive specimen collection and the earliest date of collection | —
11 | case_onset_interval | Weeks between earliest date and date of symptom onset | —
12 | process | Process under which the case was first recognised | Clinical evaluation, routine surveillance, multiple, ...
13 | exposure_yn | Whether, in the 14 days before becoming ill, the patient had any known exposure such as local or international travel, incarceration, a community event, or contact with a previously reported case of COVID-19 | Yes, unknown
14 | current_status | Current status of the patient | Laboratory-confirmed case, probable case
15 | symptom_status | Symptom status of the patient | Asymptomatic, Symptomatic, Unknown
16 | hosp_yn | Was the patient hospitalized? | Yes, no, unknown
17 | icu_yn | Was the patient admitted to an ICU? | Yes, no, unknown
18 | underlying_conditions_yn | Whether the patient has diabetes, hypertension, cardiovascular disease, etc. | Yes, no
19 | death_yn | Whether the patient died as a result of this illness | Yes, no, unknown
Table 12. Classification results of COVID-19 dataset (BFS = before feature selection; AFS = after feature selection).
Method |  | Accuracy | Sensitivity | Specificity | Precision | FPR | Error
KNN | BFS | 0.93 | 0.978 | 0.385 | 0.967 | 0.615 | 0.051
 | AFS | 0.94 | 0.977 | 0.482 | 0.973 | 0.518 | 0.046
LR | BFS | 0.934 | 1 | 0.004 | 0.947 | 0.996 | 0.053
 | AFS | 0.934 | 1 | 0 | 0.947 | 1 | 0.053
SVM | BFS | 0.934 | 1 | 0 | 0.947 | 0.5 | 0.041
 | AFS | 0.934 | 1 | 0 | 0.947 | 1 | 0.053
RF | BFS | 0.946 | 0.982 | 0.5 | 0.974 | 0.5 | 0.041
 | AFS | 0.95 | 0.981 | 0.459 | 0.972 | 0.541 | 0.041
DT | BFS | 0.935 | 0.976 | 0.472 | 0.972 | 0.528 | 0.048
 | AFS | 0.942 | 0.982 | 0.442 | 0.971 | 0.558 | 0.044
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
