Impact of Chaos Functions on Modern Swarm Optimizers

  • E. Emary,

    Affiliations Department of Information Technology, Faculty of Computers and Information, Cairo University, Cairo, Egypt, Computing Department, Faculty of Computer Studies, Arab Open University, Shrouq, Egypt

  • Hossam M. Zawbaa

    hossam.zawbaa@gmail.com

    Affiliations Department of Information Technology, Faculty of Computers and Information, Beni-Suef University, Beni-Suef, Egypt, Department of Computer Science, Faculty of Mathematics and Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania

Abstract

Exploration and exploitation are two essential components of any optimization algorithm. Too much exploration causes oscillation and slow convergence, while too much exploitation causes premature convergence and may trap the optimizer in local minima. Balancing the rates of exploration and exploitation over the optimization lifetime is therefore a challenge. This study evaluates the impact of chaos-based control of the exploration/exploitation rate against the systematic native control. Three modern algorithms were used in the study, namely the grey wolf optimizer (GWO), the antlion optimizer (ALO), and the moth-flame optimizer (MFO), in the domain of machine learning feature selection. Results on a set of standard machine learning data sets, assessed with several indicators, show improved optimizer performance when repeated periods of declining exploration rates are used in place of a single systematically decreased exploration rate.

1 Introduction

Chaos is a form of non-linear motion in dynamical systems that can visit any state according to its own regularity. Chaos supports a form of information processing that has been applied widely in fields such as optimization, machine learning, and pattern recognition. Chaos-based search techniques are similar to genetic algorithms (GA) but use chaos variables. Because a chaotic sequence does not repeat itself, it can carry out an overall search in less computational time than stochastic ergodic techniques based on probability [1]. Lately, the idea of applying chaotic systems instead of random operations has appeared in optimization theory: the role of randomness can be played by chaotic motion rather than random processes [2]. Numerous studies hold that the advantages of chaos-based processes over random-based processes are usually visible, even though a general principle cannot be stated [3]. The chaos optimization algorithm (COA) exploits chaotic properties such as ergodicity to escape from local minima more efficiently than other stochastic optimization algorithms [2]. COA uses the regularity inherent in chaotic motion to escape from local minima, whereas random-based procedures escape by accepting some worse solutions with a given probability [2, 4, 5].

Over the past years, the number of features in machine learning and pattern recognition applications has grown from hundreds to thousands. Feature selection is a procedure for identifying the subset of significant features and removing the redundant ones. The technique is especially valuable when the set of input features is large and not all of them are needed to describe the data [6]. Feature selection reduces the dimensionality of the feature space, which improves classification performance; it also helps data understanding and reduces the computation time of the classification process [7, 8]. There are two families of feature selection approaches: filter-based and wrapper-based [9]. The filter-based approach searches for a subset of features that optimizes a given data-dependent criterion [10]. The wrapper-based approach uses a machine learning model as the evaluation component; it generally obtains better results than the filter-based approach [11] but is computationally costly [12].

Generally, feature selection is formulated as a multi-objective problem with two goals: minimize the number of selected features and maximize the classification performance. These two goals conflict, and the optimal solution must balance them. A variety of search methods have been applied, for example, sequential forward selection (SFS) [13] and sequential backward selection (SBS) [14]. Nonetheless, these feature selection methods still suffer from stagnation in local optima [15]. Evolutionary computing (EC) algorithms adaptively search the space using a set of search operators that interact in a social way to reach the optimal solution [16, 17].

The chaos genetic feature selection optimization method (CGFSO) explores the search space over every possible feature combination of a given data set; each individual in the population represents a candidate solution, and the chromosome length equals the feature dimension [18, 19]. A chaotic time series with an EPNet algorithm requires a small network architecture, since adding neural parts may degrade performance during evolution; it therefore gives higher survival probabilities to smaller networks in the population [20, 21]. The chaotic antlion optimizer (CALO) can converge to the same optimal solution across many applications despite the stochastic search and the chaotic adjustment [22].

In this paper, our aim is to analyze the performance and impact of two chaotic functions on three bio-inspired optimization algorithms, namely grey wolf optimization (GWO), antlion optimization (ALO), and moth-flame optimization (MFO). The rest of this paper is organized as follows: Section 2 gives the background of the three bio-inspired optimization algorithms and describes the use of the chaotic variants of GWO, ALO, and MFO in feature selection. Experimental results and discussion are presented in Section 3, and conclusions and future work are provided in Section 4.

2 Materials and Methods

The following subsections give a brief overview of the three applied bio-inspired optimization algorithms and of chaotic maps, and then describe the use of the chaotic variants of GWO, ALO, and MFO in feature selection.

2.1 Preliminaries

This subsection provides a brief summary of the three modern bio-inspired optimizers, namely grey wolf optimization (GWO), antlion optimization (ALO), and moth-flame optimization (MFO), together with a short introduction to chaos theory.

  1. Grey wolf optimization (GWO)
    GWO is a relatively new optimization algorithm proposed by Mirjalili et al. in 2014 [23]. This algorithm mimics the hunting process of grey wolves. Grey wolves often live in a pack with strict leadership hierarchy. Table 1 describes such hierarchy and the responsibility of each agent in the pack.
    For a pack to hunt prey, a set of phases are to be applied in sequence [24]:
    Prey encircling: the pack encircles a prey by repositioning individual agents according to the prey location. The encircling is modeled as in Eq (1):

        X(t+1) = X_p(t) − A · D   (1)

    where D is as defined in Eq (2), t is the iteration number, A and C are coefficient vectors, X_p is the prey position, and X is the grey wolf position.

        D = |C · X_p(t) − X(t)|   (2)

    The A and C vectors are calculated as in Eqs (3) and (4):

        A = 2a · r1 − a   (3)

        C = 2 · r2   (4)

    where a is linearly decreased from 2 to 0 over the course of iterations, controlling the balance of exploration and exploitation, and r1, r2 are random vectors in [0, 1]. The value of a is the same for all wolves in the pack.
    By the end of encircling stage, the hunting stages start.
    Hunting: performed by the whole pack based on the information coming from the alpha, beta, and delta wolves, which are assumed to have better knowledge of the potential prey location and hence are used as guides for hunting. Each wolf in the pack updates its position as in Eq (5):

        X(t+1) = (X_1 + X_2 + X_3) / 3   (5)

    where X_1, X_2, X_3 are defined in Eqs (6), (7), and (8) respectively:

        X_1 = X_α − A_1 · D_α   (6)
        X_2 = X_β − A_2 · D_β   (7)
        X_3 = X_δ − A_3 · D_δ   (8)

    where X_α, X_β, X_δ are the first three best solutions in the swarm at a given iteration t, A_1, A_2, A_3 are defined as in Eq (3), and D_α, D_β, D_δ are defined in Eqs (9), (10), and (11) respectively:

        D_α = |C_1 · X_α − X|   (9)
        D_β = |C_2 · X_β − X|   (10)
        D_δ = |C_3 · X_δ − X|   (11)

    where C_1, C_2, and C_3 are defined as in Eq (4).
    By the end of preying stage, the pack attacks the prey in what is called attacking stage.
    Attacking stage: the attacking phase is performed by allowing the agents to approach the prey, which is achieved by decrementing the exploration parameter a of Eq (3) into the range a ∈ [0, 1].
    By finishing the attacking phase the pack is ready to search for a new prey which is the next phase.
    Search for prey: in this phase, wolves diverge from each other to find a new prey. This behavior is modeled by allowing a large value of the parameter a in Eq (3), a ∈ [1, 2], to promote exploration of the search space.
    Algorithm 1 outlines the gray wolf optimization (GWO) algorithm.
    Algorithm 1: Grey wolf optimization (GWO) algorithm
    1: Inputs: n Number of gray wolves in the pack,
    NIter Number of iterations for optimization.
    2: Outputs: xα Optimal gray wolf position,
    f(xα) Best fitness value.
    3: Initialize a population of n gray wolves positions randomly.
    4: Find the α, β, and δ solutions based on their fitness values.
    5: while stopping criteria is not met do
    6:   for all Wolf_i in the pack do
    7:    Update current wolf’s position according to Eq (5).
    8:   end for
    9:   Update a, A, and C as in Eqs (3) and (4).
    10:  Evaluate the positions of individual wolves.
    11:  Update α, β, and δ positions as in Eqs (9), (10) and (11).
    12: end while
    13: Select the optimal position (xα) and its fitness (f(xα)).
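As an illustration of Eqs (1)-(11) and Algorithm 1, a single GWO iteration can be sketched as below. This is a minimal NumPy sketch, not the authors' code; the function name `gwo_step` and its calling convention are our own assumptions:

```python
import numpy as np

def gwo_step(wolves, fitness, a, rng):
    """One GWO iteration: move every wolf toward the alpha, beta,
    and delta leaders (Eqs (5)-(11)); `a` falls from 2 to 0."""
    order = np.argsort([fitness(w) for w in wolves])
    alpha, beta, delta = wolves[order[0]], wolves[order[1]], wolves[order[2]]
    new = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        guides = []
        for leader in (alpha, beta, delta):
            A = 2 * a * rng.random(x.shape) - a   # Eq (3)
            C = 2 * rng.random(x.shape)           # Eq (4)
            D = np.abs(C * leader - x)            # Eqs (9)-(11)
            guides.append(leader - A * D)         # Eqs (6)-(8)
        new[i] = np.mean(guides, axis=0)          # Eq (5)
    return new
```

Driving `a` linearly from 2 to 0 across iterations reproduces the native exploration schedule; the chaotic variants discussed later only change how `a` is generated.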
  2. Antlion optimization (ALO)
    ALO is a recent optimization algorithm proposed by Mirjalili in 2015 [25]. The ALO algorithm emulates the hunting mechanism of antlions in nature. Two kinds of agents are used in the ALO algorithm, namely ants and antlions. An antlion digs a pit to trap ants; when an ant falls into the pit, the antlion throws sand towards it until it is caught and consumed. Afterwards, the antlion prepares the trap again for the next hunt.
    In the artificial optimizer, the ants change their positions according to the antlion positions. Each ant performs two random walks, one around the elite antlion and one around a selected antlion, and then updates its position as the average of the two walks as in Eq (12):

        Ant_i^t = (R_A^t + R_E^t) / 2   (12)

    where R_A^t is the random walk around the roulette-wheel-selected antlion and R_E^t is the random walk around the elite antlion.
    The random walk of an ant around a given antlion is normalized as in Eq (13):

        X_i^t = ((X_i^t − a_i) · (d_i^t − c_i^t)) / (b_i − a_i) + c_i^t   (13)

    where X_i^t is the updated position of ant i at iteration t, a_i is the minimum of the random walk in the ith dimension, b_i is the maximum of the random walk in the ith dimension, X_i is defined in Eq (14), and c_i^t and d_i^t are the lower and upper bounds of the random walk at iteration t.

        X(t) = [0, cumsum(2r(t_1) − 1), cumsum(2r(t_2) − 1), …, cumsum(2r(t_T) − 1)]   (14)

    where cumsum calculates the cumulative sum and r(t) is a stochastic function defined as in Eq (15):

        r(t) = 1 if rand > 0.5, 0 otherwise   (15)

    where rand is a random number created with uniform distribution on the interval [0, 1].
    The c and d parameters are adapted according to Eqs (16) and (17) to shrink the range of the random walk around the given antlion:

        c^t = lb / I   (16)

        d^t = ub / I   (17)

    where lb and ub are the lower and upper limits of the search space, and I is a factor that controls the exploration/exploitation rate, defined in Eq (18):

        I = 10^w · t / T   (18)

    where t is the current iteration, T is the maximum number of iterations, and w is a constant set according to the current iteration (w = 2 when t > 0.1T, w = 3 when t > 0.5T, w = 4 when t > 0.75T, w = 5 when t > 0.9T, and w = 6 when t > 0.95T). The constant w adjusts the accuracy level of exploitation.
    Finally, a selection operation is applied in which an antlion is replaced by an ant whenever the ant becomes fitter. Based on the above equations, the ALO optimizer can be formulated as in Algorithm 2.
    Algorithm 2: Antlion optimization (ALO) algorithm
    1: Inputs: n number of ants,
    N number of antlions,
    T maximum number of iterations.
    2: Outputs: The elitist antlion.
    3: Initialize a population of n ant’s positions and N antlion’s positions randomly.
    4: Calculate the fitness of all ants and antlions.
    5: Find the fittest antlion (the elite).
    6: t = 0.
    7: while t < T do
    8:   for all ant_i do
    9:   Select an antlion using roulette wheel selection.
    10:   Create a random walk for ant_i around the selected antlion and normalize it.
    11:   Create a random walk for ant_i around the elite antlion and normalize it.
    12:   Update ant_i's position as the average of the two obtained random walks.
    13:  end for
    14:  Calculate the fitness of all ants.
    15:  Replace an antlion with its corresponding ant if the ant becomes fitter.
    16:  Update elite if an antlion gets to be fitter than the elite.
    17: end while
    18: Select the elitist antlion and its fitness.
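The random-walk construction of Eqs (13)-(17) can be sketched for one dimension as below. This is an illustrative sketch assuming the standard ALO bound-shrinking (c^t = lb/I, d^t = ub/I, centered on the chosen antlion); the function name is ours:

```python
import numpy as np

def normalized_random_walk(T, antlion, lb, ub, I, rng):
    """One-dimensional random walk of an ant around an antlion,
    min-max normalized into the shrunken bounds (Eqs (13)-(17))."""
    steps = np.where(rng.random(T) > 0.5, 1.0, -1.0)  # 2*r(t) - 1, Eq (15)
    X = np.concatenate(([0.0], np.cumsum(steps)))     # Eq (14)
    a_i, b_i = X.min(), X.max()
    c_t = antlion + lb / I   # shrunken lower bound, Eq (16), centered on the antlion
    d_t = antlion + ub / I   # shrunken upper bound, Eq (17)
    return (X - a_i) * (d_t - c_t) / (b_i - a_i) + c_t  # Eq (13)
```

As I grows with the iteration number (Eq (18)), the interval [c_t, d_t] collapses around the antlion, shifting the walk from exploration to exploitation.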
  3. Moth-flame optimization (MFO)
    Mirjalili proposed moth-flame optimization (MFO) in 2015 [26], motivated by the navigation approach of moths. Moths rely on transverse orientation for navigation: a moth flies by keeping a fixed angle with respect to the moon. When moths see a human-made artificial light, they try to keep a similar angle to the light in order to fly in a straight line. Moths and flames are the primary components of the artificial MFO algorithm. MFO is a population-based algorithm in which a set of n moths act as search agents, while the flames are the best N positions of moths obtained so far. Each moth searches around a flame and updates the flame when it finds a better solution. Given a logarithmic spiral, a moth updates its position with respect to a given flame as in Eq (19) [26]:

        S(M_i, F_j) = D_i · e^{bt} · cos(2πt) + F_j   (19)

    where D_i = |F_j − M_i| is the Euclidean distance of the ith moth from the jth flame, b is a constant defining the shape of the logarithmic spiral, M_i denotes the ith moth, F_j denotes the jth flame, and t is a random number in [−1, 1].
    As seen in the above equation, the next position of a moth is defined with respect to a flame. The t parameter in the spiral equation controls how close the next position of the moth should be to the flame. The spiral equation permits a moth to fly around a flame, and not necessarily in the space between flames, allowing for both exploration and exploitation of solutions. To further emphasize exploitation, t is taken as a random number in [r, 1], where r is linearly decreased from −1 to −2 over the course of iterations and is called the convergence constant. With this mechanism, moths exploit their corresponding flames more precisely as iterations proceed. To increase the likelihood of converging to a global solution, each moth is obliged to update its position using only one flame (its corresponding flame). In every iteration, after updating the flame list, the flames are sorted by their fitness values, and the moths then update their positions with respect to their corresponding flames. To allow more exploitation of the most promising solutions, the number of flames is decreased with the iteration number as in Eq (20):

        flame_no = round(N − l(N − 1)/T)   (20)

    where l is the current iteration number, N is the maximum number of flames, and T is the maximum number of iterations. The moth-flame optimization (MFO) procedure is shown in Algorithm 3.
    Algorithm 3: Moth-flame optimization (MFO) algorithm
    1: Inputs: n number of moths,
    N number of flames,
    T maximum number of iterations.
    2: Outputs: Best the optimal flame position,
    f(Best) the optimal flame fitness value.
    3: Initialize a population of n moth positions randomly in the search space.
    4: while Stopping criteria is not met do
    5:   Update the N number of flames as in Eq (20).
    6:   Compute the fitness value for each moth.
    7:   if first iteration then
    8:    Sort the moths from best to worst as per their fitness and place the result in the flame list.
    9:   else
    10:   Merge the population of past moths and flames.
    11:   Sort the merged population from best to worst.
    12:   Select the best N positions from the sorted merged population as the new flame list.
    13:  end if
    14:  Calculate the convergence constant r.
    15:  for all Moth_i in the population do
    16:   Compute t as t = (r − 1)*rand + 1.
    17:   if iN then
    18:    Update Mothi position as indicated by Flamei applying Eq (19).
    19:   else
    20:    Update Mothi position as indicated by FlameN using Eq (19).
    21:   end if
    22:  end for
    23: end while
    24: Select the optimal position and its fitness.
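The spiral update of Eq (19) and the flame-count schedule of Eq (20) can be sketched as below; the helper names are ours:

```python
import numpy as np

def spiral_update(moth, flame, t, b=1.0):
    """Logarithmic-spiral move of a moth around a flame, Eq (19);
    t is drawn from [-1, 1] (t near -1 lands close to the flame)."""
    D = np.abs(flame - moth)          # distance to the flame
    return D * np.exp(b * t) * np.cos(2 * np.pi * t) + flame

def flame_count(l, N, T):
    """Number of flames kept at iteration l of T, Eq (20)."""
    return round(N - l * (N - 1) / T)
```

The flame count starts at N and shrinks to 1 by the final iteration, so late in the run all moths exploit the single best flame.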
  4. Chaotic maps
    Chaotic systems are deterministic systems that display seemingly random, unpredictable behavior and sensitivity to initial conditions. Chaos is one of the most prominent phenomena in nonlinear systems, whose activity is complex and similar to randomness [27]. Chaos theory studies the behavior of dynamical systems that follow deterministic laws yet appear random and unpredictable [16]. To be considered chaotic, a dynamical system must satisfy the chaotic properties, for example, sensitivity to initial conditions, topological mixing, dense periodic orbits, ergodicity, and intrinsic stochasticity [28]. Chaotic variables can pass through all states within certain ranges according to their own regularity without repetition [27]. Because of the ergodic and dynamic properties of chaotic elements, chaos search is more capable of hill-climbing and escaping from local optima than random search, and it has therefore been applied to optimization [27]. It is generally accepted that chaos is a fundamental mode of motion underlying many natural phenomena. A chaotic map is a map that exhibits some kind of chaotic behavior [28]. Two common chaotic maps in the literature are:
    1. Sinusoidal map: represented by Eq (21) [27]:

        x_{k+1} = a · x_k^2 · sin(π x_k)   (21)

    which creates chaotic numbers in the range (0, 1) with a = 2.3.
    2. Singer map: given in Eq (22) [29]:

        x_{k+1} = μ (7.86 x_k − 23.31 x_k^2 + 28.75 x_k^3 − 13.302875 x_k^4)   (22)

    with x_k ∈ (0, 1) under the condition that x_0 ∈ (0, 1) and μ ∈ [0.9, 1.08].
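The two maps of Eqs (21) and (22) can be written directly; iterating either from x_0 ∈ (0, 1) keeps the sequence inside (0, 1). The default μ = 1.07 below is one admissible value from the stated range, chosen by us for illustration:

```python
import math

def sinusoidal_map(x, a=2.3):
    """Sinusoidal map, Eq (21): x_{k+1} = a * x_k^2 * sin(pi * x_k)."""
    return a * x * x * math.sin(math.pi * x)

def singer_map(x, mu=1.07):
    """Singer map, Eq (22); chaotic for mu in [0.9, 1.08]."""
    return mu * (7.86 * x - 23.31 * x ** 2 + 28.75 * x ** 3 - 13.302875 * x ** 4)
```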
Table 1. The leadership hierarchy of the grey wolves pack.

https://doi.org/10.1371/journal.pone.0158738.t001

2.2 The proposed chaos-based exploration rate

As noted in [30], meta-heuristic optimization strategies use high-level information to guide a trial-and-error mechanism in the search for satisfactory solutions. All meta-heuristic algorithms apply a particular trade-off between randomization and local search. Two major components control the performance of a meta-heuristic algorithm, namely exploitation (intensification) and exploration (diversification) [30]. Diversification generates diverse solutions in order to investigate the search space on a large scale, while intensification concentrates the search on a local region, exploiting the information that a good solution has been found there. A good combination of these two components helps ensure that global optimality is achievable [30].

Some optimizers switch between exploration and exploitation based on randomly generated values, for example, the bat algorithm (BA) [31] and the flower pollination algorithm (FPA) [32]. Much recent work in the optimization area uses a single adaptive equation to achieve both diversification and intensification at varying rates, as in GWO [23], ALO [25], and MFO [26]. In both cases, the optimizer begins with a large exploration rate and iteratively decrements it to increase the exploitation rate. The exploration rate follows a linear shape in GWO [23] and MFO [26], while it takes a piece-wise linear shape in ALO [25]. Although such exploration-rate functions have proven efficient for tackling various optimization problems, this methodology has the following disadvantages:

  1. Stagnation: once the optimizer approaches the end of the optimization process, it becomes hard to escape from local optima and discover better solutions, since the exploration ability is by then very limited. This causes the algorithm to keep refining the solutions already found, even when they are local optima.
  2. Sub-optimal selection: at the start of the optimization process, the optimizer has a strong exploration ability, but with this strong explorative force it may leave a promising region for a less promising one because of oscillation and large step sizes; see Fig 1 for such a case in a 1-D representation.
Fig 1. An example of leaving the promising region for the less promising one in 1-D case.

https://doi.org/10.1371/journal.pone.0158738.g001

These issues motivate the use of interleaved successive periods of exploration and exploitation throughout the optimization time. In this way, once a solution is reached, exploitation is applied, followed by another exploration period that permits the optimizer to jump to another promising region, then exploitation again to further improve the solution found. Chaotic systems, with their intriguing properties such as topological mixing, ergodicity, and intrinsic stochasticity, can be applied to adapt the exploration-rate parameter, providing the required equilibrium between diversification and intensification [16].

The diversification rate in the native GWO is called a and is linearly decreased from 2 to 0 over the course of iterations, as in Eq (3). When a drops below 1, the wolf stops wandering and approaches the prey; the next position of a wolf may then be at any point between the wolf position and the prey position; see Fig 2 [23]. When the value of a is greater than 1, wolves separate from each other to search for prey before converging to attack it. Thus, for values greater than 1, the grey wolves are compelled to locate a fitter prey; see Fig 2 [23].

Fig 2. Exploration versus exploitation periods depending on the parameter a in GWO [23].

https://doi.org/10.1371/journal.pone.0158738.g002

In ALO, exploration and exploitation are controlled by shrinking the range within which the ants' positions are updated, imitating the ants' sliding into the pits, via the parameter I, as shown in Fig 3. The radius of the ants' random-walk hypersphere is decreased adaptively; this behavior is modeled mathematically as in Eq (23):

    I = 10^w · t / T   (23)

where t is the current iteration, T is the maximum number of iterations, and w is a constant set according to the current iteration (w = 2 when t > 0.1T, w = 3 when t > 0.5T, w = 4 when t > 0.75T, w = 5 when t > 0.9T, and w = 6 when t > 0.95T). The constant w adjusts the accuracy level of exploitation.

Fig 3. The shrinking of the random walk limits as per the parameter I [25].

https://doi.org/10.1371/journal.pone.0158738.g003

In the case of MFO, the t parameter in the spiral equation determines how close the next position of the moth should be to its corresponding flame (t = −1 is the closest position to the flame, while t = 1 is the farthest), as shown in Fig 4. The spiral equation permits a moth to fly around a flame and not necessarily in the space between them; in this manner, both exploration and exploitation of the search space are ensured [26]. To switch between exploration and exploitation, a parameter a is linearly decremented from −1 to −2 during the optimization to control the lower limit of t. The t value for a given moth is computed as in Eq (24):

    t = (a − 1) · rand + 1   (24)

where rand is a random number drawn from a uniform distribution on [0, 1] and a is computed as in Eq (25):

    a = −1 − l / MaxIter   (25)

where l is the iteration number and MaxIter is the total number of iterations.

Fig 4. The possible positions of a given moth with respect to the corresponding flame [26].

https://doi.org/10.1371/journal.pone.0158738.g004

In the proposed chaos-based adjustment of exploration rates, the parameter a in GWO is driven by the given chaos function; see Fig 5. A similar strategy is used to adjust the parameter I that confines the exploration rate in ALO, and the same approach is applied to MFO, where the a parameter controls the parametric t value that determines the moth position with respect to its corresponding flame; see Figs 5 and 6. Two chaotic functions were used in this study to analyze their performance and impact on GWO, ALO, and MFO. The two chaotic functions adopted here as examples, the Singer and Sinusoidal functions, fit the desired pattern of progressively decreasing exploration and exploitation over successive periods.
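One plausible reading of the proposed adjustment, sketched below, replaces the linear decay of a with successive Sinusoidal-map values; the scaling of the map output into (0, a_max) is our assumption, not a detail given in the text:

```python
import math

def native_rate(t, T, a_max=2.0):
    """Native linear schedule: the rate falls from a_max to 0."""
    return a_max * (1 - t / T)

def chaotic_rate(x, a_max=2.0, a=2.3):
    """Chaos-driven schedule: advance the Sinusoidal map (Eq (21))
    and scale its state into (0, a_max); returns (new state, rate)."""
    x = a * x * x * math.sin(math.pi * x)
    return x, a_max * x
```

Substituting `chaotic_rate` for the linear update of a (e.g., line 9 of Algorithm 1) yields the interleaved exploration/exploitation periods described above, since the chaotic rate rises and falls instead of decreasing monotonically.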

Fig 5. The Singer exploration rate versus the native exploration used as a part of GWO, ALO, and MFO.

https://doi.org/10.1371/journal.pone.0158738.g005

Fig 6. The Sinusoidal exploration rate versus the native exploration used as a part of GWO, ALO, and MFO.

https://doi.org/10.1371/journal.pone.0158738.g006

2.3 Application of chaotic variants of GWO, ALO, and MFO for feature selection

In this section, the proposed chaotic variants are employed in feature selection for classification problems. For a feature vector of size N, the number of distinct feature subsets is 2^N, an immense space to search exhaustively. Therefore, GWO, ALO, and MFO are used to adaptively explore the search space for the best feature subset, i.e., the one with the greatest classification performance and the smallest number of selected features. The fitness function used by the optimizer to assess each grey wolf/antlion/moth position is given by Eq (26):

    fitness = α · γ_R(D) + β · (|C| − |R|) / |C|   (26)

where γ_R(D) is the classification performance of the condition feature set R with respect to the decision D, |R| is the size of the selected feature subset, |C| is the total number of features, and α and β are two parameters weighting the importance of classification performance and subset length, with α ∈ [0, 1] and β = 1 − α.

We can highlight that this fitness function maximizes the classification performance γ_R(D) and the proportion of unselected features to the total number of features, as in the term (|C| − |R|)/|C|. The above equation can be easily converted into a minimization problem by using the error rate instead of the classification performance and the selected-feature proportion instead of the unselected one, as in Eq (27):

    fitness = α · E_R(D) + β · |R| / |C|   (27)

where E_R(D) is the classification error rate for the condition feature set R, |R| is the size of the selected feature subset, and |C| is the total number of features. α ∈ [0, 1] and β = 1 − α are constants controlling the importance of classification performance versus feature reduction; β = 0.01 in the current experiments.
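The minimization fitness of Eq (27) is a one-liner; the default α = 0.99 matches the paper's β = 0.01, and the function name is ours:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Eq (27): alpha * E_R(D) + (1 - alpha) * |R| / |C|."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total
```

For example, a subset of 5 out of 20 features with a 10% validation error scores 0.99·0.1 + 0.01·0.25 = 0.1015; reducing either term lowers the fitness, so the optimizer jointly minimizes error and subset size.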

The principal characteristic of the wrapper-based approach to feature selection is the use of a classifier as a guide during the feature selection procedure. K-nearest neighbor (KNN) is a supervised learning technique that classifies an unknown example based on its minimum distance to the training samples. In the proposed system, KNN is used as the classifier that assesses the goodness of the selected features [33].

3 Results and Discussion

3.1 Data description

The 21 datasets in Table 2, drawn from the UCI machine learning repository [34], are used in the analyses and comparisons. The data sets were chosen to have different numbers of features and instances, representing the various kinds of problems that the proposed algorithms are tested on.

In this study, the wrapper-based approach for feature selection uses the KNN classifier. The value of K was chosen on a trial-and-error basis, with K = 5 performing best across the data sets [33]. During the training process, each wolf/antlion/moth position represents one solution (feature subset). The training set is used to fit the KNN evaluated on the validation set throughout the optimization, guiding the feature selection procedure. The test data are kept out of the optimization and reserved for final assessment. The global and optimizer-specific parameter settings are outlined in Table 3. Each parameter is set either according to domain knowledge (as with the α and β parameters of the fitness function), through trial and error on small simulations, or following common practice in the literature.

3.2 Evaluation criteria

Each data set is divided randomly into three equal parts, namely training, validation, and testing sets, in a cross-validation manner. The partitioning of the data is repeated 20 times to ensure the stability and statistical significance of the results. In every run, the following measures are recorded on the validation data:

  • Classification performance: an indicator characterizing the classifier given the selected feature set, formulated as in Eq (28):

        Performance = (1/M) Σ_{j=1}^{M} (1/N) Σ_{i=1}^{N} Match(C_i, L_i)   (28)

    where M is the number of times the optimization algorithm is run to choose the feature subset, N is the number of samples in the test set, C_i is the classifier output label for data sample i, L_i is the reference class label for data sample i, and Match is a function that yields 1 when the two input labels are the same and 0 when they differ.
  • Mean fitness: the average of the best solutions obtained from running the optimization algorithm for M runs, as given in Eq (29):

        Mean = (1/M) Σ_{i=1}^{M} g*_i   (29)

    where g*_i is the best solution found in run i.
  • Standard deviation (std) of fitness: a representation of the variation of the best solutions obtained from running a stochastic optimizer M different times. Std is employed as an indicator of optimizer stability and robustness: a smaller std implies that the optimizer consistently converges to the same solution, while larger values mean more scattered results, as in Eq (30):

        Std = sqrt( (1/(M − 1)) Σ_{i=1}^{M} (g*_i − Mean)^2 )   (30)

    where Mean is the average defined in Eq (29).
  • Selected feature size: the mean ratio of the number of selected features to the total number of features, as in Eq (31):

        Selection size = (1/M) Σ_{i=1}^{M} size(g*_i) / D   (31)

    where size(x) is the number of values in the vector x (here, the number of selected features in the best solution of run i), and D is the total number of features in the data set.
  • Wilcoxon rank-sum test: the null hypothesis is that the two samples originate from the same population, so any difference between the two rank sums comes only from sampling error. The rank-sum test is the nonparametric counterpart of the t-test for two independent groups. It tests the null hypothesis that the data in vectors x and y are samples from continuous distributions with equal medians against the alternative that they are not [35].
  • T-test: a test of statistical significance indicating whether the difference between two groups' averages most likely reflects a real difference in the populations from which the groups were sampled [36].
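The scalar indicators of Eqs (29)-(31) can be sketched as below; the sample (M − 1) denominator in the standard deviation is our assumption, as the source does not specify it:

```python
def mean_fitness(best_per_run):
    """Eq (29): average of the best fitness over M runs."""
    return sum(best_per_run) / len(best_per_run)

def std_fitness(best_per_run):
    """Eq (30): spread of the best fitness over M runs (stability)."""
    m = mean_fitness(best_per_run)
    return (sum((g - m) ** 2 for g in best_per_run) / (len(best_per_run) - 1)) ** 0.5

def selection_size(selected_counts, D):
    """Eq (31): mean ratio of selected features to the total D."""
    return sum(c / D for c in selected_counts) / len(selected_counts)
```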

The three adopted optimizers were evaluated against their corresponding chaos-controlled variants, where GWO, ALO, and MFO were each combined with the two chaotic functions (Sinusoidal and Singer) used in the study. The optimizers and their chaotic variants were also compared against one of the conventional optimizers, namely the particle swarm optimizer (PSO) [37].

3.3 Experimental results

Figs 7 and 8 outline the performance of the native ALO versus its chaotic variant using the Singer and the Sinusoidal functions on all 21 data sets. We can observe from the figures that the chaotic ALO [22] outperforms the native ALO on most of the data sets regardless of the chaotic function used. Comparable results are illustrated for the GWO in Figs 9 and 10. In the case of MFO, less improvement is observed: the chaotic and the native MFO produce practically identical results, as shown in Figs 11 and 12. The principal reason is that a random value is used to choose the position of a moth; this randomness combines with the applied exploration rate and thereby dilutes its effect. Moreover, the native MFO already provides variability in exploration toward the end stages of the optimization, as seen in Fig 13. Fig 13 demonstrates the possible positions for a moth in a given dimension, where the flame is situated at 1 and the moth is initially situated at 0, for different values of the a parameter. The figure shows that exploration and exploitation are both possible at all stages of the optimization, and hence the chaotic MFO yields no improvement in performance.
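To make the Fig 13 argument concrete, here is a small sketch of MFO's logarithmic-spiral position update (we assume the common spiral-shape constant b = 1; the helper names are ours). With the flame at 1 and the moth at 0, sweeping the spiral parameter t over [a, 1] places the moth on either side of the flame:

```python
import math

def spiral_position(moth, flame, t, b=1.0):
    # Logarithmic-spiral update: the moth-flame distance is scaled by
    # e^(b*t) * cos(2*pi*t) and then re-anchored at the flame.
    d = abs(flame - moth)
    return d * math.exp(b * t) * math.cos(2 * math.pi * t) + flame

def possible_positions(a, steps=11):
    # Sample the spiral parameter t uniformly across [a, 1],
    # for flame = 1 and initial moth position = 0 (the Fig 13 setting).
    return [spiral_position(0.0, 1.0, a + k * (1.0 - a) / (steps - 1))
            for k in range(steps)]
```

Even for the late-stage value a = -1, the sampled positions land both beyond and short of the flame, which is why the native MFO keeps exploring late in the run and a chaotic schedule adds little.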

Fig 7. Mean fitness values for the ALO versus CALO using the Singer function.

https://doi.org/10.1371/journal.pone.0158738.g007

Fig 8. Mean fitness values for the ALO versus CALO using the Sinusoidal function.

https://doi.org/10.1371/journal.pone.0158738.g008

Fig 9. Mean fitness values for the GWO versus CGWO using the Singer function.

https://doi.org/10.1371/journal.pone.0158738.g009

Fig 10. Mean fitness values for the GWO versus CGWO using the Sinusoidal function.

https://doi.org/10.1371/journal.pone.0158738.g010

Fig 11. Mean fitness values for the MFO versus CMFO using the Singer function.

https://doi.org/10.1371/journal.pone.0158738.g011

Fig 12. Mean fitness values for the MFO versus CMFO using the Sinusoidal function.

https://doi.org/10.1371/journal.pone.0158738.g012

Fig 13. Possible positions for a moth using the different values for the a parameter.

https://doi.org/10.1371/journal.pone.0158738.g013

Comparable conclusions can be drawn from Figs 14–19, which outline the performance of the features selected by the different optimizers on the test data. We can again remark a clear gain in performance from the chaotic versions of both ALO and GWO, while the native MFO performs practically identically to its chaotic version on the test data. To assess optimizer repeatability and the capability to converge to the global optimum regardless of the initial positioning of the search agents, we employed the standard deviation (std) of the obtained fitness values over all runs for the different optimizers. Figs 20–22 outline the std values for the native and chaotic variants of the three optimizers. The figures show that the chaotic versions have practically comparable std values despite relying on chaotic rather than systematic functions, which motivates using such models.

Fig 14. Average classification performance for the ALO versus CALO using the Singer function.

https://doi.org/10.1371/journal.pone.0158738.g014

Fig 15. Average classification performance for the ALO versus CALO using the Sinusoidal function.

https://doi.org/10.1371/journal.pone.0158738.g015

Fig 16. Average classification performance for the GWO versus CGWO using the Singer function.

https://doi.org/10.1371/journal.pone.0158738.g016

Fig 17. Average classification performance for the GWO versus CGWO using the Sinusoidal function.

https://doi.org/10.1371/journal.pone.0158738.g017

Fig 18. Average classification performance for the MFO versus CMFO using the Singer function.

https://doi.org/10.1371/journal.pone.0158738.g018

Fig 19. Average classification performance for the MFO versus CMFO using the Sinusoidal function.

https://doi.org/10.1371/journal.pone.0158738.g019

Fig 20. Standard deviation (std) of the obtained optimal fitness values for ALO versus CALO using the Singer and Sinusoidal functions.

https://doi.org/10.1371/journal.pone.0158738.g020

Fig 21. Standard deviation (std) of the obtained optimal fitness values for GWO versus CGWO using the Singer and Sinusoidal functions.

https://doi.org/10.1371/journal.pone.0158738.g021

Fig 22. Standard deviation (std) of the obtained optimal fitness values for MFO versus CMFO using the Singer and Sinusoidal functions.

https://doi.org/10.1371/journal.pone.0158738.g022

The secondary objective in the fitness function is minimizing the number of selected features; Tables 4–6 report the average selected feature sizes. We can remark that the three optimizers and their variants reduce the number of selected features while also improving classifier performance. This secondary objective is better achieved by the chaotic variants, as chaos allows the algorithms to continue exploring the space for better solutions even in the final stages of the optimization process. To assess the significance of the performance differences between the optimizers and their chaotic variants, both the t-test and the Wilcoxon test were computed in Table 7. Table 7 demonstrates a significant advantage for the chaotic versions of ALO and GWO at a significance level of 0.05, while there is no remarkable gain from the chaotic variant of MFO. A comparable conclusion can be drawn from the p-values of the t-test: the advantage of the chaotic variants of ALO and GWO is clear, whereas the chaotic variant of MFO brings no appreciable improvement.
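For readers who want to reproduce the significance check, a minimal normal-approximation sketch of the Wilcoxon rank-sum test follows (not the authors' code; in practice one would call `scipy.stats.ranksums` and `scipy.stats.ttest_ind`):

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test via the normal approximation:
    pool the samples, assign average ranks to ties, sum the ranks of x
    (the W statistic), then standardize under the null hypothesis."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 1) / 2.0   # average of ranks i+1 .. j for ties
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    w = sum(r for r, (v, g) in zip(ranks, pooled) if g == 0)
    mu = n1 * (n1 + n2 + 1) / 2.0                      # E[W] under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # Std[W] under H0
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided p-value
    return z, p
```

Applied to the per-run best fitness values of a native optimizer and its chaotic variant, a p-value below 0.05 indicates a significant difference in median performance.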

Table 4. Average selection features size for the chaotic and native ALO over all the data sets.

https://doi.org/10.1371/journal.pone.0158738.t004

Table 5. Average selection features size for the chaotic and native GWO over all the data sets.

https://doi.org/10.1371/journal.pone.0158738.t005

Table 6. Average selection features size for the chaotic and native MFO over all the data sets.

https://doi.org/10.1371/journal.pone.0158738.t006

When using different weights α and β for the objectives in the fitness function, we change the priority of each objective in the optimization process. Changing these weights alters the shape of the objective function and hence affects the performance of the optimizers: local minima may appear or disappear and/or the global optimum of the fitness function may move. Figs 23–25 depict the impact of changing the weight factor α on the performance of the individual optimizers. We varied the α parameter over the range [0.9, 0.99], with the β parameter set to 1 − α as the rational setting. The figures display the sum of the fitness values obtained by the 10 employed optimizers over all the data sets used. We can see from the figures that the chaotic versions of the optimizers still perform well as the shape of the fitness function changes, which demonstrates the capability of these variants to tolerate local optima and converge to a global one.
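The weighting described above — classification error traded off against the selected-feature ratio, with β = 1 − α — can be sketched as follows (the function name and argument layout are our assumption):

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted fitness: alpha * classification error plus
    beta * selection ratio, where beta = 1 - alpha."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

Lowering α toward 0.9 shifts the optimization pressure from accuracy toward compactness of the feature subset, which is exactly the sweep shown in Figs 23–25.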

Fig 23. Fitness sum over all the used data sets for CALO versus ALO and PSO at a different setting of α.

https://doi.org/10.1371/journal.pone.0158738.g023

Fig 24. Fitness sum over all the used data sets for CGWO versus GWO and PSO at a different setting of α.

https://doi.org/10.1371/journal.pone.0158738.g024

Fig 25. Fitness sum over all the used data sets for CMFO versus MFO and PSO at a different setting of α.

https://doi.org/10.1371/journal.pone.0158738.g025

4 Conclusions

In this study, two chaos functions were employed to control the exploration rate of three modern bio-inspired optimization algorithms. The fundamental aim of the work is to evaluate the impact of using such chaos functions as a replacement for the linear or quasi-linear functions originally used. The study evaluates the two proposed chaotic versions of each algorithm, as well as the original ones, in the domain of machine learning for feature selection. The experiments were conducted on a set of data sets drawn from the UCI repository and assessed using different indicators. The study found that maintaining a high exploration rate toward the end stages of optimization helps the algorithm avoid local optima and premature convergence. Additionally, a high exploitation rate at the beginning of optimization helps the algorithm identify the most promising regions of the search space. Hence, the paper proposes using successive periods of exploration and exploitation during the optimization run. Chaos functions are an illustration of functions that can provide such successive exploration and exploitation, and they deliver excellent performance when used with GWO and ALO.

Acknowledgments

This work was partially supported by the IPROCOM Marie Curie initial training network, funded through the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/ under REA grant agreement No. 316555.

Author Contributions

Conceived and designed the experiments: EE HMZ. Performed the experiments: EE. Analyzed the data: HMZ. Contributed reagents/materials/analysis tools: HMZ EE. Wrote the paper: HMZ EE.

References

  1. Wang D.F., Han P., Ren Q., “Chaos Optimization Variable Arguments PID Controller And Its Application To Main System Pressure Regulating System”, First International Conference On Machine Learning And Cybernetics, Beijing, November, 2002.
  2. Tavazoei M.S., Haeri M., “An optimization algorithm based on chaotic behavior and fractal nature”, Computational and Applied Mathematics, Vol. 206, pp. 1070–1081, 2007.
  3. Bucolo M., Caponetto R., Fortuna L., Frasca M., Rizzo A., “Does chaos work better than noise?”, Circuits and Systems Magazine, IEEE, Vol. 2, No. 3, pp. 4–19, 2002.
  4. Li B., Jiang W., “Optimizing complex functions by chaos search”, Cybernetics and Systems, Vol. 29, No. 4, pp. 409–419, 1998.
  5. Yang J.J., Zhou J.Z., Wu W., Liu F., Zhu C.J., Cao G.J., “A chaos algorithm based on progressive optimality and Tabu search algorithm”, International Conference on Machine Learning and Cybernetics, Vol. 5, pp. 2977–2981, 2005.
  6. Chizi B., Rokach L., Maimon O., “A Survey of Feature Selection Techniques”, IGI Global, pp. 1888–1895, 2009.
  7. Chandrashekar G., Sahin F., “A survey on feature selection methods”, Computers & Electrical Engineering, Vol. 40, No. 1, pp. 16–28, 2014.
  8. Huang C.L., “ACO-based hybrid classification system with feature subset selection and model parameters optimization”, Neurocomputing, Vol. 73, pp. 438–448, 2009.
  9. Kohavi R., John G.H., “Wrappers for feature subset selection”, Artificial Intelligence, Vol. 97, No. 1, pp. 273–324, 1997.
  10. Chuang L.Y., Tsai S.W., Yang C.H., “Improved binary particle swarm optimization using catfish effect for feature selection”, Expert Systems with Applications, Vol. 38, pp. 12699–12707, 2011.
  11. Xue B., Zhang M., Browne W.N., “Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms”, Applied Soft Computing, Vol. 18, pp. 261–276, 2014.
  12. Guyon I., Elisseeff A., “An introduction to variable and feature selection”, Journal of Machine Learning Research, Vol. 3, pp. 1157–1182, 2003.
  13. Whitney A., “A direct method of nonparametric measurement selection”, IEEE Trans. Comput., Vol. C-20, No. 9, pp. 1100–1103, 1971.
  14. Marill T., Green D., “On the effectiveness of receptors in recognition systems”, IEEE Trans. Inf. Theory, Vol. IT-9, No. 1, pp. 11–17, 1963.
  15. Bing X., Zhang M., Browne W.N., “Particle swarm optimization for feature selection in classification: a multi-objective approach”, IEEE Transactions on Cybernetics, Vol. 43, No. 6, pp. 1656–1671, 2013.
  16. Hassanien A.E., Emary E., “Swarm Intelligence: Principles, Advances, and Applications”, CRC Press, Taylor & Francis Group, 2015.
  17. Shoghian S., Kouzehgar M., “A Comparison among Wolf Pack Search and Four other Optimization Algorithms”, World Academy of Science, Engineering and Technology, Vol. 6, 2012.
  18. Oh I.S., Lee J.S., Moon B.R., “Hybrid genetic algorithms for feature selection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 11, pp. 1424–1437, 2004.
  19. Chen H., Jiang W., Li C., Li R., “A Heuristic Feature Selection Approach for Text Categorization by Using Chaos Optimization and Genetic Algorithm”, Mathematical Problems in Engineering, Vol. 2013, 2013.
  20. Gholipour A., Araabi B.N., Lucas C., “Predicting chaotic time series using neural and neurofuzzy models: A comparative study”, Neural Processing Letters, Vol. 24, No. 3, pp. 217–239, 2006.
  21. Landassuri-Moreno V., Marcial-Romero J.R., “Chaotic Time Series Prediction with Feature Selection Evolution”, Electronics, Robotics and Automotive Mechanics Conference, 2011.
  22. Zawbaa H.M., Emary E., Grosan C., “Feature selection via chaotic antlion optimization”, PLoS ONE, Vol. 11, No. 3, e0150652, 2016.
  23. Mirjalili S., Mirjalili S.M., Lewis A., “Grey Wolf Optimizer”, Advances in Engineering Software, Vol. 69, pp. 46–61, 2014.
  24. Emary E., Zawbaa H.M., Grosan C., Hassanien A.E., “Feature subset selection approach by Gray-wolf optimization”, 1st International Afro-European Conference For Industrial Advancement, Ethiopia, November, 2014.
  25. Mirjalili S., “The Ant Lion Optimizer”, Advances in Engineering Software, Vol. 83, pp. 83–98, 2015.
  26. Mirjalili S., “Moth-Flame Optimization Algorithm: A Novel Nature-inspired Heuristic Paradigm”, Knowledge-Based Systems, Vol. 89, pp. 228–249, 2015.
  27. Ren B., Zhong W., “Multi-objective optimization using chaos based PSO”, Information Technology, Vol. 10, No. 10, pp. 1908–1916, 2011.
  28. Vohra R., Patel B., “An Efficient Chaos-Based Optimization Algorithm Approach for Cryptography”, Communication Network Security, Vol. 1, No. 4, pp. 75–79, 2012.
  29. Aguirregabiria J.M., “Robust chaos with variable Lyapunov exponent in smooth one-dimensional maps”, Chaos Solitons & Fractals, Vol. 42, No. 4, pp. 2531–2539, 2014.
  30. Yang X.S., “Nature-Inspired Metaheuristic Algorithms”, Luniver Press, 2nd edition, 2010.
  31. Yang X.S., “A New Metaheuristic Bat-Inspired Algorithm”, Nature Inspired Cooperative Strategies for Optimization (NISCO), Studies in Computational Intelligence, Springer, Vol. 284, pp. 65–74, 2010.
  32. Yang X.S., “Flower pollination algorithm for global optimization”, Unconventional Computation and Natural Computation, Lecture Notes in Computer Science, Vol. 7445, pp. 240–249, 2012.
  33. Yang C.S., Chuang L.Y., Li J.C., Yang C.H., “Chaotic Binary Particle Swarm Optimization for Feature Selection using Logistic Map”, IEEE Conference on Soft Computing in Industrial Applications, pp. 107–112, 2008.
  34. Frank A., Asuncion A., “UCI Machine Learning Repository”, 2010.
  35. Wilcoxon F., “Individual Comparisons by Ranking Methods”, Biometrics Bulletin, Vol. 1, No. 6, pp. 80–83, 1945.
  36. Rice J.A., “Mathematical Statistics and Data Analysis”, 3rd edition, Duxbury Advanced, 2006.
  37. Kennedy J., Eberhart R., “Particle swarm optimization”, IEEE International Conference on Neural Networks, Vol. 4, pp. 1942–1948, 1995.