Honey Bees Mating Optimization algorithm for financial classification problems

https://doi.org/10.1016/j.asoc.2009.09.010

Abstract

Nature-inspired methods are used in various fields to solve a wide range of problems. This study applies one such method, Honey Bees Mating Optimization, which is based on the mating behaviour of honey bees, to a financial classification problem. Financial decisions are often based on classification models, which assign a set of observations to predefined groups. An important step towards developing accurate financial classification models is selecting the independent variables (features) that are relevant to the problem at hand. The proposed method uses the Honey Bees Mating Optimization algorithm for the feature selection step and Nearest Neighbor based classifiers for the classification step. Its performance is tested on a financial classification task involving credit risk assessment. The results are compared with those of a particle swarm optimization algorithm, an ant colony optimization algorithm, a genetic algorithm and a tabu search algorithm.

Introduction

Over the past years, biological and natural processes have increasingly influenced methodologies in science and technology. Feedback control processes, artificial neurons, the description of the DNA molecule and related genomics topics, and studies of the behaviour of natural immune systems are among the most successful domains of this kind in a variety of real-world applications. During the last decade, nature-inspired intelligence has become increasingly popular through the development and use of intelligent paradigms in advanced information systems design. Cross-disciplinary, team-based thinking attempts to cross-fertilize engineering and life-science understanding into advanced interoperable systems. These methods contribute to technological advances driven by concepts from nature and biology, including advances in structural genomics (intelligent drug design through imprecise databases), mapping of genes to proteins and proteins to genes (the one-to-many and many-to-one characteristics of naturally occurring organisms), modelling of complete cell structures (showing modularity and hierarchy), functional genomics (handling hybrid sources and the heterogeneous, inconsistent origins of disparate databases), self-organization of natural systems, etc. When the task is optimization within complex domains of data or information, the most popular nature-inspired approaches are those that represent successful team behaviour of animals and micro-organisms, such as swarm or flocking intelligence (particle swarm optimization, inspired by bird flocks and fish schools [1]), artificial immune systems (which mimic the biological immune system [2], [3]) and ant colonies (the foraging behaviour of ants gave rise to ant colony optimization [4], [5]).

In recent years, a number of swarm intelligence algorithms based on the behaviour of bees have been presented [6]. According to the natural behaviour they imitate, these algorithms fall mainly into two categories: those based on foraging behaviour and those based on mating behaviour. The most important approaches that simulate the foraging behaviour of bees are the Artificial Bee Colony (ABC) algorithm proposed by Karaboga and Basturk [7], [8], the Virtual Bee algorithm proposed by Yang [9], the Bee Colony Optimization algorithm proposed by Teodorovic and Dell’Orco [10], the BeeHive algorithm proposed by Wedde et al. [11], the Bee Swarm Optimization algorithm proposed by Drias et al. [12] and the Bees algorithm proposed by Pham et al. [13]. The Artificial Bee Colony algorithm [7], [8] is mainly applied to continuous optimization problems and simulates the waggle dance that a swarm of bees performs during foraging. The algorithm uses three groups of bees: the employed bees, which determine a food source (a possible solution) from a prespecified set of food sources and share this information with the other bees in the hive through the waggle dance; the onlooker bees, which use the information received from the employed bees to search for a better food source in the neighborhood of the memorized food sources; and the scout bees, which are employed bees whose food source has been abandoned and which search randomly for a new one. The Virtual Bee algorithm [9] is also applied to continuous optimization problems. In this algorithm, each bee in the population is associated with a memory (a food source), and all the memories communicate with each other through a waggle dance procedure. The whole procedure is similar to a genetic algorithm, and it has been applied to two function optimization problems with two parameters.
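To make the roles of the three bee groups concrete, the following Python sketch minimizes a continuous function with an ABC-style loop. The neighbor rule v = x + phi * (x - x_k), the fitness transform 1 / (1 + f) and the abandonment counter `limit` follow the common textbook presentation of ABC; they are assumptions here and are not taken from this paper. The function name `abc_minimize` and all parameter values are illustrative.

```python
import random


def abc_minimize(f, bounds, n_sources=10, limit=20, max_cycles=200, seed=0):
    """ABC-style sketch: minimize f over the box given by bounds [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def fitness(value):
        # Usual ABC transform so that smaller objective values get larger fitness
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    sources = [random_source() for _ in range(n_sources)]
    values = [f(s) for s in sources]
    trials = [0] * n_sources

    def try_neighbor(i):
        # Move one coordinate of source i relative to a random partner source k
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = sources[i][:]
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        candidate[d] = min(max(candidate[d], lo), hi)
        value = f(candidate)
        if value < values[i]:  # greedy selection keeps the better source
            sources[i], values[i], trials[i] = candidate, value, 0
        else:
            trials[i] += 1

    i0 = min(range(n_sources), key=values.__getitem__)
    best_value, best_source = values[i0], sources[i0][:]
    for _ in range(max_cycles):
        # Employed bees: one neighborhood search per food source
        for i in range(n_sources):
            try_neighbor(i)
        # Onlooker bees: pick sources with probability proportional to fitness
        weights = [fitness(v) for v in values]
        for _ in range(n_sources):
            try_neighbor(rng.choices(range(n_sources), weights=weights)[0])
        # Scout bees: replace sources that failed to improve more than `limit` times
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                values[i] = f(sources[i])
                trials[i] = 0
        i_best = min(range(n_sources), key=values.__getitem__)
        if values[i_best] < best_value:
            best_value, best_source = values[i_best], sources[i_best][:]
    return best_value, best_source
```

For example, `abc_minimize(lambda v: sum(t * t for t in v), [(-5.0, 5.0)] * 2)` should return a value close to zero for the two-dimensional sphere function.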
In the BeeHive algorithm [11], a protocol inspired by the dance language and foraging behaviour of honey bees is used. In Bees Swarm Optimization [12], a bee first finds an initial solution (food source), and other solutions are produced from it using certain strategies. Then every bee is assigned to a solution; when they have completed their search, the bees communicate with each other through a waggle dance strategy, and the best solution becomes the new reference solution. To avoid cycling, the authors use a tabu list. In the Bees algorithm [13], a population of initial solutions (food sources) is randomly generated. The bees are then assigned to the solutions based on their fitness. The bees return to the hive and, based on the quality of their food sources, a number of bees are assigned to the same food source in order to find a better solution in its neighborhood. In the Bee Colony Optimization algorithm [10], each forager bee builds a solution step by step, and when the foragers return to the hive each of them performs a waggle dance. The other bees then follow the foragers according to a probability. This algorithm resembles the Ant Colony Optimization algorithm [5], but it does not use the concept of pheromone trails at all.
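A minimal sketch of the Bees algorithm's site-selection idea: the best `m` sites are searched locally, elite sites receive more recruits, and the remaining bees scout at random. The parameter names (`n`, `m`, `e`, `nep`, `nsp`, `ngh`) follow the usual description of Pham et al.'s algorithm, but the values, the fixed patch size and the function name are assumptions for illustration only.

```python
import random


def bees_algorithm(f, bounds, n=20, m=5, e=2, nep=7, nsp=3, ngh=0.5,
                   iterations=100, seed=0):
    """Bees-algorithm-style sketch: minimize f over the box given by bounds."""
    rng = random.Random(seed)

    def random_solution():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbor(x):
        # Uniform sample in a patch of half-width ngh around x, clipped to bounds
        return [min(max(xi + rng.uniform(-ngh, ngh), lo), hi)
                for xi, (lo, hi) in zip(x, bounds)]

    population = [random_solution() for _ in range(n)]  # initial scouts
    for _ in range(iterations):
        population.sort(key=f)
        next_population = []
        for rank, site in enumerate(population[:m]):
            recruits = nep if rank < e else nsp  # elite sites get more bees
            patch = [neighbor(site) for _ in range(recruits)] + [site]
            next_population.append(min(patch, key=f))  # fittest bee per patch
        # The remaining n - m bees scout randomly for new sites
        next_population += [random_solution() for _ in range(n - m)]
        population = next_population
    return min(population, key=f)
```

Keeping the site itself in each patch makes the search elitist: a selected site is never replaced by a worse neighbor.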

Whereas many algorithms are based on the foraging behaviour of bees, the main algorithm based on their mating behaviour is the Honey Bees Mating Optimization (HBMO) algorithm, introduced in [14], [15]. Since then, it has been used in a number of different applications [16], [17], [18]. The HBMO algorithm simulates the mating process of the queen of the hive. The mating process begins when the queen flies away from the nest on a mating flight, during which the drones follow the queen and mate with her in the air. The algorithm is a swarm intelligence algorithm, since it uses a swarm containing three kinds of bees: the queen, the drones and the workers. A number of procedures can be applied inside the swarm; the HBMO algorithm describes the mating of the queen with the drones. From this point of view one could classify HBMO as a memetic algorithm, since it is an elitist genetic algorithm in which the queen plays the role of a “super-parent”. However, the method is not a simple memetic algorithm, because several details differentiate it from one. First, the queen flies randomly in the air and, depending on her speed and energy, if she meets a drone there is a probability that she mates with him. Even when the queen mates with a drone, she does not create a brood directly; instead she stores the drone's genotype (by “genotype” we mean some of the basic characteristics of the drone, i.e. part of the solution) in her spermatheca, and the broods are created only after the mating flight has been completed.
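The mating flight described above can be sketched as follows. The annealing-style rule used here (mate with probability exp(-|f(Q) - f(D)| / S), then reduce the speed S by a factor `alpha` and the energy by a step `gamma`) is the form commonly used in the HBMO literature; this excerpt does not give the paper's exact formulas, so the rule, the parameter names and their values are assumptions. Drones are represented as binary feature masks, and `fitness` is any user-supplied scoring function.

```python
import math
import random


def mating_flight(queen_fitness, drones, fitness, speed=1.0, energy=1.0,
                  alpha=0.9, gamma=0.05, capacity=5, rng=None):
    """Sketch of one HBMO mating flight; returns the drones stored in the spermatheca.

    Assumed rule: mate with probability exp(-|f(Q) - f(D)| / S),
    then S <- alpha * S and E <- E - gamma after every encounter.
    """
    if rng is None:
        rng = random.Random(0)
    spermatheca = []
    while energy > 0 and len(spermatheca) < capacity:
        drone = rng.choice(drones)  # the queen meets a random drone
        delta = abs(queen_fitness - fitness(drone))
        if rng.random() < math.exp(-delta / speed):
            spermatheca.append(drone)  # store the genotype; no brood is created yet
        speed *= alpha  # the queen slows down ...
        energy -= gamma  # ... and loses energy, so the flight eventually ends
    return spermatheca
```

Because the speed decreases during the flight, drones whose fitness differs strongly from the queen's become less and less likely to be accepted as the flight goes on.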
Another difference from a memetic algorithm is that a brood is not created from one queen and one drone; each brood uses parts of the solutions (genotypes) of the queen and of more than one drone. In addition to this classic procedure, used in previously published algorithms based on Honey Bees Mating Optimization [14], [15], [16], [17], [18], our algorithm uses an adaptive memory procedure, which allows the queen to store parts of the solutions of good drones selected in previous mating flights and to reuse them in a new mating flight in order to produce fitter drones. A further difference from a classic memetic algorithm is the role of the workers. One could argue that, since the workers' only role is brood care and they constitute just a local search phase, the algorithm has one of the basic characteristics of memetic algorithms (a genetic algorithm with a local search phase [19]). Here, however, the local search phase strictly parallels what happens in real life, i.e. the foraging behaviour of honey bees: each worker, a distinct honey bee, takes care of one brood, finding food for it and feeding it with “royal jelly” to make it fitter, and if the brood becomes better than the queen it takes her place. Thus, the algorithm combines the mating process of the queen with part of the foraging behaviour of honey bees inside the hive.
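The following sketch illustrates brood creation from the queen and several stored drones, followed by worker improvement and queen replacement, for binary feature masks. The uniform multi-parent crossover, the single-bit-flip local search and the helper names (`make_brood`, `worker_improve`, `breed_and_replace`) are hypothetical choices made for illustration; the adaptive memory procedure is not modelled. Fitness is maximized.

```python
import random


def make_brood(queen, spermatheca, rng):
    """Hypothetical multi-parent crossover: each bit comes from the queen or a stored drone.

    spermatheca must be non-empty.
    """
    brood = []
    for j, queen_bit in enumerate(queen):
        if rng.random() < 0.5:
            brood.append(queen_bit)
        else:
            brood.append(rng.choice(spermatheca)[j])  # bit j of a random stored drone
    return tuple(brood)


def worker_improve(brood, fitness, rng, tries=10):
    """A worker as local search: flip one feature at a time, keep flips that help."""
    best, best_fit = brood, fitness(brood)
    for _ in range(tries):
        j = rng.randrange(len(best))
        candidate = best[:j] + (1 - best[j],) + best[j + 1:]
        candidate_fit = fitness(candidate)
        if candidate_fit > best_fit:
            best, best_fit = candidate, candidate_fit
    return best


def breed_and_replace(queen, spermatheca, fitness, n_broods=10, seed=0):
    """Create broods, let workers improve them, and replace the queen if beaten."""
    rng = random.Random(seed)
    broods = [worker_improve(make_brood(queen, spermatheca, rng), fitness, rng)
              for _ in range(n_broods)]
    best_brood = max(broods, key=fitness)
    # The fittest brood takes the queen's place only if it is strictly better
    return best_brood if fitness(best_brood) > fitness(queen) else queen
```

In a feature selection setting, `fitness` would typically be the classification accuracy obtained with the features whose bits are set to 1.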

This paper presents a novel approach to the Feature Subset Selection Problem using the Honey Bees Mating Optimization algorithm. In the classification phase of the proposed algorithm, several variants of the Nearest Neighbor classification method are used [20]. The algorithm is applied to the credit risk assessment classification task, a challenging and important management science problem in the domain of financial analysis [21]. Modern finance is a broad field that often involves hard decision-making problems related to risk management. In many cases, financial decision-making problems require the assignment of the available options to predefined groups or classes; credit risk analysis, bankruptcy prediction and country risk assessment, among others, are typical examples [22]. In this context the development of reliable classification models is clearly of major importance to researchers and practitioners. Developing financial classification models is a complicated process, involving careful data collection and pre-processing, model development, validation and implementation. For model development, several methods have been used, including statistical methods, artificial intelligence techniques and operations research methodologies. In all cases, the quality of the data is fundamental. This is mainly related to the adequacy of the sample data in terms of the number of observations and the relevance of the decision attributes (i.e., independent variables) used in the analysis. In recent years, nature-inspired methods have increasingly been applied to financial problems [23], [24], [25], [26], [27], [28]. More precisely, [24] presents a procedure that uses a genetic algorithm to solve the Feature Subset Selection Problem, combined with a number of Nearest Neighbor based classifiers.
This genetic-based classification algorithm is applied to the credit risk assessment classification problem. In [25], a memetic algorithm based on the concepts of genetic algorithms and particle swarm optimization is presented. Unlike in a genetic algorithm, in this algorithm each individual of the population evolves through a particle swarm optimization algorithm. The memetic-based classification algorithm is combined with a number of Nearest Neighbor based classifiers and is tested on an important financial classification task, the identification of qualified audit reports. In [26] a tabu search metaheuristic combined with a number of Nearest Neighbor classifiers is applied to the credit risk assessment problem, while [27] presents an ant colony optimization algorithm combined with a number of Nearest Neighbor classifiers for the same financial classification problem. All the methods presented in [24], [25], [26], [27] gave very satisfactory results and, in every reported comparison, outperformed the classic metaheuristic algorithms against which they were tested.

The rest of the paper is organized as follows. The next section provides a short description of the feature selection problem. Section 3 presents a detailed analysis of the proposed algorithm. Section 4 describes the application context, the aforementioned financial data and the experimental settings, whereas Section 5 presents the computational results. The last section concludes the paper and discusses some future research directions.

Section snippets

Feature selection problem

Recently, there has been an increasing need for novel data-mining methodologies that can analyze and interpret large volumes of data. Selecting the right set of features for classification is one of the most important problems in designing a good classifier. Feature selection is widely used as the first stage of a classification task to reduce the dimension of the problem, decrease noise and improve speed by eliminating irrelevant or redundant features. The basic feature selection
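A feature subset is often encoded as a binary mask over the original variables; this excerpt is truncated before the paper states its own encoding, so the representation below is an assumption. The sketch shows how a mask selects columns and why exhaustive search quickly becomes impractical: with n features there are 2^n - 1 non-empty candidate subsets. Both function names are hypothetical.

```python
from itertools import compress


def select_features(rows, mask):
    """Project each observation onto the features whose mask bit is 1."""
    return [list(compress(row, mask)) for row in rows]


def n_candidate_subsets(n_features):
    """Number of non-empty feature subsets a selection method may have to consider."""
    return 2 ** n_features - 1
```

For instance, with 30 candidate variables there are already more than 10^9 non-empty subsets, which is why heuristic search methods such as HBMO are used instead of enumeration.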

The Honey Bees Mating Optimization algorithm

In this paper, as already mentioned, an algorithm based on Honey Bees Mating Optimization is presented for the solution of the feature selection problem. This algorithm is combined with three Nearest Neighbor based classifiers: the 1-Nearest Neighbor, the k-Nearest Neighbor and the Weighted k (wk)-Nearest Neighbor classifier. Pseudocode of the proposed Honey Bees Mating Optimization based classification algorithm is presented in the following and then an analytical description
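The three classifiers can be sketched with a single prediction function. Euclidean distance is assumed, and the weighted variant uses inverse-distance weights; the paper's exact weighting for the wk-Nearest Neighbor classifier is not given in this excerpt, so that choice is an assumption. The name `knn_predict` is hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, x, k=1, weighted=False):
    """Predict the class of x with 1-NN (k=1), k-NN or distance-weighted k-NN."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = defaultdict(float)
    for xi, label in neighbours:
        if weighted:
            # Assumed weighting: inverse Euclidean distance (epsilon avoids 1/0)
            votes[label] += 1.0 / (math.dist(x, xi) + 1e-12)
        else:
            votes[label] += 1.0  # one vote per neighbour
    return max(votes, key=votes.get)
```

Setting `k=1` gives the 1-Nearest Neighbor rule; with `weighted=True`, a few close neighbours can outweigh a larger number of distant ones, which may change the predicted class when k is large.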

Application

The nature inspired algorithm is applied to a financial classification problem which is related to credit risk assessment. The data, taken from Doumpos and Pasiouras [46] involve 1330 firm-year observations for UK non-financial firms, over the period 1999–2001. The sample observations are classified into five risk groups according to their level of likelihood of default, measured on the basis of their QuiScore, a credit rating assigned by Qui Credit Assessment Ltd. In particular, on the basis

Computational results

The algorithms were implemented in Fortran 90 (Lahey f95 compiler) on a Centrino Mobile Intel Pentium M750/1.86 GHz, running Suse Linux 9.1. To test the efficiency of the proposed method, the 10-fold cross-validation procedure is utilized. Initially, the data set is divided into 10 disjoint groups containing approximately M/10 samples each, where M is the number of the samples in the data set. Next, each of these groups is systematically removed from the data set, a model is built from the
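The 10-fold procedure can be sketched as follows. Shuffling the indices before splitting is an assumption (the excerpt does not say how the groups are formed), and `fit_predict` stands for any hypothetical train-then-predict function, such as a nearest neighbor classifier restricted to a selected feature subset.

```python
import random


def k_fold_indices(m, k=10, seed=0):
    """Split the indices 0..m-1 into k disjoint folds of roughly m/k samples each."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit_predict, k=10, seed=0):
    """Hold out each fold once, build the model on the other k-1 folds, count hits."""
    folds = k_fold_indices(len(X), k, seed)
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predictions = fit_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in fold])
        correct += sum(p == y[i] for p, i in zip(predictions, fold))
    return correct / len(X)
```

For the 1330 observations of the credit risk data set, `k_fold_indices(1330)` yields ten disjoint folds of 133 samples each.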

Conclusions and future research

An important issue in building a good classifier is the selection of a set of appropriate input feature variables. The Honey Bees Mating Optimization algorithm has been proposed in this study for solving this Feature Subset Selection Problem. Three different classifiers were used for the classification problem, based on the nearest neighbor classification rule. The performance of the proposed algorithm was tested using financial data involving credit risk assessment. The obtained results

References (47)

  • J. Kennedy et al.

    Particle swarm optimization

    Proceedings of 1995 IEEE International Conference on Neural Networks

    (1995)
  • L.N. De Castro et al.

    Artificial Immune Systems: A New Computational Intelligence Approach

    (2002)
  • M. Dorigo et al.

    Ant colony system: a cooperative learning approach to the traveling salesman problem

    IEEE Transactions on Evolutionary Computation

    (1997)
  • M. Dorigo et al.

    Ant Colony Optimization, A Bradford Book

    (2004)
  • A. Baykasoglu et al.

    Artificial bee colony algorithm and its application to generalized assignment problem

  • D. Karaboga et al.

    A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm

    Journal of Global Optimization

    (2007)
  • X.S. Yang

    Engineering optimizations via nature-inspired virtual bee algorithms

  • D. Teodorovic et al.

    Bee colony optimization—a cooperative learning approach to complex transportation problems

    Advanced OR and AI Methods in Transportation

    (2005)
  • H.F. Wedde et al.

    BeeHive: an efficient fault-tolerant routing algorithm inspired by honey bee behavior

  • H. Drias et al.

    Cooperative bees swarm for solving the maximum weighted satisfiability problem

    IWAAN International Work Conference on Artificial and Natural Neural Networks LNCS 3512

    (2005)
  • D.T. Pham et al.

    The bees algorithm—a novel tool for complex optimisation problems

  • H.A. Abbass

    A monogenous MBO approach to satisfiability
