Enhancing the effectiveness of Ant Colony Decision Tree algorithms by co-learning
Graphical abstract
Introduction
Our goal is to propose new approach for constructing more effective decision trees, concerning classification accuracy and decision tree growth. This solution is possible due to application of Ant Colony Optimization. Using Ant Colony algorithms in decision tree construction allows to construct variety of alternative decision trees presenting different local optima. In case of deterministic algorithms, each decision tree is constructed in the same way. Due to the pheromone updating rules, representing reinforcement learning schema, agent–ants construct better quality decision trees.
This collaborative view of learning can occur without interaction between the learning agents, known as ensemble learning, or with interaction during the learning stage, known as co-learning [1]. This mechanism is a good way to obtain better treatment results, and should always be taken into consideration in comprehensive point of view.
Results obtained during experimental study motivated us to propose new optimization criteria for decision tress evaluation. It is important in case of bi-criterion decision tree evaluation, where as heuristic function, as well as quality function we create via decision tree growth and classification accuracy. In this article we also discuss the decision forest as a more effective classification ensemble.
The additional aim of this article is to arrange information about ACDT algorithm as well as its performance the Ant Colony Decision Tree approach is firstly proposed by Boryczka and Kozak [2]. In the beginning this algorithms created binary decision trees for discrete attributes occurred in data sets. The comparative study with classical approach has shown that ACDT offered better results in context of accuracy of the classification. In the following papers Boryczka and Kozak [3] adjusted this approach to continuous values of attributes. They proposed the inequality test.
Boryczka et al. [4] have also undertaken construction the heterarchical algorithm of the ACDT approach with parallel implementation. Authors also tested the performance of the ACDT in practical, real-life data sets: H-Bond Data Set [5]. Otero et al. [6] proposed a modification of the ACDT approach by using a new splitting criterion derived from C4.5 approach in contrary to our proposition originated from CART algorithm.
This article is organized as follows: Section 1 comprises an introduction to the subject matter of this article. In Sections 2 Swarm Intelligence, 3 Ant Colony Optimization, the Swarm Intelligence and Ant Colony Optimization is introduced. Section 4 reviews Ant Colony Optimization in Data Mining. Decision Trees and Ant Colony Decision Trees are presented in Sections 5 Decision trees, 6 The Ant Colony Decision Trees (ACDT) approach. Section 7 focuses on the evaluation criteria of the constructed decision tree. Section 8 presents the experimental study that was conducted to evaluate the performance of the ACDT algorithm. Finally, we conclude with general remarks on this work, and some directions for future research are pointed out.
Section snippets
Swarm Intelligence
There has recently been a number of research studies regarding the application of Swarm Intelligence to various data-mining problems. Swarm Intelligence describes the ability of groups of decentralized and self-organizing agents to exhibit highly organized behaviours. These global behaviours of swarms often allow the swarm as a whole to accomplish tasks which are beyond the capabilities of any one of the constituent individuals. Following the appearance of the most recent books, (“Swarm
Ant Colony Optimization
In Ant Colony Optimization (ACO) a stochastic optimization strategy inspired by the collaborative path-finding behaviours that are exhibited by colonies of real ants is observed. In nature, ants are simple organisms that each possess very limited capabilities and, individually, are only able to accomplish the most simple of tasks. Amazingly, colonies of ants are able to collectively solve difficult problems that are far beyond the abilities of any single member of the group (manifested by
Ant Colony Optimization in Data Mining
Data mining (sometimes called data or knowledge discovery) involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data – it
Decision trees
Data mining, the science and technology of exploring data sets in order to discover previously unknown patterns, is a part of the overall process of knowledge discovery in databases. In data mining, a decision tree is a predictive model which can be used to represent both classifiers and regression models. When a decision tree is used for classification tasks, it is referred to as a classification tree; when it is used for regression tasks it is called a regression tree.
A decision tree is used
The Ant Colony Decision Trees (ACDT) approach
It is interesting to note that the ACDT algorithm is mainly based on the standard version of Ant Colony Optimization. A number of slight modifications has been introduced both as a new discrete optimization algorithm for constructing decision trees and as a new metaheuristics approach in data mining procedures. The presented modifications are introduced in the main transition rule and are treated as an improvement of the quality of the classification mechanism. So, we have employed a classical
The evaluation criteria of the constructed decision tree
In order to compare the influence of the decision tree growth evaluation method (indirectly – the quality) constructed by agent–ants we show three proposals. All of the approaches differ from the classical version by the formula that calculates decision tree growth (Eq. (7)), whereas (6) does not undergo change.
The proposed changes were supposed to increase the stability of the created decision trees. In this way the following solutions are presented:
- •
decision tree growth is determined by
Experiments
A range of experiments was conducted to test the quality of effectiveness in the ACDT algorithm. First we will describe our experimental methodology and explain its motivation. Then we will present and discuss our results. In this section we will consider an experimental study performed for the following adjustments. We performed 50 experiments for each data set. Each experiment included 300 generations with a population size of the ant colony that was equal to 20. A comparative study of the
Conclusions
The quality evaluation of decision trees constructed by agent–ants is a significant aspect of ACDT assessment. This problem seems to be a bi-criteria optimization problem. The method of evaluation as well as parameter ψ changes are the most important aspects in this approach.
The problem of tuning parameter ϕ values constitutes a decisive role in this analysis. The experiment results confirm that this algorithm is suitable for a variety of applications. The ϕ parameter gives the possibility of
References (41)
- et al.
Inducing decision trees with an ant colony optimization algorithm
Appl. Soft Comput.
(2012) Finding groups in data: cluster analysis with ants
Appl. Soft Comput.
(2009)- et al.
Diversification for better classification trees
Comput. Oper. Res.
(2006) - et al.
Constructing optimal binary decision trees is NP-complete
Inf. Process. Lett.
(1976) - et al.
Delegating classifiers
21st International Conf. on Machine Learning
(2004) - et al.
Ant colony decision trees – a new method for constructing decision trees based on ant colony optimization
Computational Collective Intelligence: Technologies and Applications. Vol. 6421 of LNCS
(2010) - et al.
An adaptive discretization in the ACDT algorithm for continuous attributes
Computational Collective Intelligence: Technologies and Applications. Vol. 6923 of LNCS
(2011) - et al.
Heterarchy in constructing decision trees – parallel ACDT
Trans. Comp. Collect. Intell.
(2013) - et al.
Dynamic version of the ACDT/ACDF algorithm for H-bond data set analysis
- et al.
Swarm Intelligence
(2001, April)
Swarm Intelligence: From Natural to Artificial Systems
Swarm intelligence in cellular robotic systems
La reconstruction du nid et les coordinations inter-individuelles chez bellicositermes natalensis et cubitermes sp. La theorie de la stigmerie
Insects Soc.
Positive feedback as a search strategy. Tech. Rep. 91-016
Optimization, learning and natural algorithms (Ph.D. thesis)
New Ideas in Optimization
Ant algorithms for distributed discrete optimization
Artif. Life
Ant Colony Optimization
Special issue on ant colony optimization
Swarm Intell.
New Ideas in Optimization
Cited by (28)
Steering the interpretability of decision trees using lasso regression - an evolutionary perspective
2023, Information SciencesOptimization of decision trees using modified African buffalo algorithm
2022, Journal of King Saud University - Computer and Information SciencesCitation Excerpt :Decision trees are prone to over-fitting, under-fitting and local decisions (Otero et al., 2012; Boryczka and Kozak, 2015). To improve and globally optimize the decision tree, various meta-heuristic algorithms, including genetic algorithm and ant colony optimization are used in the literature (Otero et al., 2012; Boryczka and Kozak, 2015, 2010; Parpinelli et al., 2002; Jr et al., 1307; Yang, 2014; Odili et al., 2017). All these algorithms are parametric and require reinforcement to achieve better results and make them faster.
Co-evolutionary multi-population genetic programming for classification in software defect prediction: An empirical case study
2017, Applied Soft Computing JournalCitation Excerpt :A general recommendation for the migration fraction is 10% of the subpopulation size and 20 generations for the migration interval [33]. The collaborative learning that occurs during the learning phase may improve the effectiveness of various optimization techniques in classification [34]. Co-evolution is a cross-species interaction on a shared fitness landscape that is based on a predator–prey or host–parasite relationship.
Collective data mining in the ant colony decision tree approach
2016, Information SciencesCitation Excerpt :Thus far, the use of ACO in the process of constructing decision trees was described in [2]; these were preliminary versions of the ACDT algorithm. The ACDT algorithm continued to develop; the latest version was published in [4]. Otero et al. [35] proposed a different algorithm based on ACO for decision tree induction.
The role of decision tree representation in regression problems – An evolutionary perspective
2016, Applied Soft Computing Journal