Particle swarm optimization of deep neural networks architectures for image classification
Introduction
In recent years, deep neural networks (DNNs) have become the gold-standard algorithm in the field of computer vision. More specifically, deep convolutional neural networks (CNNs) obtain the best results in almost all image classification benchmarks, surpassing the classification capabilities of human experts [1,2]. However, it remains non-trivial to design a meaningful CNN architecture. For example, VGG16 [3], Inception [4], ResNet [5], and DenseNet [6], some of the most successful CNNs, were carefully handcrafted using domain knowledge about the characteristics of the problem.
The authors of the VGG16 neural network were the first to show that deeper networks with small convolutional filters can achieve better results than shallow networks. The Inception authors showed that the concept of network-in-network [7], where each layer of the neural network is itself another network, could be used to build deeper networks while remaining computationally efficient. ResNets use shortcut connections, also called skip or residual connections, to connect a layer's inputs to its outputs. Instead of computing a direct mapping between inputs and outputs, ResNets compute the residual mapping, which makes the training procedure easier. DenseNets, on the other hand, connect the outputs of each layer to every subsequent layer, resembling the connectivity pattern of fully connected neural networks. The connectivity patterns used by ResNet and DenseNet avoid the vanishing gradient problem, allowing deeper networks with thousands of layers. However, these and many other improvements in CNN architectures were achieved by trial and error. Thus, the creation of better CNNs remains a heuristic process that requires many insights about the specific problem domain, and no structured means of designing them has been devised.
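The two connectivity patterns can be contrasted with a toy numpy sketch (a simple linear map with ReLU stands in for a convolutional layer; all names and shapes here are our own illustrative choices, not the actual architectures):

```python
import numpy as np

def conv_like(x, w):
    # Stand-in for a convolutional layer: a linear map followed by ReLU.
    return np.maximum(0.0, x @ w)

def residual_block(x, w):
    # ResNet-style block: the layer learns the residual mapping f(x),
    # and the input is added back through the shortcut connection.
    return conv_like(x, w) + x

def dense_block(x, weights):
    # DenseNet-style block: each layer receives the concatenation
    # of the input and all previous layers' outputs.
    features = [x]
    for w in weights:
        inp = np.concatenate(features, axis=-1)
        features.append(conv_like(inp, w))
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))                        # batch of 2, 8 features
y_res = residual_block(x, rng.normal(size=(8, 8)))
# Dense block with 3 layers, each producing 4 new features:
ws = [rng.normal(size=(8 + 4 * i, 4)) for i in range(3)]
y_dense = dense_block(x, ws)
print(y_res.shape)    # (2, 8): the shortcut requires matching shapes
print(y_dense.shape)  # (2, 20): 8 input features + 3 layers * 4 features
```

The shapes make the structural difference visible: a residual block preserves the feature dimension so the addition is valid, while a dense block's output grows with every layer because earlier features are carried forward by concatenation.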
However, one can take inspiration from nature. The structure of animal brains has evolved over billions of years through the mechanism of natural selection. The same approach is applicable in the field of Artificial Neural Networks (ANNs), where it is known as NeuroEvolution. It allows us to evolve both ANN architectures and weights. In its inception in the 1990s, NeuroEvolution was used specifically to update the weights of a fixed ANN architecture, avoiding many of the shortcomings of backpropagation. Back then, researchers did not have access to datasets large enough to train ANNs with backpropagation, which frequently led to overfitting and local minima problems [[8], [9], [10]]. However, training ANNs with evolutionary algorithms has one major drawback: it takes longer to find a good set of weights than with backpropagation [11]. Thus, soon after the introduction of evolutionary algorithms for training fixed-architecture ANNs, researchers began to use such algorithms to evolve weights and architectures at the same time. Such algorithms are known as Topology and Weight Evolving Artificial Neural Networks (TWEANNs). The NeuroEvolution of Augmenting Topologies (NEAT) algorithm, created by Stanley and Miikkulainen in 2002 [12], is one of the most popular TWEANNs. NEAT starts with a basic single-layer neural network and evolves it into more complex networks. NEAT is also capable of creating recurrent neural networks, and it avoids premature convergence by keeping a diverse population of neural networks through a speciation mechanism. The Evolutionary Acquisition of Neural Topologies (EANT) algorithm, created by Siebel and Sommer in 2005 [13], is another example of a TWEANN. Similar to NEAT, EANT starts with simple ANN architectures that grow more complex with each iteration of the algorithm.
It contains two inner loops: one called structural exploration, which tests new architectures through mutation, and another called structural exploitation, which optimizes weights using an evolution strategy.
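The basic idea of training the weights of a fixed architecture with evolution rather than backpropagation can be illustrated with a minimal (1+λ) evolution strategy on the XOR task (a toy sketch of the general technique, not NEAT or EANT themselves; the network size and mutation scale are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny fixed-architecture network: 2 inputs -> 4 hidden (tanh) -> 1 output.
def forward(weights, x):
    w1, b1, w2, b2 = weights
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2

def init_weights():
    return [rng.normal(scale=0.5, size=s)
            for s in [(2, 4), (4,), (4, 1), (1,)]]

# XOR: a classic toy task from the early NeuroEvolution literature.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def fitness(weights):
    return -np.mean((forward(weights, X) - y) ** 2)  # higher is better

# (1+lambda) evolution strategy: perturb all weights, keep the best.
best = init_weights()
for gen in range(300):
    for _ in range(10):  # lambda = 10 offspring per generation
        child = [w + rng.normal(scale=0.1, size=w.shape) for w in best]
        if fitness(child) > fitness(best):
            best = child
print(round(-fitness(best), 3))  # final MSE, far below the untrained ~0.5
```

No gradient is ever computed; selection pressure alone drives the weights toward a solution, which is exactly why this approach sidesteps backpropagation's local-minima issues but scales poorly to large networks.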
However, due to the high-dimensional nature of neural networks, NeuroEvolution has only been applied to shallow networks, used mostly in reinforcement learning. For example, NEAT directly encodes ANNs, which is not suitable for very complex networks because the computation quickly becomes intractable. Stanley et al. [14] tried to solve this problem with the Hypercube-Based NeuroEvolution of Augmenting Topologies (HyperNEAT), which uses connective Compositional Pattern Producing Networks (connective CPPNs) together with NEAT to indirectly encode neural networks. HyperNEAT takes the problem geometry into consideration when evolving neural networks, and, as an indirect encoding method, it can produce networks with millions of connections. However, the neural networks produced by HyperNEAT cannot match the best CNN architectures designed by human experts. As a result, it is often considered better to use HyperNEAT as a feature extractor for other machine learning algorithms [15]. In 2017, researchers from Google developed the Large-Scale Evolution of Image Classifiers (LSEIC) algorithm, which overcame the limitations of traditional NeuroEvolution methods and obtained state-of-the-art results on many benchmark datasets used with DNNs [16]. Liu et al. were also able to evolve DNNs that achieve state-of-the-art results compared with human-designed ones, using hierarchical representations and mutation only [17]. Jin et al. developed AutoKeras in 2018 [18], which is also capable of searching for CNN architectures with state-of-the-art results. More recently, Sun et al. [19,20] developed the Evolving Deep Convolutional Neural Network (evoCNN) and the Evolving Unsupervised Deep Neural Network (EUDNN) algorithms to overcome the limitations of HyperNEAT. In evoCNN, a genetic algorithm with specific crossover and mutation operators is used to search for CNN architectures, while in EUDNN, a set of basis vectors is used to encode weights and connections efficiently. However, all of these algorithms require huge amounts of computational resources that many researchers from other fields may not have access to.
Particle Swarm Optimization (PSO) is another nature-inspired algorithm that can be used in the search for optimal neural network architectures. Similarly to genetic algorithms, PSO can be used to evolve both the weights and the architectures of neural networks. One of the earliest works using PSO to train an ANN was developed by Gudise and Venayagamoorthy in 2003 [21]. They showed that ANNs train faster with PSO than with traditional backpropagation. Similarly, in 2007, Carvalho and Ludermir [22,23] developed two different PSO algorithms to both search for better architectures and train ANNs. They showed that PSO could also be used to improve ANN architectures, producing competitive results when compared with other methods. However, the PSO algorithms developed therein and in other works [[24], [25], [26], [27]] can only search for optimal architectures of fully connected neural networks, which are not suitable for image classification tasks. To overcome this limitation, Sun et al. [28] developed a PSO-based algorithm to automatically construct convolutional autoencoders (CAEs) and obtained state-of-the-art performance on multiple image classification datasets. Nevertheless, to the best of our knowledge, the work by Wang et al. [29] was the first to directly evolve CNN architectures with PSO, using an algorithm called IPPSO. In their work, the particle encoding is inspired by computer networks: each layer is assigned an IP address, allowing the use of standard PSO. However, their algorithm can only deal with particles up to a preset maximum length, and their results are limited to only three image classification datasets.
The main objective of this paper is to address the problem of searching for CNN architectures for image classification with a good balance between search speed and classification accuracy. Therefore, we present our own PSO implementation to address this problem, which we call psoCNN. Our main contributions are the following:
1. A novel PSO algorithm that can search for optimal deep convolutional neural network architectures using variable-length particles with virtually no size limitation. Particles are allowed to grow in size without an upper bound.
2. A novel difference operator that allows two particles with different numbers of layers and parameters to be compared, and allows particles' velocities to be updated without using real-valued encoding schemes.
3. A novel velocity operator that can modify a given CNN architecture to resemble the global best individual in the swarm or the particle's personal best configuration. This velocity operator allows us to employ an almost standard PSO algorithm for searching, avoiding the use of multi-dimensional PSO algorithms.
4. A faster algorithm for finding meaningful CNN architectures than previously available ones. PSO has been shown to converge more quickly than genetic algorithms. By combining this fast convergence with the ability to search for CNN architectures, the proposed algorithm can exceed state-of-the-art results while taking less time than competing algorithms.
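To give intuition for how a difference and a velocity operator might act on variable-length architectures, consider the following toy sketch. The layer names, operator semantics, and update rule below are purely illustrative assumptions of ours; the paper's actual operators are defined in Section 3:

```python
import random

# Toy architecture encoding: a variable-length list of layer descriptors.
arch_a = ["conv3", "pool", "conv5", "fc128", "fc10"]
arch_b = ["conv5", "conv3", "pool", "conv3", "conv3", "fc10"]

def difference(target, current):
    """Layer-wise difference: for each position in `target`, record the
    target layer if it differs from `current` at that position, else None."""
    diff = []
    for i, layer in enumerate(target):
        cur = current[i] if i < len(current) else None
        diff.append(layer if layer != cur else None)
    return diff

def apply_velocity(current, diff, rate=0.5, rng=random.Random(0)):
    """Velocity-like update: with probability `rate`, move each position
    toward the target layer; the result may grow beyond len(current)."""
    out = []
    for i, d in enumerate(diff):
        cur = current[i] if i < len(current) else None
        if d is not None and rng.random() < rate:
            out.append(d)          # move toward the target architecture
        elif cur is not None:
            out.append(cur)        # keep the current layer
    return out

d = difference(arch_b, arch_a)
child = apply_velocity(arch_a, d)
print(d)
print(child)
```

The key property this illustrates is that the two particles never need a fixed-length real-valued encoding: the difference is defined position by position over layer descriptors, and applying it can lengthen or preserve the architecture, so particles can grow without a preset bound.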
The remainder of this paper is organized as follows: Section 2 presents a detailed background on and motivation for PSO and CNNs. The proposed psoCNN algorithm is described in detail in Section 3. The experimental design, including the datasets, peer competitor models, and algorithm parameters, is presented in Section 4. The experimental results and a comparison with the chosen algorithms are presented in Section 5. Finally, Section 6 concludes the paper and proposes directions for future work.
Particle swarm optimization
Particle Swarm Optimization (PSO) is a meta-heuristic algorithm often used in discrete, continuous, and combinatorial optimization problems. It was first developed by Kennedy and Eberhart in 1995 [30] and is inspired by the flocking behavior of birds. In the context of PSO, a single solution is called a particle, and the collection of all solutions is called a swarm. The main idea in PSO is that each particle only has knowledge of its current velocity, its own best configuration (personal best), and the best configuration found by the whole swarm (global best).
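The canonical PSO update described above can be sketched in a few lines of numpy (our own minimal example on a sphere test function, not the paper's implementation; the inertia and acceleration coefficients are common textbook values):

```python
import numpy as np

rng = np.random.default_rng(2)

def sphere(x):
    # Simple test function; the minimum is 0 at the origin.
    return np.sum(x ** 2, axis=-1)

n_particles, dim = 20, 5
w, c1, c2 = 0.7, 1.5, 1.5           # inertia, cognitive, social coefficients

pos = rng.uniform(-5, 5, size=(n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()                   # personal best positions
pbest_val = sphere(pbest)
gbest = pbest[np.argmin(pbest_val)]  # global best position

for _ in range(100):
    r1 = rng.random((n_particles, dim))
    r2 = rng.random((n_particles, dim))
    # Canonical velocity update: inertia + cognitive + social terms.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = sphere(pos)
    improved = val < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = val[improved]
    gbest = pbest[np.argmin(pbest_val)]

print(round(sphere(gbest), 6))  # near zero after convergence
```

Each particle is pulled stochastically toward its own personal best (the cognitive term) and toward the swarm's global best (the social term), while inertia keeps it exploring along its current direction.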
Proposed algorithm
The proposed algorithm receives as inputs parameters related to the problem in question, such as the training data to be used, and parameters related to the CNN architectures to be created, such as the particles' maximum number of layers during initialization. The general framework of the proposed psoCNN can be seen in Algorithm 1. In the proposed algorithm, the global best particle is based on the best blocks found in the swarm by following the PSO algorithm. Thus, there is no
Experimental design
The proposed algorithm, psoCNN, is tested on state-of-the-art image classification datasets. The following subsections present in detail the datasets, peer competitor models, and algorithm parameters used in our experiments. In addition, we are making the code of the proposed algorithm publicly available to other researchers upon request, allowing the reproduction of the results presented in this paper.
Experimental results and analysis
This section presents the results obtained with the psoCNN algorithm compared with other deep learning algorithms, along with a discussion of the results. Our reported results for each dataset were obtained using only the test set, which ensures that they reflect the algorithm's ability to learn and generalize well.
Conclusion
In this work, we propose a novel algorithm to search for deep convolutional neural network (CNN) architectures based on particle swarm optimization (psoCNN). A novel direct encoding strategy is also proposed, in which a CNN architecture is divided into two blocks: one block contains only convolutional and pooling layers, while the other contains only fully connected layers. This encoding strategy allows variable-length CNN architectures to be compared and combined using an almost standard PSO algorithm.
Acknowledgments
This work is partially supported by the National Council for Scientific and Technological Development (CNPq, Brazil) grant 203076/2015-0. It used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (NSF, USA) grant number ACI-1548562, and the Bridges system, which is supported by National Science Foundation (NSF, USA) award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). It also used the OSU High Performance Computing Center.
References (49)
- et al., Using genetic search to exploit the emergent behavior of neural networks, Phys. Nonlinear Phenom. (1990)
- et al., Evolutionary artificial neural networks by multi-dimensional particle swarm optimization, Neural Netw. (2009)
- et al., A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training, Appl. Math. Comput. (2007)
- et al., Fast convergence particle swarm optimization for functions optimization, Procedia Technol. (2012)
- et al., Delving deep into rectifiers: surpassing human-level performance on ImageNet classification
- et al., Batch normalization: accelerating deep network training by reducing internal covariate shift
- et al., Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)
- et al., Going deeper with convolutions
- et al., Deep residual learning for image recognition
- et al., Densely connected convolutional networks
- Network in Network
- Neuro-genetic truck backer-upper controller
- Genetic lander: an experiment in accurate neuro-genetic control
- Empirical studies on the speed of convergence of neural network training using genetic algorithms
- Evolving neural networks through augmenting topologies, Evol. Comput.
- Evolutionary reinforcement learning of artificial neural networks, Int. J. Hybrid Intell. Syst.
- A hypercube-based encoding for evolving large-scale neural networks, Artif. Life
- Generative Neuroevolution for Deep Learning
- Large-scale evolution of image classifiers
- Hierarchical Representations for Efficient Architecture Search
- Efficient Neural Architecture Search with Network Morphism
- Evolving deep convolutional neural networks for image classification, IEEE Trans. Evol. Comput.
- Evolving unsupervised deep neural networks for learning meaningful representations, IEEE Trans. Evol. Comput.
- Comparison of particle swarm optimization and backpropagation as training algorithms for neural networks