Swarm and Evolutionary Computation

Volume 49, September 2019, Pages 62-74

Particle swarm optimization of deep neural networks architectures for image classification

https://doi.org/10.1016/j.swevo.2019.05.010

Highlights

  • An algorithm to search for optimal convolutional neural networks based on particle swarm optimization is proposed.

  • A variable-length encoding scheme is devised, allowing the use of an almost standard particle swarm optimization algorithm.

  • Particles are allowed to shrink or grow in size without any upper bound.

  • Novel PSO operators are proposed to avoid the need for real-valued vectors during the optimization process.

  • Convolutional neural networks produced by the algorithm perform competitively in comparison with state-of-the-art models.

Abstract

Deep neural networks have been shown to outperform classical machine learning algorithms in solving real-world problems. However, the most successful deep neural networks were handcrafted from scratch, taking problem domain knowledge into consideration. This approach often consumes very significant time and computational resources. In this work, we propose a novel algorithm based on particle swarm optimization (PSO), capable of fast convergence compared with other evolutionary approaches, to automatically search for meaningful deep convolutional neural network (CNN) architectures for image classification tasks, named psoCNN. A novel direct encoding strategy and a velocity operator were devised, allowing PSO to be used to optimize CNN architectures. Our experimental results show that psoCNN can quickly find good CNN architectures that achieve performance comparable to state-of-the-art designs.

Introduction

In recent years, deep neural networks (DNNs) have become the gold standard in the field of computer vision. More specifically, deep convolutional neural networks (CNNs) obtain the best results in almost all image classification benchmarks, surpassing the classification capabilities of human experts [1,2]. However, it remains non-trivial to design a meaningful CNN architecture. For example, VGG16 [3], Inception [4], ResNet [5], and DenseNet [6], some of the most successful CNNs, were carefully handcrafted by taking the characteristics of the problem domain into consideration.

The authors of the VGG16 neural network were the first to show that deeper networks with small convolutional filters could achieve better results than shallow networks. The Inception authors showed that the concept of network-in-network [7], where each layer of the neural network is itself another network, could be used to build deeper networks while remaining computationally efficient. ResNets use shortcut connections, also called skip or residual connections, to connect a layer's inputs to its outputs. Instead of computing a direct mapping between inputs and outputs, ResNets compute the residual mapping, which makes the training procedure easier. DenseNets, on the other hand, connect the outputs of each layer to every subsequent layer, resembling the connectivity pattern of fully-connected neural networks. The connectivity patterns used by ResNet and DenseNet avoid the vanishing gradient problem, allowing deeper networks with thousands of layers. However, these and many other improvements in CNN architectures were achieved by trial and error. Thus, the creation of better CNNs remains a heuristic process that requires many insights about the specific problem domain, and, therefore, no structured means of designing them has been devised.

However, one can take inspiration from nature. The structure of animals' brains has evolved over billions of years through the mechanism of natural selection. Such an approach is equally applicable in the field of Artificial Neural Networks (ANNs), where it is known as NeuroEvolution. This approach allows us to evolve both ANNs' architectures and weights. In its early inception, in the 1990s, NeuroEvolution was used specifically to update the weights of a fixed ANN architecture, avoiding many of the shortcomings produced by backpropagation. Back then, researchers did not have access to datasets large enough to train ANNs with backpropagation, which frequently produced overfitting and local minima problems [[8], [9], [10]]. However, training ANNs with evolutionary algorithms has one major drawback: it takes longer to find a good set of weights than with backpropagation [11]. Thus, soon after the introduction of evolutionary algorithms for training fixed-architecture ANNs, researchers began to use such algorithms to evolve weights and architectures at the same time. Such algorithms are known as Topology and Weight Evolving Artificial Neural Networks (TWEANNs). NeuroEvolution of Augmenting Topologies (NEAT), created by Stanley and Miikkulainen in 2002 [12], is one of the most popular TWEANNs. NEAT starts with a basic single-layer neural network and evolves it into more complex networks. NEAT is also capable of creating recurrent neural networks and avoids premature convergence by keeping a diverse set of neural networks through a speciation mechanism. Evolutionary Acquisition of Neural Topologies (EANT), created by Siebel and Sommer in 2005 [13], is another example of a TWEANN. Similar to NEAT, EANT starts with simple ANN architectures that grow more complex with each iteration of the algorithm. It contains two inner loops: one called structural exploration, which tests new architectures through mutation, and another called structural exploitation, which optimizes weights using an evolution strategy.

Due to the high-dimensional nature of neural networks, however, NeuroEvolution has only been applied to shallow networks, used mostly in reinforcement learning. For example, NEAT directly encodes ANNs, which is not suitable for very complex networks because the computation quickly becomes intractable. Stanley et al. [14] tried to solve this problem with the Hypercube-Based NeuroEvolution of Augmenting Topologies (HyperNEAT), which uses connective Compositional Pattern Producing Networks (connective CPPNs) together with NEAT to indirectly encode neural networks. HyperNEAT takes the problem geometry into consideration when evolving neural networks, and, as an indirect encoding method, it can produce networks with millions of connections. However, neural networks produced by HyperNEAT cannot match the best CNN architectures produced by human experts. As a result, it is often considered better to use HyperNEAT as a feature extractor for other machine learning algorithms [15]. In 2017, researchers from Google developed the Large-Scale Evolution of Image Classifiers (LSEIC) algorithm, which was able to overcome the limitations of traditional NeuroEvolution methods and obtained state-of-the-art results on many benchmark datasets used with DNNs [16]. Liu et al. were also able to evolve DNNs that achieve state-of-the-art results compared with human-designed ones, using hierarchical representations and mutation only [17]. Jin et al. developed AutoKeras in 2018 [18], which is also capable of searching for CNN architectures with state-of-the-art results. More recently, Sun et al. [19,20] developed the Evolving Deep Convolutional Neural Network (evoCNN) and the Evolving Unsupervised Deep Neural Network (EUDNN) algorithms to overcome the limitations of HyperNEAT. In evoCNN, a genetic algorithm with specific crossover and mutation operators is used to search for CNN architectures, while in EUDNN, a set of basis vectors is used to encode weights and connections efficiently. However, all of those algorithms require huge amounts of computational resources that many researchers from other fields may not have access to.

Particle Swarm Optimization (PSO) is another nature-inspired algorithm that can be used in the search for optimal neural network architectures. Similarly to genetic algorithms, PSO can be used to evolve the weights and architectures of neural networks. One of the earliest works using PSO to train an ANN was developed by Gudise and Venayagamoorthy in 2003 [21]. They showed that ANNs trained faster with PSO than with traditional backpropagation. Similarly, in 2007, Carvalho and Ludermir [22,23] developed two different PSO algorithms to both search for better architectures and train ANNs. They showed that PSO could also be used to improve ANN architectures, producing competitive results compared with other methods. However, the PSO algorithms developed therein and in other works [[24], [25], [26], [27]] can only search for optimal architectures of fully connected neural networks, which are not suitable for image classification tasks. To overcome this limitation, Sun et al. [28] developed a PSO-based algorithm to automatically construct convolutional autoencoders (CAEs) and obtained state-of-the-art performance on multiple image classification datasets. Nevertheless, to the best of our knowledge, the work by Wang et al. was the first to directly evolve CNN architectures with PSO, using an algorithm called IPPSO [29]. In their work, the particle encoding is inspired by computer networks, where each layer is assigned an IP address, allowing the use of standard PSO. However, their algorithm can only deal with particles up to a preset maximum length, and their results are limited to only three image classification datasets.

The main objective of this paper is to address the problem of searching for CNN architectures for image classification with a good balance between search speed and classification accuracy. Therefore, we present our own PSO-based approach to this problem, referred to here as psoCNN. Our main contributions are the following:

  1. A novel PSO algorithm is proposed that can search for optimal deep convolutional neural network architectures using variable-length particles with virtually no size limitation. Particles are allowed to grow in size without an upper bound.

  2. A novel difference operator is presented that allows two particles with a different number of layers and parameters to be compared, and allows particles' velocities to be updated without using real-valued encoding schemes.

  3. A novel velocity operator is devised that can be used to modify a given CNN architecture to resemble the global best particle in the swarm or its own personal best configuration (a minimal sketch of these operators is shown after this list). This velocity operator allows us to employ an almost standard PSO algorithm for searching, avoiding the use of multi-dimensional PSO algorithms.

  4. A faster algorithm to find meaningful CNN architectures than previously available ones. PSO has been shown to converge more quickly than genetic algorithms. By combining this fast convergence with the ability to search for CNN architectures, the proposed algorithm can exceed state-of-the-art results while taking less time than competing algorithms.
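To make contributions 2 and 3 more concrete, the sketch below illustrates one way such architecture-level difference and velocity operators could work when particles are treated as variable-length lists of layer descriptors. The layer fields, the position-wise comparison rule, and the coin-flip choice between the personal and global best are our own simplifying assumptions for illustration; the actual operators are defined in Section 3.

```python
import random

def difference(target, current):
    """Position-wise 'difference' between two variable-length particles.

    The result has the target's length: positions where the current layer
    already matches the target are marked 'keep', all others are marked
    'copy' with the target's layer. (Illustrative rule, not the paper's.)
    """
    diff = []
    for i, target_layer in enumerate(target):
        if i < len(current) and current[i] == target_layer:
            diff.append(("keep", current[i]))
        else:
            diff.append(("copy", target_layer))
    return diff

def apply_velocity(diff_to_pbest, diff_to_gbest, c_g=0.5):
    """Assemble a new particle by following, at each position, the difference
    toward the global best with probability c_g and toward the personal best
    otherwise. The new length lies between the two, so particles can grow or
    shrink without a fixed upper bound."""
    new_particle = []
    for i in range(max(len(diff_to_pbest), len(diff_to_gbest))):
        source = diff_to_gbest if random.random() < c_g else diff_to_pbest
        if i < len(source):
            new_particle.append(dict(source[i][1]))
    return new_particle

# Example with hypothetical layer descriptors
pbest = [{"type": "conv", "filters": 32}, {"type": "fc", "units": 128}]
gbest = [{"type": "conv", "filters": 64}, {"type": "pool"}, {"type": "fc", "units": 256}]
particle = [{"type": "conv", "filters": 32}, {"type": "fc", "units": 64}]
new_particle = apply_velocity(difference(pbest, particle), difference(gbest, particle))
```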

The remainder of this paper is organized as follows: a detailed background on and motivation for PSO and CNNs are presented in Section 2. The proposed psoCNN algorithm is described in detail in Section 3. The experimental design, including the datasets, peer competitor models, and algorithm parameters, is presented in Section 4. The experimental results and a comparison with the chosen algorithms are presented in Section 5. Finally, Section 6 concludes this paper and proposes directions for future work.

Section snippets

Particle swarm optimization

Particle Swarm Optimization (PSO) is a meta-heuristic algorithm often used in discrete, continuous, and combinatorial optimization problems. It was first developed by Kennedy and Eberhart [30] and is inspired by the flying pattern of a flock of birds. In the context of PSO, a single solution is called a particle, and the collection of all solutions is called a swarm. The main idea in PSO is that each particle only has knowledge about its current velocity, its own best configuration
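For reference, in the standard real-valued formulation of PSO (which psoCNN later replaces with architecture-level operators), the velocity and position of particle $i$ are updated as

$$v_i(t+1) = w\,v_i(t) + c_1 r_1 \big(p_i - x_i(t)\big) + c_2 r_2 \big(g - x_i(t)\big), \qquad x_i(t+1) = x_i(t) + v_i(t+1),$$

where $x_i$ and $v_i$ are the particle's position and velocity, $p_i$ its personal best, $g$ the swarm's global best, $w$ the inertia weight, $c_1$ and $c_2$ acceleration coefficients, and $r_1$, $r_2$ random numbers drawn uniformly from $[0, 1]$.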

Proposed algorithm

The proposed algorithm receives as inputs parameters related to the problem in question, such as the training data that will be used, and parameters related to the CNN architectures that will be created, such as the particles' maximum number of layers during initialization. The general framework of the proposed psoCNN can be seen in Algorithm 1. In the proposed algorithm, the global best particle is based on the best blocks found in the swarm by following the PSO algorithm. Thus, there is no
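As a rough, self-contained illustration of this framework only, the outer loop of a psoCNN-style search can be organized as follows. The helper functions below are stand-ins for the routines defined in the paper: in the real algorithm, swarm initialization builds variable-length CNN architectures, fitness evaluation trains a candidate CNN for a few epochs and returns its accuracy, and the particle update applies the architecture-level velocity operator of Section 3.

```python
import random

# Stand-in helpers (placeholders, not the paper's routines).
def initialize_swarm(n_particles, max_init_layers):
    return [[{"type": "conv"}] * random.randint(1, max_init_layers)
            for _ in range(n_particles)]

def evaluate(particle, data):
    return random.random()  # stand-in for accuracy after a short training run

def update_particle(particle, pbest, gbest):
    return list(pbest if random.random() < 0.5 else gbest)  # stand-in velocity step

def pso_cnn(train_data, n_particles=10, n_iterations=10, max_init_layers=8):
    swarm = initialize_swarm(n_particles, max_init_layers)
    fitness = [evaluate(p, train_data) for p in swarm]
    pbest, pbest_fit = [list(p) for p in swarm], list(fitness)
    best = max(range(n_particles), key=fitness.__getitem__)
    gbest, gbest_fit = list(swarm[best]), fitness[best]

    for _ in range(n_iterations):
        for i in range(n_particles):
            # Move the particle toward its personal best and the global best.
            swarm[i] = update_particle(swarm[i], pbest[i], gbest)
            fitness[i] = evaluate(swarm[i], train_data)
            if fitness[i] > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(swarm[i]), fitness[i]
            if fitness[i] > gbest_fit:
                gbest, gbest_fit = list(swarm[i]), fitness[i]
    return gbest  # best architecture found, to be trained fully afterwards

best_architecture = pso_cnn(train_data=None)
```

Note that this sketch simply copies the best single particle as the global best, whereas, as described above, psoCNN assembles the global best from the best blocks found in the swarm.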

Experimental design

The proposed algorithm, psoCNN, is tested on state-of-the-art image classification datasets. The following subsections present in detail the datasets, peer competitor models, and the algorithm parameters used in our experiments. In addition, we are making the code of the proposed algorithm available to other researchers upon request, allowing the reproduction of the results presented in this paper.

Initialization of the swarm in the proposed psoCNN (InitializeSwarm()).
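Below is a minimal sketch of what a swarm-initialization routine of this kind might look like, assuming each particle consists of a convolutional/pooling block followed by a small fully connected block; the layer types, filter counts, and value ranges are our own illustrative choices, not the parameter settings used in the paper.

```python
import random

def initialize_swarm(n_particles, max_init_layers):
    """Create variable-length particles: a conv/pool block followed by an FC block."""
    swarm = []
    for _ in range(n_particles):
        n_layers = random.randint(3, max_init_layers)
        n_fc = random.randint(1, 2)           # size of the fully connected block
        particle = []
        for _ in range(n_layers - n_fc):      # convolutional / pooling block
            if random.random() < 0.7:
                particle.append({"type": "conv",
                                 "filters": random.choice([32, 64, 128]),
                                 "kernel": random.choice([3, 5, 7])})
            else:
                particle.append({"type": "pool",
                                 "kind": random.choice(["max", "avg"])})
        for _ in range(n_fc):
            particle.append({"type": "fc",
                             "units": random.choice([128, 256, 512])})
        swarm.append(particle)
    return swarm

swarm = initialize_swarm(n_particles=20, max_init_layers=12)
```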

Experimental results and analysis

This section presents the results obtained with the psoCNN algorithm compared with other deep learning algorithms, along with a discussion of the results. Our reported results for each dataset were obtained using only the test set, which ensures that the reported performance reflects the algorithm's ability to learn and generalize well.

Conclusion

In this work, we propose a novel algorithm to search for deep convolutional neural network (CNN) architectures based on particle swarm optimization (psoCNN). A novel direct encoding strategy is also proposed in which a CNN architecture is divided into two blocks: one block contains only convolutional and pooling layers, while the other contains only fully connected layers. This encoding strategy allows variable-length CNN architectures to be compared and combined using an almost

Acknowledgments

This work is partially supported by the National Council for Scientific and Technological Development (CNPq, Brazil) grant 203076/2015-0. It used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (NSF, USA) grant number ACI-1548562, and the Bridges system, which is supported by the National Science Foundation (NSF, USA) award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC). It also used the OSU High Performance

References (49)

  • M. Lin et al., Network in Network (2013).
  • M. Schoenauer et al., Neuro-genetic truck backer-upper controller.
  • E. Ronald et al., Genetic lander: an experiment in accurate neuro-genetic control.
  • H. Kitano, Empirical studies on the speed of convergence of neural network training using genetic algorithms.
  • K.O. Stanley et al., Evolving neural networks through augmenting topologies, Evol. Comput. (2002).
  • N.T. Siebel et al., Evolutionary reinforcement learning of artificial neural networks, Int. J. Hybrid Intell. Syst. (2007).
  • K.O. Stanley et al., A hypercube-based encoding for evolving large-scale neural networks, Artif. Life (2009).
  • P. Verbancsics et al., Generative Neuroevolution for Deep Learning (2013).
  • E. Real et al., Large-scale evolution of image classifiers.
  • H. Liu et al., Hierarchical Representations for Efficient Architecture Search (2017).
  • H. Jin et al., Efficient Neural Architecture Search with Network Morphism (2018).
  • Y. Sun et al., Evolving deep convolutional neural networks for image classification, IEEE Trans. Evol. Comput. (2018).
  • Y. Sun et al., Evolving unsupervised deep neural networks for learning meaningful representations, IEEE Trans. Evol. Comput. (2019).
  • V. Gudise et al., Comparison of particle swarm optimization and backpropagation as training algorithms for neural networks.