Expert Systems with Applications

Volume 94, 15 March 2018, Pages 205-217

Designing architectures of convolutional neural networks to solve practical problems

https://doi.org/10.1016/j.eswa.2017.10.052

Highlights

  • Our approach aims to support the estimation of Convolutional Neural Network (CNN) parameters.

  • It intends to produce simpler CNN, reducing the complexity.

  • This estimation was based on False Nearest Neighbors method.

  • Caffe deep learning framework was used to conduct the training of CNN.

  • Our results are comparable even to very complex and empirical CNN architectures.

Abstract

The Convolutional Neural Network (CNN) figures among the state-of-the-art Deep Learning (DL) algorithms due to its robustness to data shifts and scale variations, and its capability of extracting relevant information from large-scale input data. However, setting appropriate parameters to define CNN architectures is still a challenging issue, mainly when tackling real-world problems. A typical approach consists of empirically assessing different CNN settings in order to select the most appropriate one. This procedure has clear limitations, including the choice of suitable predefined configurations as well as the high computational cost involved in evaluating each of them. This work presents a novel methodology to tackle the aforementioned issues, providing mechanisms to estimate effective CNN configurations, including the size of convolutional masks (convolutional kernels) and the number of convolutional units (CNN neurons) per layer. Based on the False Nearest Neighbors (FNN) method, a well-known tool from the area of Dynamical Systems, the proposed approach helps to estimate CNN architectures that are less complex and still produce good results. Our experiments confirm that architectures estimated through the proposed approach are as effective as the complex ones defined by empirical and computationally intensive strategies.

Introduction

The vast amount of data currently available has fostered the development of methodologies capable of processing and extracting meaningful features to assist the interpretation, understanding, and solution of complex problems. In this context, the area of Deep Learning (DL) has emerged as a main alternative to analyze massive data, presenting breakthrough results in tasks such as speech recognition (Graves, Mohamed, & Hinton, 2013), machine translation (Luong, Sutskever, Le, Vinyals, & Zaremba, 2014), and data classification (Lauer, Suen, Bloch, 2007, LeCun, Bottou, Bengio, Haffner, 1998, Sharif Razavian, Azizpour, Sullivan, Carlsson, 2014, Zhou, Lapedriza, Xiao, Torralba, Oliva, 2014).

Deep Learning algorithms operate in multiple levels, each of which composed of a set of regression models that involve linear and nonlinear components. The combination of multiple models makes possible the representation of complex functions (LeCun, Bengio, & Hinton, 2015). Most DL algorithms resemble Artificial Neural Networks, such as the Multilayer Perceptron (Haykin & Network, 2004), where input vectors are processed throughout consecutive layers containing operation units to emphasize or inhibit features (LeCun, Bottou, Bengio, Haffner, 1998, Sharif Razavian, Azizpour, Sullivan, Carlsson, 2014).

A particularly important DL algorithm is the so-called Convolutional Neural Network (CNN), which has gained prestige mainly due to its good performance in computer vision tasks (Oquab, Bottou, Laptev, Sivic, 2015, Scherer, Müller, Behnke, 2010a, Scherer, Schulz, Behnke, 2010b, Sharif Razavian, Azizpour, Sullivan, Carlsson, 2014, Zhou, Lapedriza, Xiao, Torralba, Oliva, 2014), and its feature extraction ability from time-dependent data such as audio and video (Karpathy, Toderici, Shetty, Leung, Sukthankar, Fei-Fei, 2014, Osadchy, Cun, Miller, 2007, Schluter, Bock, 2014). However, the performance of CNN strongly depends on its architecture, including the number of layers, units per layer, and convolutional mask sizes.

Moreover, an architecture that works well for a given problem may not be appropriate when dealing with different data types or tasks. A usual alternative to remedy this issue is to evaluate different CNN architectures, choosing the one with the best performance. Such a procedure clearly bears a number of drawbacks: (i) the training and evaluation of a single architecture is already computationally intensive, thus the overall assessment of several of them may be infeasible in many scenarios; and (ii) the empirically defined architectures may not be appropriate for the problem under consideration (Lappas & Chen, 2009; Menotti, Chiachia, Falcao, & Oliveira Neto, 2014), thus acceptable results may never be reached.

A strategy commonly employed to avoid the assessment of multiple CNN settings considers an additional training stage to tune the weights associated with convolutional units. This also increases the computational burden and requires thousands or millions of examples to produce reasonable results (Lauer, Suen, & Bloch, 2007; LeCun, Bottou, Bengio, & Haffner, 1998; Simard, Steinkraus, & Platt, 2003). Another strategy is the ensemble of CNNs, which usually allows a reduction in the maximum number of iterations at the cost of training more architectures to obtain relevant results. It is also time consuming, though, hindering its application in production environments (Ciregan, Meier, & Schmidhuber, 2012).

In this work, we present a novel methodology to assist the definition of CNN architectures that differs substantially from the alternatives described above. Specifically, our approach analyzes the input and output images produced by the convolutional operations at each CNN layer in order to estimate adequate dimensions for the convolutional masks and to suitably set the number of convolutional units per layer. In addition, motivated by the Occam’s razor problem-solving principle (Blumer, Ehrenfeucht, Haussler, & Warmuth, 1987), we aim to design architectures that are as simple as possible, yet still efficient, to address target problems.

The faster convergence of simpler CNN architectures is demonstrated here using the Statistical Learning Theory, more specifically as a result of the Chernoff bound and the Hoeffding inequality (Devroye, Györfi, & Lugosi, 2013; Vapnik, 2013; Von Luxburg & Schölkopf, 2011). Eq. (1) presents the condition that ensures learning bounds, in which $\mathcal{N}(F, 2n)$ corresponds to the Shattering coefficient, $F$ is the set of all possible functions provided by an algorithm (a.k.a. the algorithm bias), and $n$ is the sample size (here, two samples with $n$ elements each are considered). The Shattering coefficient is indeed a function of $n$ which defines the maximum number of admissible functions contained in $F$ that produce distinct classifications, considering the worst possible sample organization with size $n$ (Smola & Schölkopf, 1998).

$$\frac{\log \mathcal{N}(F, 2n)}{n} \to 0 \quad \text{(1)}$$

As already discussed in Von Luxburg and Schölkopf (2011), the Shattering coefficient can be approximated by $an^2$ when considering just one neuron of some deep network (see Eq. (2)), for some constant $a > 0$. Thus, Eq. (3) provides the Shattering coefficient for a deep architecture composed of $k$ units in total.

$$\frac{\log \mathcal{N}(F, 2n)}{n} = \frac{\log(an^2)}{n} \quad \text{(2)}$$

$$\frac{\log \mathcal{N}(F, 2n)}{n} = \frac{\log\left((an^2)^k\right)}{n} = \frac{k \log(an^2)}{n} \quad \text{(3)}$$

The sample size required to ensure convergence is defined in Eq. (4), in which $R(f)$ represents the real risk (the expected value of the loss function, in range [0, 1]), $R_{emp}(f)$ corresponds to the empirical risk (the average loss computed on a given sample, in range [0, 1]), $0 < \epsilon < 1$ is a threshold that indicates an acceptable divergence limit between risks, $F$ is the set of all functions provided by some supervised learning algorithm, and $n$ is the sample size. Thus, the convergence of some algorithm for a given network architecture is analyzed here in terms of its number of units and the sample size. Eq. (5) defines the generalization probability, in which $P(\sup_{f \in F} |R(f) - R_{emp}(f)| > \epsilon)$ represents the probability that an algorithm does not generalize. So, the main goal of the Statistical Learning Theory, as proved by Vapnik (2013), is to ensure the term $2(an^2)^k e^{-n\epsilon^2/4}$ converges to zero as a greater sample size is provided, thereby guaranteeing generalization.

$$P\left(\sup_{f \in F} |R(f) - R_{emp}(f)| > \epsilon\right) \leq 2\,\mathcal{N}(F, 2n)\,e^{-n\epsilon^2/4} \quad \text{(4)}$$

$$P\left(\sup_{f \in F} |R(f) - R_{emp}(f)| > \epsilon\right) \leq 2\,(an^2)^k\,e^{-n\epsilon^2/4} \quad \text{(5)}$$

From Eq. (5), it is possible to conclude that additional CNN units directly require greater sample sizes in order to ensure the right-hand side of the inequality approaches zero. Fig. 1 illustrates the generalization probability produced according to Eq. (5), in which the sample size $n$ varies from 1 to 1 million, the number of network units $k$ is set to {10, 50, 100, 250}, $a = 10^{-10}$, and $\epsilon = 0.05$ (a 5% divergence is accepted between the expected and the empirical risks). Convergence begins when a curve starts decaying, but it only occurs once a sufficiently large sample is provided, so that the bound tends to zero.
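The behavior shown in Fig. 1 can be checked numerically. The sketch below evaluates the logarithm of the right-hand side of Eq. (5) at n = 1 million for the same unit counts k; the constants a = 10⁻¹⁰ and ϵ = 0.05 follow the figure's setup, while the function and variable names are ours:

```python
import math

def log_bound(n, k, a=1e-10, eps=0.05):
    """Natural log of the right-hand side of Eq. (5):
    2 (a n^2)^k exp(-n eps^2 / 4), evaluated in log-space
    so that large exponents do not overflow."""
    return math.log(2) + k * math.log(a * n * n) - n * eps ** 2 / 4

# After 1 million examples, the bound has collapsed for k up to 100,
# but it is still astronomically large for k = 250.
for k in (10, 50, 100, 250):
    lb = log_bound(1_000_000, k)
    status = "converges" if lb < 0 else "does NOT converge"
    print(f"k={k:3d}: log of bound = {lb:8.1f} -> {status}")
```

A negative log-bound means the probability of poor generalization is already negligible; for k = 250, the Shattering term (an²)ᵏ still dominates the exponential decay even at n = 10⁶, matching the discussion of Fig. 1.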

As shown in Fig. 1, the Shattering coefficient is chiefly responsible for holding back the convergence, making it require more data examples. The greater the complexity of the Shattering coefficient, the longer the bound behaves as an exponentially growing function; only after a large enough sample size is it dominated by the other Chernoff terms, finally decreasing and converging. In addition, the theoretical convergence could not be illustrated for any network with 250 or more neurons: (i) first of all, the term $a$ would have to be smaller to make the Chernoff bound approach zero; (ii) secondly, the bound tended to infinity as more data examples were provided; and, finally, (iii) not even with 1 million examples could it converge in theory. We believe this theoretical demonstration is necessary and sufficient to confirm the need for designing simpler architectures (fewer neurons and layers).

Based on this theoretical formulation after Vapnik (2013), it is clear that large CNNs require greater samples to guarantee the learning convergence and, ultimately, to prove that a good classifier was obtained. Thus, the goal of this paper is to estimate adequate parameters in order to design simpler and yet efficient CNN architectures, producing networks that converge faster with a reasonable number of training examples. In summary, we propose a method to estimate the adequate dimensions of the convolutional masks (convolutional kernels) and the number of convolutional units (CNN neurons) at each layer of a CNN architecture for general-purpose classification tasks.

Motivated by the False Nearest Neighbors (FNN) method (Kennel, Brown, & Abarbanel, 1992), a well-known tool from the area of Dynamical Systems, our method analyzes the input and output images produced by the convolutional operation of each CNN layer in order to estimate the adequate dimensions for the convolutional masks and the suitable number of convolutional units per layer. In more detail, this analysis takes each image and builds up vectors to embed data into high-dimensional spaces in an attempt to increase the recurrence levels. Recurrences are here associated with the prevalent and most similar patterns occurring in a given input image.
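For background, the original time-series form of FNN can be sketched as follows: a neighbor is declared false when adding one more embedding coordinate pushes it far away from the reference state. This is an illustrative sketch only, not the paper's image adaptation (described later); the distance-ratio threshold `rtol` and the toy sine series are our choices:

```python
import numpy as np

def false_nearest_fraction(series, m, d=1, rtol=10.0):
    """Fraction of false nearest neighbors when embedding `series`
    with dimension m and time delay d, following Kennel et al. (1992):
    a neighbor is false when the (m+1)-th coordinate makes the distance
    grow by more than `rtol` times the distance in dimension m."""
    n = len(series) - m * d            # each state needs one extra coordinate
    states = np.array([series[i:i + m * d:d] for i in range(n)])
    false = 0
    for i in range(n):
        dist = np.linalg.norm(states - states[i], axis=1)
        dist[i] = np.inf               # exclude the state itself
        j = int(np.argmin(dist))       # nearest neighbor in dimension m
        extra = abs(series[i + m * d] - series[j + m * d])
        if extra > rtol * dist[j]:
            false += 1                 # the neighbor was a projection artifact
    return false / n

# Toy example: a clean sine wave is unfolded by a small embedding dimension,
# so the fraction of false neighbors drops sharply after m = 1.
t = np.linspace(0, 8 * np.pi, 400)
fractions = [false_nearest_fraction(np.sin(t), m) for m in (1, 2, 3)]
print([round(f, 3) for f in fractions])
```

The embedding dimension at which this fraction approaches zero is the one that adequately unfolds the data, which is the quantity the adapted method estimates for CNN masks and units.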

The CNN architectures produced by our approach were compared against more complex ones, using the Caffe deep learning framework (Jia et al., 2014). Four datasets were used: i) CMU Face Images (Roweis & Saul, 2000), which contains images of human faces; ii) MNIST, a dataset of handwritten digits; iii) the Columbia University Image Library, referred to as COIL-100 (Nene, Nayar, & Murase, 1996), which contains object images divided into 100 classes; and iv) the German Traffic Sign Recognition Benchmark, referred to as GTSRB (Stallkamp, Schlipsing, Salmen, & Igel, 2012), a dataset of traffic sign images.

Experimental results confirm that our CNN architectures have error rates similar to the ones listed in the literature, while presenting significantly lower complexity (fewer layers and units per layer). Therefore, our methodology turns out to be a viable alternative to design simpler CNN architectures that are faster to train and less prone to overfitting (Cogswell, Ahmed, Girshick, Zitnick, & Batra, 2015; Lappas & Chen, 2009).

This paper is organized as follows: Section 2 discusses Deep Learning approaches and their results, with particular attention to the datasets used in our experiments; Section 3 uses Linear Algebra to formulate CNN operations; Section 4 addresses the original False Nearest Neighbors (FNN) method, proposed by Kennel et al. (1992); Section 5 describes all FNN modifications necessary to make the estimation of CNN architectures possible; Section 6 shows experimental results on the four selected datasets, evaluating the mask sizes and the number of convolutional units for a CNN with one, two, and three convolutional layers; Section 7 presents the concluding remarks and perspectives for future work.

Section snippets

Related work

The good performance of DL algorithms in classification problems has motivated their use in several domains, in particular to tackle handwritten digit recognition, where the MNIST dataset 1 is considered one of the main benchmarks. Among the best results reported for it, LeCun, Haffner, Bottou, and Bengio (1999) achieved 0.95% of error rate using a CNN with the LeNet-5 architecture, reducing to 0.8% after

Convolutional neural network

The Convolutional Neural Network (CNN) is a Deep Learning algorithm designed to process multidimensional data, such as signals, images, and videos (LeCun et al., 2015), and to extract relevant features even in the presence of noise, shifting, rescaling, and other types of data distortions (Goodfellow, Bengio, & Courville, 2016; LeCun & Bengio, 1995; LeCun, Bottou, Bengio, & Haffner, 1998). CNN is a multilayer network, and each layer is composed of units responsible for different types of operations,

False nearest neighbors

Takens (1981) proposed a methodology to unfold time-dependent data into multidimensional spaces, also referred to as phase spaces, which makes the identification of recurrences easier, thus simplifying tasks such as modeling and forecasting. The method embeds a time series $X = \{x_0, \ldots, x_n\}$ in the phase space by producing states in the form $x_i(m, d) = (x_i, x_{i+d}, \ldots, x_{i+(m-1)d})$, where the parameter $m$ defines the dimension of the phase space (also called the embedding dimension) and $d$ is the time delay. Although very
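The state construction just defined can be sketched in a few lines (the function name is ours; this is an illustration of the embedding, not the paper's implementation):

```python
import numpy as np

def takens_embed(x, m, d):
    """Build phase-space states x_i(m, d) = (x_i, x_{i+d}, ..., x_{i+(m-1)d})."""
    n = len(x) - (m - 1) * d          # number of complete states
    return np.array([x[i:i + (m - 1) * d + 1:d] for i in range(n)])

# Toy series x_0..x_9 embedded with dimension m = 3 and delay d = 2.
states = takens_embed(np.arange(10), m=3, d=2)
print(states.shape)   # (6, 3); the first state is (x_0, x_2, x_4)
```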

Adapting the false nearest neighbors method

We noticed that the vector representation $v_{i,j} = [I_{i-x_1,\,j-y_1}, \ldots, I_{i,j}, \ldots, I_{i+x_m,\,j+y_n}]$ used by the CNN on a local region of some image $I$ (Section 3) is obtained after an application of Takens’ immersion theorem considering two embedding dimensions: the embedding dimension along rows ($M$) and the embedding dimension over columns ($N$), which together define the dimensionality of the vectors $v_{i,j}$ (see Section 3). Thus, we adapted the False Nearest Neighbors (FNN) method in order to properly estimate
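Concretely, such region vectors can be collected by sliding an M×N window over the image and flattening each local region — the same windows a convolutional mask visits. A minimal sketch (names are ours; the paper's adaptation additionally evaluates recurrence among these vectors):

```python
import numpy as np

def patch_vectors(image, M, N):
    """One flattened vector per M x N local region of a grayscale image,
    i.e. the windows a convolutional mask of size M x N slides over."""
    rows, cols = image.shape
    out = []
    for i in range(rows - M + 1):
        for j in range(cols - N + 1):
            out.append(image[i:i + M, j:j + N].ravel())
    return np.array(out)

img = np.arange(16, dtype=float).reshape(4, 4)
vecs = patch_vectors(img, M=2, N=2)
print(vecs.shape)   # (9, 4): nine 2x2 regions, each flattened to length 4
```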

Experiments

This section presents the datasets considered; then we show the setup for our approach based on the False Nearest Neighbors method and for the Convolutional Neural Network (CNN). Next, we present the results that our FNN approach produced when evaluating the training examples contained in each dataset, while: i) assessing the mask sizes for convolutional units; and ii) varying the number of convolutional units. Then, the best mask sizes and numbers of units found were used to perform

Conclusions

Studies on Deep Learning have usually considered very complex CNN architectures, containing many layers and convolutional units in an attempt to improve classification tasks. However, most of them lack justification for the parametrization considered. In fact, most of them simply analyze several CNN settings to empirically find an adequate architecture. What they probably miss is that those settings may not be enough to provide simpler and adequate architectures to tackle practical problems,

Acknowledgements

We would like to thank Prof. Moacir Ponti for reviewing this work as well as for his suggestions. This paper is supported by CAPES, Brazil, under grant no. 7901561/D, by CNPq, Brazil, under grants no. 03051/2014-0 and 302643/2013-3, and by FAPESP, Brazil, under grants no. 2011/22749-8, 2012/17961-0 and 2014/13323-5. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of CAPES, CNPq or FAPESP.

References (60)

  • A. Blumer et al.

    Occam’s razor

    Information Processing Letters

    (1987)
  • Y. LeCun

    Learning invariant feature hierarchies

    Computer vision–ECCV 2012. Workshops and demonstrations

    (2012)
  • G. Strang

    Linear algebra and its applications

    (1988)
  • S. Albelwi et al.

    A framework for designing the architectures of deep convolutional neural networks

    Entropy

    (2017)
  • K.T. Alligood et al.

    Chaos in differential equations

    Chaos

    (1997)
  • L. Bottou

    Stochastic gradient learning in neural networks

    Proceedings of Neuro-Nımes

    (1991)
  • D. Ciregan et al.

    Multi-column deep neural networks for image classification

    Computer vision and pattern recognition (CVPR), 2012 IEEE conference on

    (2012)
  • Cogswell, M., Ahmed, F., Girshick, R.B., Zitnick, L., & Batra, D. (2015). Reducing overfitting in deep networks by...
  • H. Daumé III et al.

    Frustratingly easy domain adaptation

    Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

    (2010)
  • J. Dean et al.

    Large scale distributed deep networks

    Advances in neural information processing systems

    (2012)
  • L. Devroye et al.

    A probabilistic theory of pattern recognition

    (2013)
  • B.A. Garro et al.

    Designing artificial neural networks using particle swarm optimization algorithms

    Computational Intelligence and Neuroscience

    (2015)
  • I.J. Goodfellow et al.

    Maxout networks

    ICML

    (2013)
  • Goodfellow, I., Bengio, Y., & Courville, A., (2016). Deep learning. Book in preparation for MIT Press....
  • A. Graves et al.

    Speech recognition with deep recurrent neural networks

    2013 IEEE international conference on acoustics, speech and signal processing

    (2013)
  • S. Hassairi et al.

    Supervised image classification using deep convolutional wavelets network

    Tools with artificial intelligence (ICTAI), 2015 IEEE 27th international conference on

    (2015)
  • S. Haykin et al.

    A comprehensive foundation

    Neural Networks

    (2004)
  • G.E. Hinton et al.

    A fast learning algorithm for deep belief nets

    Neural Computation

    (2006)
  • F.J. Huang et al.

    Large-scale learning with SVM and convolutional for generic object categorization

    Computer vision and pattern recognition, 2006 IEEE computer society conference on

    (2006)
  • R. Ihaka et al.

    R: a language for data analysis and graphics

    Journal of computational and graphical statistics

    (1996)
  • K. Jarrett et al.

    What is the best multi-stage architecture for object recognition?

    Computer vision, 2009 IEEE 12th international conference on

    (2009)
  • Y. Jia et al.

    Caffe: Convolutional architecture for fast feature embedding

    Proceedings of the ACM international conference on multimedia

    (2014)
  • H. Kantz et al.

    Nonlinear time series analysis

    (2004)
  • A. Karpathy et al.

    Large-scale video classification with convolutional neural networks

    Proceedings of the IEEE conference on computer vision and pattern recognition

    (2014)
  • M.B. Kennel et al.

    Determining embedding dimension for phase-space reconstruction using a geometrical construction

    Physical review A

    (1992)
  • G. Lappas et al.

    Neural networks and multimedia datasets: Estimating the size of neural networks for achieving high classification accuracy

    Wseas international conference. Proceedings. Mathematics and computers in science and engineering

    (2009)
  • F. Lauer et al.

    A trainable feature extractor for handwritten digit recognition

    Pattern Recognition

    (2007)
  • Y. LeCun et al.

    Convolutional networks for images, speech, and time series

    The handbook of brain theory and neural networks

    (1995)
  • Y. LeCun et al.

    Deep learning

    Nature

    (2015)
  • Y. LeCun et al.

    Gradient-based learning applied to document recognition

    Proceedings of the IEEE

    (1998)