
2017 | Book

Guide to Convolutional Neural Networks

A Practical Application to Traffic-Sign Detection and Classification


About this Book

This must-read text/reference introduces the fundamental concepts of convolutional neural networks (ConvNets), offering practical guidance on using libraries to implement ConvNets in applications of traffic sign detection and classification. The work presents techniques for optimizing the computational efficiency of ConvNets, as well as visualization techniques to better understand the underlying processes. The proposed models are also thoroughly evaluated from different perspectives, using exploratory and quantitative analysis.

Topics and features: explains the fundamental concepts behind training linear classifiers and feature learning; discusses the wide range of loss functions for training binary and multi-class classifiers; illustrates how to derive ConvNets from fully connected neural networks, and reviews different techniques for evaluating neural networks; presents a practical library for implementing ConvNets, explaining how to use a Python interface for the library to create and assess neural networks; describes two real-world examples of the detection and classification of traffic signs using deep learning methods; examines a range of varied techniques for visualizing neural networks, using a Python interface; provides self-study exercises at the end of each chapter, in addition to a helpful glossary, with relevant Python scripts supplied at an associated website.

This self-contained guide will benefit those who seek both to understand the theory behind deep learning and to gain hands-on experience in implementing ConvNets in practice. As no prior background knowledge in the field is required to follow the material, the book is ideal for all students of computer vision and machine learning, and will also be of great interest to practitioners working on autonomous cars and advanced driver assistance systems.

Table of Contents

Frontmatter
Chapter 1. Traffic Sign Detection and Recognition
Abstract
In this chapter, we formulated the problem of traffic sign recognition in two stages, namely detection and classification. The detection stage is responsible for locating the regions of an image that contain traffic signs, and the classification stage is responsible for determining the class of each detected sign. Related work in the field of traffic sign detection and classification is also reviewed. We mentioned several methods based on hand-crafted features, then introduced the idea behind feature learning, and finally explained some of the work based on convolutional neural networks.
Hamed Habibi Aghdam, Elnaz Jahani Heravi
Chapter 2. Pattern Classification
Abstract
In this chapter, we first explained what classification problems are and what a decision boundary is. Then, we showed how to model a decision boundary using linear models. To build intuition, linear models were also studied from a geometrical perspective. A linear model needs to be trained on a training dataset, so there must be a way to assess how well a linear model classifies the training samples. For this purpose, we thoroughly explained different loss functions, including the 0/1 loss, squared loss, hinge loss, and logistic loss. Then, methods for extending binary models to multiclass models, including one-versus-one and one-versus-rest, were reviewed. It is also possible to generalize a binary linear model directly into a multiclass model. This requires loss functions that can be applied to multiclass datasets, and we showed how to extend the hinge loss and the logistic loss to this setting. The big issue with linear models is that they perform poorly on datasets in which the classes are not linearly separable. To overcome this problem, we introduced the idea of a feature transformation function and applied it to a toy example. Designing a feature transformation function by hand can be a tedious task, especially when it has to be applied to high-dimensional datasets. A better solution is to learn a feature transformation function directly from the training data and to train a linear classifier on top of it. We developed the idea of feature transformation from simple functions to compositional functions and explained how neural networks can be used to simultaneously learn a feature transformation function together with a linear classifier. Training a complex model such as a neural network requires computing the gradient of the loss function with respect to every parameter in the model, and computing gradients using the conventional chain rule might not be tractable.
We explained how to factorize the multivariate chain rule and reduce the number of arithmetic operations. Using this formulation, we explained the backpropagation algorithm for computing gradients on any computational graph. Next, we described the different activation functions that can be used in designing neural networks and mentioned why ReLU activations are preferable to traditional activations such as the hyperbolic tangent. The role of bias in neural networks is also discussed in detail. Finally, we finished the chapter by explaining how an image can be used as the input of a neural network.
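As a concrete illustration of the loss functions discussed in this chapter, the multiclass hinge loss and the multiclass logistic (softmax cross-entropy) loss can be sketched in a few lines of NumPy. The scores, labels, and margin value below are illustrative assumptions, not examples from the book:

```python
import numpy as np

def multiclass_hinge_loss(scores, y, margin=1.0):
    """Multiclass hinge loss, averaged over samples.
    scores: (N, C) class scores; y: (N,) integer labels."""
    n = scores.shape[0]
    correct = scores[np.arange(n), y][:, None]            # score of the true class
    margins = np.maximum(0.0, scores - correct + margin)  # violated margins
    margins[np.arange(n), y] = 0.0                        # ignore the true class
    return margins.sum(axis=1).mean()

def softmax_loss(scores, y):
    """Multiclass logistic (cross-entropy) loss, averaged over samples."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(scores.shape[0]), y].mean()

# Two samples, three classes (hypothetical scores).
scores = np.array([[2.0, 0.5, -1.0],
                   [1.2, 1.5,  0.3]])
labels = np.array([0, 1])
hinge = multiclass_hinge_loss(scores, labels)  # only sample 2 violates a margin
ce = softmax_loss(scores, labels)
```

Both functions return a single scalar, which is the quantity minimized during training; the hinge loss is zero once every class margin is satisfied, while the logistic loss is always strictly positive.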
Hamed Habibi Aghdam, Elnaz Jahani Heravi
Chapter 3. Convolutional Neural Networks
Abstract
Understanding the underlying process in a convolutional neural network is crucial for developing reliable architectures. In this chapter, we explained how convolution operations are derived from fully connected layers; for this purpose, the weight-sharing mechanism of convolutional neural networks was discussed. The next basic building block in a convolutional neural network is the pooling layer. We saw that pooling layers are an intelligent way to reduce the dimensionality of feature maps. To this end, max pooling, average pooling, or mixed pooling is applied to feature maps with a stride bigger than one. To explain how to design a neural network, two classical network architectures were illustrated and explained. Then, we formulated the problem of designing a network in three stages, namely idea, implementation, and evaluation, and discussed each stage in detail. Specifically, we reviewed some of the libraries that are commonly used for training deep networks. In addition, common metrics for evaluating classification models (i.e., classification accuracy, confusion matrix, precision, recall, and F1 score) were presented together with their advantages and disadvantages. Two important steps in training a neural network successfully are initializing its weights and regularizing the network. Three commonly used methods for initializing weights were introduced; among them, Xavier initialization and its successors were discussed thoroughly. Moreover, regularization techniques such as \(L_1\), \(L_2\), max-norm, and dropout were discussed. Finally, we finished this chapter by explaining more advanced layers that are used in designing neural networks.
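The pooling operation described above can be sketched directly in NumPy; this minimal example shows max pooling with a 2x2 window and stride 2, which halves each spatial dimension of a feature map. The feature map values here are made up for illustration:

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling on a 2-D feature map: keep the maximum of each window."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + size,
                      j * stride:j * stride + size]
            out[i, j] = patch.max()
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 5, 1],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]], dtype=float)
pooled = max_pool2d(fmap)  # 4x4 feature map -> 2x2
```

Because the stride is bigger than one, each output value summarizes a non-overlapping window, which is exactly the dimensionality reduction the abstract refers to.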
Hamed Habibi Aghdam, Elnaz Jahani Heravi
Chapter 4. Caffe Library
Abstract
There are various powerful libraries, such as Theano, Lasagne, Keras, mxnet, Torch, and TensorFlow, that can be used for designing and training neural networks, including convolutional neural networks. Among them, Caffe is a library that is well suited both to research and to developing real-world applications. In this chapter, we explained how to design and train neural networks using the Caffe library. Moreover, the Python interface of Caffe was discussed using real examples. Then, we showed how to develop new layers in Python and use them in neural networks.
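To give a flavor of how networks are described in Caffe, here is a minimal prototxt fragment defining a convolution layer followed by an in-place ReLU. The layer names and parameter values are illustrative choices, not taken from the book:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"      # input blob
  top: "conv1"        # output blob
  convolution_param {
    num_output: 32    # number of filters
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"        # same top as bottom: computed in place
}
```

Caffe parses such text files into a network definition, so architectures can be edited without recompiling any code; the Python interface then loads the definition for training and inference.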
Hamed Habibi Aghdam, Elnaz Jahani Heravi
Chapter 5. Classification of Traffic Signs
Abstract
This chapter started by reviewing related work in the field of traffic sign classification. Then, it explained the necessity of splitting data and some of the methods for splitting it into training, validation, and test sets. A network should be assessed constantly during training so that problems can be diagnosed when necessary. For this reason, we showed how to train a network using the Python interface of Caffe and evaluate it continuously using the training-validation curve. We also explained the different scenarios that may arise during training, together with their causes and remedies. Then, some of the successful architectures proposed in the literature for the classification of traffic signs were introduced. We implemented and trained these architectures and analyzed their training-validation plots. Creating an ensemble is one method of increasing classification accuracy. We mentioned various methods that can be used for creating ensembles of models; then, a method based on optimal subset selection using genetic algorithms was discussed. This way, we create ensembles with the minimum number of models that together increase the classification accuracy. After that, we showed how to interpret and analyze quantitative results such as precision, recall, and accuracy on a real dataset of traffic signs. We also explained how to understand the behavior of convolutional neural networks using data-driven visualization techniques and nonlinear embedding methods such as t-SNE. Finally, we finished the chapter by implementing a more accurate and computationally efficient network proposed in the literature. The performance of this network was also analyzed using various metrics and from different perspectives.
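The quantitative metrics mentioned above (accuracy, per-class precision, and recall) all derive from the confusion matrix, and the arithmetic can be sketched in a few lines of NumPy. The 3x3 matrix below is hypothetical, standing in for three traffic-sign classes:

```python
import numpy as np

def per_class_metrics(cm):
    """Accuracy and per-class precision/recall from a confusion matrix.
    cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                  # correctly classified per class
    precision = tp / cm.sum(axis=0)   # column sum: all predictions of a class
    recall = tp / cm.sum(axis=1)      # row sum: all true samples of a class
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall

# Hypothetical confusion matrix for three sign classes.
cm = [[50, 2, 3],
      [4, 40, 1],
      [2, 1, 47]]
acc, prec, rec = per_class_metrics(cm)
```

Reading the matrix row by row reveals which pairs of classes the network confuses, which is information that a single accuracy number hides.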
Hamed Habibi Aghdam, Elnaz Jahani Heravi
Chapter 6. Detecting Traffic Signs
Abstract
Object detection is one of the hard problems in computer vision. It gets even harder in time-critical applications such as advanced driver assistance systems (ADAS). In this chapter, we explained a convolutional neural network that is able to analyze high-resolution images in real time and accurately find traffic signs. We showed how to quantitatively analyze the network and visualize it using an embedding approach.
Hamed Habibi Aghdam, Elnaz Jahani Heravi
Chapter 7. Visualizing Neural Networks
Abstract
Understanding the behavior of neural networks is necessary in order to better analyze and diagnose them. Quantitative metrics such as classification accuracy and the F1 score only give us numbers indicating how well the classifier performs on our problem; they do not tell us how a neural network achieves this result. Visualization is a set of techniques that are commonly used for understanding the structure of high-dimensional vectors. In this chapter, we briefly reviewed data-driven techniques for visualization and showed how to apply them to neural networks. Then, we focused on techniques that visualize neural networks by optimizing an objective function, of which we explained three. In the first method, we defined a loss function and found an image that maximizes the classification score of a particular class. In order to generate more interpretable images, the objective function was regularized using the \(L_2\) norm of the image. In the second method, the gradient of a particular neuron was computed with respect to the input image and illustrated by computing its magnitude. The third method formulated visualization as an image reconstruction problem. To be more specific, we explained a method that tries to find an image whose representation is very close to the representation of the original image. This technique tells us what information is discarded by a particular layer.
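The first visualization method above, maximizing a class score subject to an \(L_2\) penalty on the image, can be illustrated on a toy linear scorer. With a real ConvNet the gradient of the score would come from backpropagation, but the gradient-ascent update is the same; every name and value below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(16)   # stand-in for d(score)/d(image) of one class
x = np.zeros(16)              # start from a blank "image"
lr, lam = 0.1, 0.01           # step size and L2 regularization weight

for _ in range(200):
    # Objective: score(x) - lam * ||x||^2, with score(x) = w @ x.
    grad = w - 2.0 * lam * x  # gradient of the regularized objective
    x += lr * grad            # gradient ascent step

# x moves toward w / (2 * lam), the maximizer of the regularized score;
# without the L2 term the ascent would grow x without bound.
```

The regularizer is what keeps the synthesized "image" bounded, which in the ConvNet setting is what makes the maximizing images interpretable.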
Hamed Habibi Aghdam, Elnaz Jahani Heravi
Backmatter
Metadata
Title
Guide to Convolutional Neural Networks
Written by
Hamed Habibi Aghdam
Elnaz Jahani Heravi
Copyright Year
2017
Electronic ISBN
978-3-319-57550-6
Print ISBN
978-3-319-57549-0
DOI
https://doi.org/10.1007/978-3-319-57550-6