
21-09-2016

A Practical and Highly Optimized Convolutional Neural Network for Classifying Traffic Signs in Real-Time

Authors: Hamed Habibi Aghdam, Elnaz Jahani Heravi, Domenec Puig

Published in: International Journal of Computer Vision | Issue 2/2017

Abstract

Classifying traffic signs is an indispensable part of Advanced Driver Assistant Systems. This strictly requires that the traffic sign classification model accurately classifies the images and consumes as few CPU cycles as possible so that the CPU is immediately released for other tasks. In this paper, we first propose a new ConvNet architecture. Then, we propose a new method for creating an optimal ensemble of ConvNets with the highest possible accuracy and the lowest number of ConvNets. Our experiments show that an ensemble of our proposed ConvNets (constructed using our method) reduces the number of arithmetic operations by 88 and \(73\,\%\) compared with two state-of-the-art ensembles of ConvNets. In addition, our ensemble is \(0.1\,\%\) more accurate than one of these state-of-the-art ensembles and only \(0.04\,\%\) less accurate than the other when tested on the same dataset. Moreover, an ensemble of our compact ConvNets reduces the number of multiplications by 95 and \(88\,\%\), while the classification accuracy drops by only 0.2 and \(0.4\,\%\) compared with these two ensembles. We also evaluate the cross-dataset performance of our ConvNet and analyze its transferability in different layers. We show that our network scales easily to new datasets with many more traffic sign classes and only needs fine-tuning of the weights starting from the last convolution layer. We further assess our ConvNet through different visualization techniques. Finally, we propose a new method for finding the minimum additive noise that causes the network to incorrectly classify the image, with the smallest possible difference from the highest score in the loss vector.
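As a minimal, self-contained illustration of the minimum-additive-noise idea mentioned above, the sketch below searches for a small perturbation that flips the prediction of a toy linear classifier. The model, step size, and stopping rule are illustrative assumptions and do not reproduce the formulation proposed in the paper.

# Toy stand-in for the ConvNet: a linear classifier scores = W @ x.
# We grow an additive noise in the direction that closes the gap between
# the runner-up class score and the predicted class score, and stop as
# soon as the prediction flips, keeping the noise small.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features = 5, 32
W = rng.normal(size=(n_classes, n_features))   # toy "network" weights
x = rng.normal(size=n_features)                # toy "image"

def scores(v):
    return W @ v

y = int(np.argmax(scores(x)))                  # originally predicted class

noise = np.zeros_like(x)
step = 1e-3
for _ in range(100000):
    s = scores(x + noise)
    if int(np.argmax(s)) != y:                 # prediction flipped: stop early
        break
    j = int(np.argsort(s)[-2])                 # current runner-up class
    direction = W[j] - W[y]                    # gradient of (s_j - s_y) w.r.t. the input
    noise += step * direction / np.linalg.norm(direction)

print("L2 norm of additive noise:", np.linalg.norm(noise))
print("new prediction:", int(np.argmax(scores(x + noise))), "original:", y)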


Footnotes
2
The ConvNet architecture and its trained models are available at https://github.com/pcnn/traffic-sign-recognition.
 
3
The percentage of samples that always fall within the top-2 classification scores.
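As a minimal illustration (not the authors' evaluation code), this quantity can be computed as follows, assuming a score matrix and integer labels as inputs:

import numpy as np

def top2_percentage(scores, labels):
    # scores: (n_samples, n_classes) classification scores, labels: (n_samples,)
    top2 = np.argsort(scores, axis=1)[:, -2:]        # indices of the two highest scores
    hits = (top2 == labels[:, None]).any(axis=1)     # true label among the top 2?
    return 100.0 * hits.mean()                       # percentage of samples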
 
4
We calculated the number of multiplications of a ConvNet by taking into account the multiplications for convolving the filters of each layer with the N-channel input from the previous layer, the multiplications required for computing the activations of each layer, and the multiplications imposed by the normalization layers. We showed in Sect. 3 that the tanh function utilized in Ciresan et al. (2012) can be efficiently computed using 10 multiplications. The ReLU activation used in Jin et al. (2014) does not need any multiplications, and the Leaky ReLU units in our ConvNet compute their results using only 1 multiplication. Finally, considering that the pow(float, float) function needs only 1 multiplication and 64 shift operations (http://tinyurl.com/yehg932), the normalization layer in Jin et al. (2014) requires \(k\times k+3\) multiplications per element in the feature map.
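The counting rule above can be sketched as follows. The layer shapes in the example call are hypothetical placeholders; only the per-operation costs (k x k x C multiplications per convolution output element, 10 for tanh, 0 for ReLU, 1 for Leaky ReLU, and k x k + 3 per element for the normalization layer) come from this footnote.

# Rough multiplication count for one convolution layer followed by an
# activation and, optionally, a normalization layer.
ACTIVATION_MULTS = {"tanh": 10, "relu": 0, "leaky_relu": 1}

def conv_layer_mults(out_h, out_w, out_ch, in_ch, k, activation, norm_k=None):
    conv = out_h * out_w * out_ch * (k * k * in_ch)       # filter/input products
    act = out_h * out_w * out_ch * ACTIVATION_MULTS[activation]
    norm = out_h * out_w * out_ch * (norm_k * norm_k + 3) if norm_k else 0
    return conv + act + norm

# Example: a hypothetical 7x7 convolution mapping 3 to 100 channels
# on a 42x42 output map, followed by Leaky ReLU units.
print(conv_layer_mults(42, 42, 100, 3, 7, "leaky_relu"))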
 
Literature
Aghdam, H. H., Heravi, E. J., & Puig, D. (2015). A unified framework for coarse-to-fine recognition of traffic signs using Bayesian network and visual attributes. In 10th international conference on computer vision theory and applications (VISAPP) (pp. 87–96). doi:10.5220/0005303500870096
Baró, X., Escalera, S., Vitrià, J., Pujol, O., & Radeva, P. (2009). Traffic sign recognition using evolutionary adaboost detection and forest-ECOC classification. IEEE Transactions on Intelligent Transportation Systems, 10(1), 113–126. doi:10.1109/TITS.2008.2011702
Coates, A., & Ng, A. Y. (2012). Learning feature representations with K-means. Lecture Notes in Computer Science, 7700, 561–580. doi:10.1007/978-3-642-35289-8-30
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In International conference on machine learning (pp. 647–655). arXiv:1310.1531
Dosovitskiy, A., & Brox, T. (2015). Inverting convolutional networks with convolutional networks (pp. 1–15). arXiv preprint arXiv:1506.02753
Gao, X. W., Podladchikova, L., Shaposhnikov, D., Hong, K., & Shevtsova, N. (2006). Recognition of traffic signs based on their colour and shape features extracted using human vision models. Journal of Visual Communication and Image Representation, 17(4), 675–685. doi:10.1016/j.jvcir.2005.10.003
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arXiv:1502.01852
Hinton, G. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research (JMLR), 15, 1929–1958.
Huang, G., Mao, K. Z., Siew, C., & Huang, D. (2013). A hierarchical method for traffic sign classification with support vector machines. In The 2013 international joint conference on neural networks (IJCNN) (pp. 1–6). IEEE. doi:10.1109/IJCNN.2013.6706803
Jin, J., Fu, K., & Zhang, C. (2014). Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Transactions on Intelligent Transportation Systems, 15(5), 1991–2000. doi:10.1109/TITS.2014.2308281
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105). Curran Associates, Inc.
Larsson, F., & Felsberg, M. (2011). Using Fourier descriptors and spatial models for traffic sign recognition. In Image analysis, Lecture Notes in Computer Science (Vol. 6688, pp. 238–249). Springer. doi:10.1007/978-3-642-21227-7_23
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International conference on machine learning (ICML) workshop on deep learning (Vol. 30).
Maldonado-Bascon, S., Lafuente-Arroyo, S., Gil-Jimenez, P., Gomez-Moreno, H., & Lopez-Ferreras, F. (2007). Road-sign detection and recognition based on support vector machines. IEEE Transactions on Intelligent Transportation Systems, 8(2), 264–278. doi:10.1109/TITS.2007.895311
Maldonado Bascón, S., Acevedo Rodríguez, J., Lafuente Arroyo, S., Fernández Caballero, A., & López-Ferreras, F. (2010). An optimization on pictogram identification for the road-sign recognition task using SVMs. Computer Vision and Image Understanding, 114(3), 373–383. doi:10.1016/j.cviu.2009.12.002
Mathias, M., Timofte, R., Benenson, R., & Van Gool, L. (2013). Traffic sign recognition – How far are we from the solution? In International joint conference on neural networks. doi:10.1109/IJCNN.2013.6707049
Møgelmose, A., Trivedi, M. M., & Moeslund, T. B. (2012). Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey. IEEE Transactions on Intelligent Transportation Systems, 13(4), 1484–1497. doi:10.1109/TITS.2012.2209421
Moiseev, B., Konev, A., Chigorin, A., & Konushin, A. (2013). Evaluation of traffic sign recognition methods trained on synthetically generated data. In 15th international conference on advanced concepts for intelligent vision systems (ACIVS) (pp. 576–583). Springer, Poznań. doi:10.1007/978-3-319-02895-8_52
Sermanet, P., & LeCun, Y. (2011). Traffic sign recognition with multi-scale convolutional networks. In Proceedings of the international joint conference on neural networks (pp. 2809–2813). doi:10.1109/IJCNN.2011.6033589
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (pp. 1–15).
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR) (pp. 1–13). arXiv:1409.1556v5
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 1–8.
Szegedy, C., Reed, S., Sermanet, P., Vanhoucke, V., & Rabinovich, A. (2014a). Going deeper with convolutions. arXiv preprint arXiv:1409.4842 (pp. 1–12).
Timofte, R., & Van Gool, L. (2011). Sparse representation based projections. In 22nd British Machine Vision Conference (pp. 61.1–61.12). BMVA Press. doi:10.5244/C.25.61
Timofte, R., Zimmermann, K., & Van Gool, L. (2011). Multi-view traffic sign detection, recognition, and 3D localisation. Machine Vision and Applications, 1–15. doi:10.1007/s00138-011-0391-3
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In IEEE computer vision and pattern recognition (CVPR) (pp. 3360–3367). doi:10.1109/CVPR.2010.5540018
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? Neural Information Processing Systems (NIPS), 27. arXiv:1411.1792v1
Yuan, X., Hao, X., Chen, H., & Wei, X. (2014). Robust traffic sign recognition based on color global and local oriented edge magnitude patterns. IEEE Transactions on Intelligent Transportation Systems, 15(4), 1466–1474. doi:10.1109/TITS.2014.2298912
Zaklouta, F., Stanciulescu, B., & Hamdoun, O. (2011). Traffic sign classification using K-d trees and random forests. In Proceedings of the international joint conference on neural networks (pp. 2151–2155). doi:10.1109/IJCNN.2011.6033494
Zeng, Y., Xu, X., Fang, Y., & Zhao, K. (2015). Traffic sign recognition using deep convolutional networks and extreme learning machine. In Intelligence science and big data engineering: Image and video data engineering (IScIDE) (pp. 272–280). Springer. doi:10.1007/978-3-319-23989-7_28
Metadata
Title
A Practical and Highly Optimized Convolutional Neural Network for Classifying Traffic Signs in Real-Time
Authors
Hamed Habibi Aghdam
Elnaz Jahani Heravi
Domenec Puig
Publication date
21-09-2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-016-0955-9
