Urban noise recognition with convolutional neural network

Cao, Jiuwen; Cao, Min; Wang, Jianzhong; Yin, Chun; Wang, Danping; Vidal, Pierre-Paul

doi:10.1007/s11042-018-6295-8

Published: 05 July 2018

Volume 78, pages 29021–29041, (2019)

Multimedia Tools and Applications

Jiuwen Cao¹,
Min Cao¹,
Jianzhong Wang¹,
Chun Yin²,
Danping Wang¹,³ &
Pierre-Paul Vidal¹,⁴

Abstract

Urban noise recognition plays a vital role in city management and safe operation, especially in recent smart city engineering. Existing studies on urban noise recognition are mostly based on conventional acoustic features, such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), and on shallow-structure classifiers, such as the support vector machine (SVM). However, the urban acoustic environment is complicated and changeable. Conventional acoustic representation and recognition methods may be insufficient for characterizing urban noises and generally suffer from degraded performance. In this paper, we study urban noise recognition based on recent deep neural networks. The log-Mel-spectrogram, namely the FBank feature, is first derived for acoustic representation. Then, the FBank spectrum, constructed from a set of FBank feature vectors taken from multiple acoustic signal frames, is fed to a convolutional neural network (CNN) for urban noise recognition. Comprehensive studies on the dimension of the FBank spectrum and on the parameters of the CNN, including the size of the learnable kernels, the dropout rate, and the activation function, are presented in the paper. An acoustic database collected in real environments, covering the 11 most common urban noises with more than 56,000 samples, is constructed for model verification and performance evaluation. In addition, the traditional LPCC and MFCC acoustic features combined with two popular machine learning algorithms, the extreme learning machine (ELM) and the SVM, and the FBank image feature combined with the ELM, the hierarchical extreme learning machine (H-ELM) and the multilayer extreme learning machine (ML-ELM), are also presented for discussion. Experimental results show that the proposed method generally outperforms conventional shallow-structure classifiers.
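
The feature pipeline described in the abstract (per-frame log-Mel FBank vectors, stacked over several frames into a two-dimensional FBank spectrum for the CNN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the sampling rate, frame length, hop size, FFT size, number of Mel filters, or number of frames per spectrum, so every numeric default below is an assumption, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def fbank_features(signal, sample_rate=16000, frame_len=400, hop=160,
                   n_fft=512, n_filters=40):
    """Log-Mel (FBank) vector per frame; all defaults are assumed values."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix of shape (n_frames, frame_len) selecting overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small offset avoids log(0)

def stack_fbank_spectrums(features, frames_per_image):
    """Group consecutive FBank vectors into 2-D inputs for a CNN.

    Returns shape (n_images, n_filters, frames_per_image); leftover
    frames at the end are dropped.
    """
    n_images = features.shape[0] // frames_per_image
    trimmed = features[: n_images * frames_per_image]
    stacked = trimmed.reshape(n_images, frames_per_image, features.shape[1])
    return stacked.transpose(0, 2, 1)

# Usage on one second of a synthetic 1 kHz tone (stand-in for a recording).
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
feats = fbank_features(tone)
images = stack_fbank_spectrums(feats, frames_per_image=32)
print(feats.shape, images.shape)
```

Each resulting image has one row per Mel filter and one column per frame, so the "dimension of the FBank spectrum" studied in the paper corresponds here to the choice of `n_filters` and `frames_per_image`.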

Author information

Authors and Affiliations

School of Automation, Hangzhou Dianzi University, Zhejiang, 310018, China
Jiuwen Cao, Min Cao, Jianzhong Wang, Danping Wang & Pierre-Paul Vidal
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, People’s Republic of China
Chun Yin
Plateforme Sensorimotricité, Université Paris Descartes, 75270, Paris, France
Danping Wang
COGNAC-G (COGNition and ACtion Group), Université Paris Descartes, 75270, Paris, France
Pierre-Paul Vidal

Corresponding author

Correspondence to Jiuwen Cao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (61503104, U1509205) and Hangzhou Smart City Research Center of Zhejiang/Zhejiang Smart City Regional Collaborative Innovation Center (GK150906299001/019).

About this article

Cite this article

Cao, J., Cao, M., Wang, J. et al. Urban noise recognition with convolutional neural network. Multimed Tools Appl 78, 29021–29041 (2019). https://doi.org/10.1007/s11042-018-6295-8

Received: 25 February 2018
Revised: 02 June 2018
Accepted: 21 June 2018
Published: 05 July 2018
Issue Date: October 2019
DOI: https://doi.org/10.1007/s11042-018-6295-8
