Abstract
Urban noise recognition plays a vital role in city management and safe operation, especially in recent smart city engineering. Existing studies on urban noise recognition mostly rely on conventional acoustic features, such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), together with shallow-structure classifiers, such as the support vector machine (SVM). However, the urban acoustic environment is complicated and changeable, so conventional acoustic representation and recognition methods may be insufficient to characterize urban noises and generally suffer from degraded performance. In this paper, we study urban noise recognition based on recent deep neural networks. The log-Mel-spectrogram, namely the FBank feature, is first derived for acoustic representation. Then the FBank spectrum, constructed from a set of FBank feature vectors taken from multiple acoustic signal frames, is fed to a convolutional neural network (CNN) for urban noise recognition. Comprehensive studies on the dimension of the FBank spectrum and on the CNN parameters, including the size of the learnable kernels, the dropout rate, and the activation function, are presented. An acoustic database collected in real environments, covering the 11 most common urban noises with more than 56,000 samples, is constructed for model verification and performance evaluation. In addition, the traditional LPCC and MFCC acoustic features combined with two popular machine learning algorithms, the extreme learning machine (ELM) and the SVM, and the FBank image feature combined with the ELM, the hierarchical extreme learning machine (H-ELM), and the multilayer extreme learning machine (ML-ELM), are also presented for comparison. Experimental results show that the proposed method generally outperforms conventional shallow-structure classifiers.
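The feature pipeline summarized above (log-Mel filterbank energies per frame, then several frames stacked into a two-dimensional FBank spectrum for the CNN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names and all parameter values (16 kHz sample rate, 400-sample frames, 160-sample hop, 512-point FFT, 40 Mel filters, 40 frames per patch) are assumptions chosen for illustration, and the CNN itself is not shown.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        # Triangle: the smaller of the two slopes, clipped at zero.
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def fbank(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log-Mel filterbank (FBank) features, shape (n_frames, n_filters).

    All default values are illustrative assumptions, not taken from the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)  # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small floor avoids log(0)


def stack_frames(features, frames_per_patch=40):
    """Group consecutive frames into 2-D patches (an 'FBank spectrum' per patch).

    Trailing frames that do not fill a whole patch are dropped.
    Returns shape (n_patches, frames_per_patch, n_filters).
    """
    n_patches = features.shape[0] // frames_per_patch
    trimmed = features[: n_patches * frames_per_patch]
    return trimmed.reshape(n_patches, frames_per_patch, features.shape[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(16000)  # synthetic stand-in for a recording
    feats = fbank(one_second)
    patches = stack_frames(feats)
    print(feats.shape, patches.shape)
```

Each patch is a time-by-frequency image, which is the kind of input a CNN would take. The patch size corresponds to the "dimension of the FBank spectrum" that the paper studies; the value used here is only a placeholder.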
Additional information
This work was supported by the National Natural Science Foundation of China (61503104, U1509205) and Hangzhou Smart City Research Center of Zhejiang/Zhejiang Smart City Regional Collaborative Innovation Center (GK150906299001/019).
Cite this article
Cao, J., Cao, M., Wang, J. et al. Urban noise recognition with convolutional neural network. Multimed Tools Appl 78, 29021–29041 (2019). https://doi.org/10.1007/s11042-018-6295-8
DOI: https://doi.org/10.1007/s11042-018-6295-8