Urban noise recognition with convolutional neural network

Multimedia Tools and Applications

Abstract

Urban noise recognition plays a vital role in city management and safe operation, especially in recent smart city engineering. Existing studies on urban noise recognition are mostly based on conventional acoustic features, such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), and on shallow-structure classifiers, such as the support vector machine (SVM). However, the urban acoustic environment is complicated and changeable. Conventional acoustic representation and recognition methods may be insufficient to characterize urban noises and generally suffer from degraded performance. In this paper, we study urban noise recognition based on recent deep neural networks. The log-Mel spectrogram, namely the FBank feature, is first derived for acoustic representation. Then the FBank spectrum, constructed from a set of FBank feature vectors from multiple acoustic signal frames, is fed to a convolutional neural network (CNN) for urban noise recognition. Comprehensive studies on the dimension of the FBank spectrum and the CNN parameters, including the size of the learnable kernels, the dropout rate, and the activation function, are presented. An acoustic database collected in real environments, covering the 11 most common urban noises with more than 56,000 samples, is constructed for model verification and performance evaluation. In addition, the traditional LPCC and MFCC acoustic features combined with two popular machine learning algorithms, the extreme learning machine (ELM) and the SVM, and the FBank image feature combined with the ELM, the hierarchical extreme learning machine (H-ELM), and the multilayer extreme learning machine (ML-ELM), are also presented for discussion. Experimental results show that the proposed method generally outperforms conventional shallow-structure classifiers.
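
The FBank representation described in the abstract can be illustrated with a minimal NumPy sketch: frame the signal, take the power spectrum, apply a triangular Mel filterbank, take the logarithm, and stack consecutive frames into 2-D patches of the kind a CNN would take as input. The sample rate, frame length, hop, FFT size, number of Mel filters, and patch length below are illustrative assumptions, not values reported by the authors, and the helper names (mel_filterbank, fbank, stack_patches) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def fbank(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log-Mel filterbank (FBank) features, shape (n_frames, n_filters).

    All default parameter values are assumptions chosen for illustration.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)              # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)                           # small offset avoids log(0)

def stack_patches(features, patch_frames):
    """Group consecutive FBank vectors into non-overlapping 2-D patches."""
    n = features.shape[0] // patch_frames
    return features[: n * patch_frames].reshape(n, patch_frames, features.shape[1])

# Usage on a synthetic one-second tone (a stand-in for a recorded noise clip).
t = np.arange(16000) / 16000.0
tone = np.sin(2.0 * np.pi * 1000.0 * t)
feats = fbank(tone)
patches = stack_patches(feats, patch_frames=40)
```

Each patch is a frames-by-filters array that can be treated like a single-channel image, which is how the abstract describes the CNN input. The network itself is not sketched here because its architecture is not given in the text above.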

Author information

Corresponding author

Correspondence to Jiuwen Cao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (61503104, U1509205) and Hangzhou Smart City Research Center of Zhejiang/Zhejiang Smart City Regional Collaborative Innovation Center (GK150906299001/019).

About this article

Cite this article

Cao, J., Cao, M., Wang, J. et al. Urban noise recognition with convolutional neural network. Multimed Tools Appl 78, 29021–29041 (2019). https://doi.org/10.1007/s11042-018-6295-8

  • DOI: https://doi.org/10.1007/s11042-018-6295-8
