Published in: Neural Computing and Applications 17/2021

11.01.2021 | Original Article

DENet: a deep architecture for audio surveillance applications

Written by: Antonio Greco, Antonio Roberto, Alessia Saggese, Mario Vento


Abstract

In recent years, both the scientific community and the market have shown great interest in the design of audio surveillance systems that can analyse the audio stream and identify events of interest. This is particularly true in security applications, where audio analytics can be used profitably as an alternative to video analytics systems or in combination with them. In this context, we propose a novel recurrent convolutional neural network architecture named DENet. It is based on a new layer, which we call the denoising-enhancement (DE) layer, that performs denoising and enhancement of the original signal by applying an attention map to the components of the band-filtered signal. Unlike state-of-the-art methodologies, DENet takes the lossless raw waveform as input and automatically learns the evolution of the frequencies of interest over time by combining the proposed layer with a bidirectional gated recurrent unit. Using the feedback from the classifications of consecutive frames (i.e. frames that belong to the same event), the proposed method drastically reduces misclassifications. We carried out experiments on the MIVIA Audio Events and MIVIA Road Events public datasets, which confirm the effectiveness of our approach with respect to other state-of-the-art methodologies.
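
To make the idea of applying an attention map to the components of a band-filtered signal concrete, the following is a minimal sketch and not the DE layer itself. It assumes the signal has already been split into band components, that the attention scores are given (in DENet they would be learned), and that the scores are normalised across bands with a softmax; the abstract does not specify the normalisation, so that choice is an assumption. The names `softmax`, `apply_attention`, `bands` and `scores` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_attention(bands, scores):
    """Weight each band component by an attention map and sum the bands.

    bands  : list of B lists, each holding T samples of one band component
    scores : list of B lists, each holding T unnormalised attention scores
             (placeholders here; a trained network would produce them)
    Returns the attention map (B x T) and the re-weighted signal (T samples).
    """
    n_bands, n_samples = len(bands), len(bands[0])
    attention = [[0.0] * n_samples for _ in range(n_bands)]
    output = [0.0] * n_samples
    for t in range(n_samples):
        # Assumption: weights are normalised across bands at each time step.
        weights = softmax([scores[b][t] for b in range(n_bands)])
        for b in range(n_bands):
            attention[b][t] = weights[b]
            output[t] += weights[b] * bands[b][t]
    return attention, output

# Toy input: one band carrying the event, one band carrying noise.
T = 8
event = [math.sin(2 * math.pi * t / T) for t in range(T)]
noise = [0.5 * (-1) ** t for t in range(T)]
bands = [event, noise]

# Hypothetical scores that favour the event band at every time step.
scores = [[4.0] * T, [-4.0] * T]

attention, enhanced = apply_attention(bands, scores)
```

With these placeholder scores the noise band is almost entirely suppressed, so `enhanced` stays close to `event`. In a trained model the scores would vary over time and across bands, which is where a recurrent component such as the bidirectional gated recurrent unit mentioned above would come in.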


Metadata
Title
DENet: a deep architecture for audio surveillance applications
Written by
Antonio Greco
Antonio Roberto
Alessia Saggese
Mario Vento
Publication date
11.01.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 17/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05572-5
