
Neural Computing and Applications

11 January 2021 | Original Article

DENet: a deep architecture for audio surveillance applications

Authors: Antonio Greco, Antonio Roberto, Alessia Saggese, Mario Vento

Published in: Neural Computing and Applications | Issue 17/2021


Abstract

In recent years, both the scientific community and the market have shown strong interest in audio surveillance systems that analyse an audio stream and identify events of interest. This is particularly true in security applications, where audio analytics can be used as an alternative to video analytics systems or in combination with them. In this paper we propose a novel recurrent convolutional neural network architecture named DENet. It is based on a new layer, which we call the denoising-enhancement (DE) layer, that denoises and enhances the original signal by applying an attention map to the components of the band-filtered signal. Unlike state-of-the-art methodologies, DENet takes the lossless raw waveform as input and automatically learns how the frequencies of interest evolve over time by combining the proposed layer with a bidirectional gated recurrent unit. By using the feedback from the classifications of consecutive frames (i.e. frames that belong to the same event), the proposed method drastically reduces misclassifications. Experiments on the public MIVIA Audio Events and MIVIA Road Events datasets confirm the effectiveness of our approach with respect to other state-of-the-art methodologies.
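The idea of weighting the components of a band-filtered signal can be illustrated with a small sketch. This is not the DE layer from the paper: the abstract does not specify how the attention map is computed, so the sketch assumes a hand-made score (the log-energy of each band) in place of learned weights, and the names `softmax` and `attention_over_bands` are hypothetical. It only shows the general pattern of scoring band components, normalising the scores, and recombining the reweighted bands.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_bands(bands):
    """Reweight band-filtered components of one frame and sum them.

    bands: list of B lists, each holding the T samples of one band.
    Returns (weights, enhanced), where weights has length B and
    enhanced has length T.
    """
    # Assumption: log-energy stands in for a learned attention score.
    scores = [
        math.log(sum(x * x for x in band) / len(band) + 1e-8)
        for band in bands
    ]
    weights = softmax(scores)
    n = len(bands[0])
    # Weighted sum across bands, sample by sample.
    enhanced = [
        sum(w * band[t] for w, band in zip(weights, bands))
        for t in range(n)
    ]
    return weights, enhanced


# Two toy bands: one carrying a strong component, one nearly silent.
bands = [[1.0, -1.0, 1.0, -1.0], [0.01, 0.01, -0.01, 0.01]]
weights, enhanced = attention_over_bands(bands)
```

In this toy input the high-energy band receives almost all of the weight, so the recombined frame is dominated by it. In a trained network the scores would come from learned parameters rather than from band energy.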



Title: DENet: a deep architecture for audio surveillance applications
Authors: Antonio Greco, Antonio Roberto, Alessia Saggese, Mario Vento
Publication date: 11 January 2021
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 17/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-020-05572-5
