Published in: Pattern Analysis and Applications 3/2023

20.06.2023 | Industrial and Commercial Application

Multi-level distance embedding learning for robust acoustic scene classification with unseen devices

Authors: Gang Jiang, Zhongchen Ma, Qirong Mao, Jianming Zhang


Abstract

Acoustic scene classification (ASC) aims to identify the scene in which a piece of audio was recorded. In practice, ASC must handle audio from a variety of recording devices, including devices that never appeared during the training phase. Audio recorded by different devices, and especially by unseen devices, differs in sampling rate, amplitude, data distribution, and other properties. These differences can strongly interfere with the feature learning process of CNNs and degrade the performance of the ASC model. To learn high-level features that are less susceptible to device differences from handcrafted features that still carry device information, we propose an ASC method based on a multi-level distance embedding space, called multi-level distance embedding learning (MDEL). Acoustic scene categories form a hierarchy, from the three coarse-grained categories of indoor, outdoor, and transportation down to more fine-grained categories; this hierarchy corresponds to a similarity relation between categories of different granularity. MDEL exploits this hierarchical similarity between acoustic scene classes to construct an embedding space containing multi-level distances. During learning, the model is guided to focus on features shared within the same scene class and to learn a high-level representation that is more robust to the device, thereby improving robustness to data from unseen devices. Our method was evaluated on the audio dataset provided by the DCASE2020 Challenge for Task 1a, where overall classification accuracy improved by 1.2%; for audio data from unseen devices, classification accuracy improved by 2.3%.
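The multi-level distance idea described above can be illustrated with a small sketch: embeddings of clips from the same fine-grained scene should lie closer together than embeddings of clips that share only a coarse category (indoor, outdoor, transportation), which in turn should lie closer than clips from different coarse categories. The function below is a hypothetical two-level hinge loss written for illustration; the function name, the margin values, and the squared-Euclidean distance are assumptions, not the authors' implementation.

```python
import numpy as np

def multilevel_margin_loss(anchor, pos_fine, pos_coarse, neg,
                           m_fine=0.2, m_coarse=0.4):
    """Two-level hinge loss enforcing the distance ordering
    d(same fine class) < d(same coarse class) < d(different coarse class).

    anchor     : embedding of the anchor clip (e.g. 'metro')
    pos_fine   : embedding from the same fine class (another 'metro' clip)
    pos_coarse : same coarse class only (e.g. 'bus', also transportation)
    neg        : different coarse class (e.g. 'park', outdoor)
    """
    d = lambda a, b: float(np.sum((a - b) ** 2))  # squared Euclidean distance
    d_fine = d(anchor, pos_fine)
    d_coarse = d(anchor, pos_coarse)
    d_neg = d(anchor, neg)
    # Level 1: fine-class pairs must beat coarse-class pairs by m_fine.
    level1 = max(0.0, d_fine - d_coarse + m_fine)
    # Level 2: coarse-class pairs must beat cross-category pairs by m_coarse.
    level2 = max(0.0, d_coarse - d_neg + m_coarse)
    return level1 + level2
```

When both orderings hold with their margins, the loss is zero; a violation at either level contributes a positive hinge term, pulling same-scene clips (regardless of recording device) together in the embedding space.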


Metadata
Title
Multi-level distance embedding learning for robust acoustic scene classification with unseen devices
Authors
Gang Jiang
Zhongchen Ma
Qirong Mao
Jianming Zhang
Publication date
20.06.2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01172-w
