
2018 | Original Paper | Book Chapter

Feature Extraction of Surround Sound Recordings for Acoustic Scene Classification

Author: Sławomir K. Zieliński

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing


Abstract

This paper extends the traditional machine-listening methodology for acoustic scene classification to a new class of multichannel audio signals. It identifies a set of new features of five-channel surround recordings for classifying two basic spatial audio scenes, and it compares three artificial-intelligence-based approaches to audio scene classification. The results indicate that the method based on the early fusion of features outperforms those involving the late fusion of signal metrics.
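The early/late fusion distinction mentioned in the abstract can be illustrated with a minimal sketch. In early fusion, per-channel feature vectors are concatenated into one vector before classification; in late fusion, each channel (or feature group) yields its own decision, and the decisions are combined afterwards, e.g. by majority vote. The toy band-energy extractor and the five random "channels" below are hypothetical illustrations, not the features or classifiers used in the paper.

```python
import numpy as np

def extract_features(channel, n_bands=4):
    """Toy spectral feature extractor: mean magnitude in n_bands
    frequency bands of one channel (illustrative only)."""
    spectrum = np.abs(np.fft.rfft(channel))
    bands = np.array_split(spectrum, n_bands)
    return np.array([b.mean() for b in bands])

def early_fusion(channels):
    """Early fusion: concatenate per-channel feature vectors into
    one long vector that a single classifier would consume."""
    return np.concatenate([extract_features(ch) for ch in channels])

def late_fusion(per_channel_decisions):
    """Late fusion: combine per-channel classifier decisions,
    here by simple majority vote over class labels."""
    votes = np.bincount(per_channel_decisions)
    return int(np.argmax(votes))

# Five-channel toy recording (L, R, C, Ls, Rs), 1024 samples each
rng = np.random.default_rng(0)
channels = [rng.standard_normal(1024) for _ in range(5)]

fused = early_fusion(channels)  # one 5 * 4 = 20-dimensional vector
decision = late_fusion(np.array([0, 1, 1, 0, 1]))  # majority vote -> class 1
```

With early fusion, a single model can exploit inter-channel relationships (which matters for spatial scenes); with late fusion, each channel's classifier is simpler but cross-channel cues are only combined at the decision level.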


Metadata
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-91262-2_43