
2013 | OriginalPaper | Chapter

4. Chain of Audio Processing

Author: Björn Schuller

Published in: Intelligent Audio Analysis

Publisher: Springer Berlin Heidelberg


Abstract

The chain of processing in a typical Intelligent Audio Analysis system is outlined. It leads from preprocessing through Low-Level Descriptor extraction, chunking, suprasegmental analysis and hierarchical functional extraction, feature reduction, feature selection and generation, and parameter selection to model learning and the actual classification or regression. This can be followed by fusion with other information streams and by encoding for the application context. The chapter explains the individual steps in more detail.
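To make the order of the stages concrete, the following is a minimal sketch of such a chain in Python with NumPy. It is not the chapter's implementation: the function names, the two descriptors (log energy and zero-crossing rate), the four functionals, the frame sizes, the nearest-centroid classifier, and the synthetic tone-versus-noise data are all assumptions chosen for illustration. Feature reduction, feature selection and generation, parameter selection, fusion, and encoding are left out for brevity.

```python
import numpy as np


def preprocess(signal):
    """Preprocessing: remove the DC offset and peak-normalise the waveform."""
    signal = signal - np.mean(signal)
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal


def frame_signal(signal, frame_len, hop):
    """Cut the waveform into overlapping frames, one row per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def low_level_descriptors(frames):
    """Two illustrative descriptors per frame: log energy and zero-crossing rate."""
    log_energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    sign_changes = np.abs(np.diff(np.sign(frames), axis=1)) > 0
    zcr = np.mean(sign_changes, axis=1)
    return np.column_stack([log_energy, zcr])


def functionals(llds):
    """Suprasegmental step: map a variable-length contour to a fixed-length vector."""
    return np.concatenate([llds.mean(axis=0), llds.std(axis=0),
                           llds.min(axis=0), llds.max(axis=0)])


def extract_features(signal, frame_len=400, hop=160):
    """Here the whole signal is treated as a single chunk (an assumption)."""
    frames = frame_signal(preprocess(signal), frame_len, hop)
    return functionals(low_level_descriptors(frames))


class NearestCentroid:
    """Stand-in for model learning and classification (hypothetical choice)."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[np.argmin(dist, axis=1)]


def demo(seed=0, n_per_class=20, sr=16000):
    """Train on synthetic tones and noise; return accuracy on held-out signals."""
    rng = np.random.default_rng(seed)
    t = np.arange(sr // 2) / sr

    def tone():
        freq, phase = rng.uniform(150, 300), rng.uniform(0, 2 * np.pi)
        return np.sin(2 * np.pi * freq * t + phase)

    def noise():
        return rng.standard_normal(t.size)

    X = np.stack([extract_features(gen())
                  for gen in (tone, noise) for _ in range(n_per_class)])
    y = np.repeat(["tone", "noise"], n_per_class)

    # Even rows train the model, odd rows are held out for evaluation.
    train, test = slice(0, None, 2), slice(1, None, 2)
    mu, sigma = X[train].mean(axis=0), X[train].std(axis=0) + 1e-10
    model = NearestCentroid().fit((X[train] - mu) / sigma, y[train])
    pred = model.predict((X[test] - mu) / sigma)
    return float(np.mean(pred == y[test]))


if __name__ == "__main__":
    print("held-out accuracy:", demo())
```

The design point the sketch tries to show is the change of time scale: descriptors are computed per frame, while the functionals collapse each chunk into one vector of fixed length, which is what allows a static classifier or regressor to be trained on signals of differing duration.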


Metadata

Title: Chain of Audio Processing
Author: Björn Schuller
Copyright Year: 2013
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/978-3-642-36806-6_4