2013 | OriginalPaper | Chapter

5. Audio Data

Author: Björn Schuller

Published in: Intelligent Audio Analysis

Publisher: Springer Berlin Heidelberg


Abstract

In order to train and test Intelligent Audio Analysis systems, audio data is needed. In fact, data availability is often considered one of the field's main bottlenecks, and the common opinion is that there is "no data like more data". In this light, the requirements for collecting and providing audio databases are outlined, including in particular the establishment of a reliable gold standard. Explanatory examples are given for the three types of audio considered, namely speech, music, and general sound, by means of the corpora TUM AVIC and NTWICM and the FindSounds database.
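To make the gold-standard notion concrete, the following is a minimal sketch, assuming two hypothetical annotators and invented labels, of the kind of chance-corrected inter-rater agreement measure cited in the literature below (Cohen's kappa [9]):

```python
# Minimal sketch: chance-corrected inter-rater agreement via Cohen's kappa
# (Cohen [9]). The two annotators and their labels are invented examples.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators' nominal labels on the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n)
              for lab in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical 3-class labels (e.g., low/medium/high interest):
a = ["low", "mid", "mid", "high", "low", "mid", "high", "low"]
b = ["low", "mid", "low", "high", "low", "high", "high", "low"]
print(f"kappa = {cohen_kappa(a, b):.3f}")  # 1.0 = perfect, 0 = chance level
```

Chance correction matters here because raw percentage agreement can look deceptively high when annotators simply overuse the majority label; kappa discounts exactly that effect.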


Footnotes
2
The term ground truth indeed originated in the fields of aerial photography and satellite imagery.
 
5
The complete annotation by the four individuals is available at http://www.openaudio.eu to ensure reproducibility by others.
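Where more than two annotators are involved, as with the four individuals mentioned in this footnote, agreement can be quantified with Fleiss' kappa [11]. Below is a hedged sketch; the category counts are invented for illustration.

```python
# Hedged sketch: Fleiss' kappa [11] for agreement among several annotators.
# counts[i][j] = number of raters assigning item i to category j; every row
# must sum to the same number of raters. All numbers below are invented.
def fleiss_kappa(counts):
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Overall proportion of all assignments falling into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = [(sum(c * c for c in row) - n_raters)
           / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_items
    p_exp = sum(p * p for p in p_j)
    return (p_bar - p_exp) / (1.0 - p_exp)

# Four raters, three categories, four items (each row sums to 4):
print(f"{fleiss_kappa([[4, 0, 0], [2, 2, 0], [1, 2, 1], [0, 4, 0]]):.3f}")
```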
 
Literature
1. Nieschulz, R., Schuller, B., Geiger, M., Neuss, R.: Aspects of efficient usability engineering. Inf. Technol. Spec. Issue Usability Eng. 44(1), 23–30 (2002)
2. Riccardi, G., Hakkani-Tur, D.: Active learning: theory and applications to automatic speech recognition. IEEE Trans. Speech Audio Process. 13(4), 504–511 (2005)
3. Schuller, B., Zhang, Z., Weninger, F., Rigoll, G.: Using multiple databases for training in emotion recognition: to unite or to vote? In: Proceedings INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, pp. 1553–1556, Florence, Italy, August 2011 (ISCA)
4. Zhang, Z., Weninger, F., Wöllmer, M., Schuller, B.: Unsupervised learning in cross-corpus acoustic emotion recognition. In: Proceedings 12th Biannual IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2011, pp. 523–528, Big Island, HI, December 2011 (IEEE)
5. Zhang, Z., Schuller, B.: Semi-supervised learning helps in sound event classification. In: Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012, pp. 333–336, Kyoto, Japan, March 2012 (IEEE)
6. Schuller, B., Burkhardt, F.: Learning with synthesized speech for automatic emotion recognition. In: Proceedings 35th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, pp. 5150–5153, Dallas, TX, March 2010 (IEEE)
7. Grimm, M., Kroschel, K.: Evaluation of natural emotions using self assessment manikins. In: Proceedings of ASRU, pp. 381–385 (2005) (IEEE)
8. Krippendorff, K.: Content Analysis: An Introduction to Its Methodology, 2nd edn. Sage Publications, Thousand Oaks, CA (2004)
9. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37–46 (1960)
11. Fleiss, J.: The measurement of interrater agreement. In: Statistical Methods for Rates and Proportions, Chapter 13, pp. 212–236, 2nd edn. John Wiley & Sons, New York (1981)
12. Schuller, B., Müller, R., Eyben, F., Gast, J., Hörnler, B., Wöllmer, M., Rigoll, G., Höthker, A., Konosu, H.: Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image Vis. Comput. Special Issue on Visual and Multimodal Analysis of Human Spontaneous Behavior 27(12), 1760–1774 (2009)
13. Grimm, M., Kroschel, K., Narayanan, S.: The Vera am Mittag German audio-visual emotional speech database. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp. 865–868, Hannover, Germany (2008)
14. Schuller, B., Dorfner, J., Rigoll, G.: Determination of non-prototypical valence and arousal in popular music: features and performances. EURASIP J. Audio Speech Music Process. Special Issue on Scalable Audio-Content Analysis 2010, Article ID 735854, 19 pages (2010)
15. Schuller, B., Weninger, F., Dorfner, J.: Multi-modal non-prototypical music mood analysis in continuous space: reliability and performances. In: Proceedings 12th International Society for Music Information Retrieval Conference, ISMIR 2011, pp. 759–764, Miami, FL, October 2011 (ISMIR)
16. Hevner, K.: Experimental studies of the elements of expression in music. Am. J. Psychol. 48, 246–268 (1936)
17. Farnsworth, P.R.: The Social Psychology of Music. The Dryden Press, New York (1958)
18. Li, T., Ogihara, M.: Detecting emotion in music. In: Proceedings of ISMIR, pp. 239–240, Baltimore, MD (2003)
19. Hu, X., Downie, J.S., Laurier, C., Bay, M., Ehmann, A.F.: The 2007 MIREX audio mood classification task: lessons learned. In: Proceedings 9th International Conference on Music Information Retrieval (ISMIR), pp. 462–467, Philadelphia, PA (2008)
20. Russell, J.A.: Measures of emotion. In: The Measurement of Emotions, Volume 4 of Emotion: Theory, Research, and Experience, pp. 83–111. Academic Press, San Diego (1989)
21. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980)
22. Thayer, R.E.: The Biopsychology of Mood and Arousal. Oxford University Press, New York (1990)
23. Liu, D.: Automatic mood detection from acoustic music data. In: Proceedings International Conference on Music Information Retrieval, pp. 13–17 (2003)
24. Lu, L., Liu, D., Zhang, H.: Automatic mood detection and tracking of music audio signals. IEEE Trans. Audio Speech Lang. Process. 14(1), 5–18 (2006)
25. Xiao, Z., Dellandréa, E., Dou, W., Chen, L.: What is the best segment duration for music mood analysis? In: Proceedings of International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 17–24 (2008)
26. Bartsch, M.A., Wakefield, G.H.: To catch a chorus: using chroma-based representations for audio thumbnailing. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2001, pp. 15–18, New Paltz, NY, October 2001
27. Schuller, B., Rigoll, G., Lang, M.: HMM-based music retrieval using stereophonic feature information and framelength adaptation. In: Proceedings 4th IEEE International Conference on Multimedia and Expo, ICME 2003, vol. II, pp. 713–716, Baltimore, MD, July 2003 (IEEE)
28. Goto, M.: A chorus section detection method for musical audio signals and its application to a music listening station. IEEE Trans. Audio Speech Lang. Process. 14(5), 1783–1794 (2006)
29. Müller, M., Kurth, F.: Towards structural analysis of audio recordings in the presence of musical variations. EURASIP J. Adv. Signal Process., Article ID 89686 (2007)
30. Steidl, S., Batliner, A., Seppi, D., Schuller, B.: On the impact of children's emotional speech on acoustic and language models. EURASIP J. Audio Speech Music Process. Special Issue on Atypical Speech 2010, Article ID 783954, 14 pages (2010)
31. Zeng, Z., Pantic, M., Roisman, G.I., Huang, T.S.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)
32. Gabrielsson, A.: Emotion perceived and emotion felt: same or different? Musicae Scientiae, pp. 123–147 (2002)
33. Steidl, S., Schuller, B., Seppi, D., Batliner, A.: The hinterland of emotions: facing the open-microphone challenge. In: Proceedings 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, vol. I, pp. 690–697, Amsterdam, The Netherlands, September 2009 (HUMAINE Association, IEEE)
34. Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)
35. Rice, S.V., Bailey, S.M.: A web search engine for sound effects. In: Proceedings of 119th AES, New York (2005)
36. Schuller, B., Hantke, S., Weninger, F., Han, W., Zhang, Z., Narayanan, S.: Automatic recognition of emotion evoked by general sound events. In: Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012, pp. 341–344, Kyoto, Japan, March 2012 (IEEE)
37. Gunes, H., Schuller, B., Pantic, M., Cowie, R.: Emotion representation, analysis and synthesis in continuous space: a survey. In: Proceedings International Workshop on Emotion Synthesis, rePresentation, and Analysis in Continuous spacE, EmoSPACE 2011, held in conjunction with the 9th IEEE International Conference on Automatic Face & Gesture Recognition and Workshops, FG 2011, pp. 827–834, Santa Barbara, CA, March 2011 (IEEE)
38. Kim, Y., Schmidt, E., Migneco, R., Morton, B., Richardson, P., Scott, J., Speck, J., Turnbull, D.: Music emotion recognition: a state of the art review. In: Proceedings of ISMIR, pp. 255–266, Utrecht, The Netherlands (2010)
Metadata
Title
Audio Data
Author
Björn Schuller
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-36806-6_5