2. Acoustic Features and Modelling

Abstract

This chapter gives an overview of the methods for speech and music analysis implemented by the author in the openSMILE toolkit. The methods described include all the relevant processing steps from an audio signal to a classification result. These steps include pre-processing and segmentation of the input, feature extraction (i.e., computation of acoustic Low-level Descriptors (LLDs) and summarisation of these descriptors over high-level segments), and modelling (e.g., classification).
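As a self-contained illustration of this processing chain (framing, LLD extraction, summarisation by functionals, classification), a minimal NumPy sketch follows. The chosen descriptors (RMS energy, zero-crossing rate), the window settings, and the placeholder linear classifier are arbitrary examples and do not reproduce the openSMILE implementation.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Cut a mono signal into overlapping frames (short-time analysis)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def lld(frames):
    """Two example low-level descriptors per frame: RMS energy and zero-crossing rate."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([rms, zcr])

def functionals(llds):
    """Summarise each LLD contour over the segment by its mean and standard deviation."""
    return np.concatenate([llds.mean(axis=0), llds.std(axis=0)])

# toy usage: a random 1 s segment at 16 kHz and a placeholder linear classifier
x = np.random.randn(16000) * 0.1
features = functionals(lld(frame_signal(x)))
w, b = np.zeros_like(features), 0.0   # weights would come from training
label = int(np.dot(w, features) + b > 0)
```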

Footnotes
1
In openSMILE the FFT with complex valued output (and also the inverse FFT) is implemented by the cTransformFFT component. Magnitude and Phase can be computed with the cFFTmagphase component.
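As a plain NumPy illustration of these two steps (a generic FFT example, not the cTransformFFT or cFFTmagphase code itself):

```python
import numpy as np

frame = np.hamming(400) * np.random.randn(400)  # one windowed analysis frame
spectrum = np.fft.rfft(frame)                   # complex-valued FFT
magnitude = np.abs(spectrum)                    # magnitude spectrum
phase = np.angle(spectrum)                      # phase spectrum
```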
 
2
In openSMILE windowing of audio samples (i.e., short-time analysis) can be performed with the cFramer component.
 
4
In openSMILE pre-emphasis can be implemented with the cPreemphasis component on a continuous signal, or with the cVectorPreemphasis component on a frame basis (behaviour compatible with the Hidden Markov Toolkit (HTK) (Young et al. 2006)).
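A minimal sketch of the standard first-order pre-emphasis filter in both modes (whole signal and per frame); the coefficient k = 0.97 is only a common example value, and the handling of the first sample in cPreemphasis, cVectorPreemphasis, and HTK is not reproduced exactly here.

```python
import numpy as np

def preemphasis(x, k=0.97):
    """First-order pre-emphasis y[n] = x[n] - k * x[n-1] on a continuous signal."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= k * x[:-1]
    return y

def preemphasis_frames(frames, k=0.97):
    """Frame-wise variant: each frame is filtered independently of its neighbours."""
    return np.apply_along_axis(preemphasis, 1, frames, k)
```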
 
5
RMS and logarithmic energy can be computed in openSMILE with the cEnergy component.
 
6
openSMILE defines \(8.674676 \times 10^{-19}\) as a floor value for the argument of the log, for samples scaled to the range of \(-1\) to \(+1\). In case of a sample value range from \(-32767\) to \(+32767\) (HTK-compatible mode), the floor value for the argument of the log is 1.
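A sketch combining footnotes 5 and 6, assuming the common definitions (root of the mean of squares for RMS energy, natural logarithm of the floored sum of squares for log energy); the exact cEnergy formulation may differ.

```python
import numpy as np

def frame_energy(frame, htk_range=False):
    """Return RMS energy and floored log energy of one frame of samples."""
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))
    floor = 1.0 if htk_range else 8.674676e-19   # floor values quoted in footnote 6
    log_e = np.log(max(np.sum(frame ** 2), floor))
    return rms, log_e
```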
 
7
The loudness approximation and the signal intensity as defined here can be extracted in openSMILE with the cIntensity component.
 
8
In openSMILE the option dBpsd must be enabled in the cFFTmagphase component in order to compute logarithmic power spectral densities.
 
9
In openSMILE these spectral scale transformations and spline interpolation can be applied with the cSpecScale component.
 
11
The SPEEX version of the Bark transformation is implemented in openSMILE as a forward transformation only. It therefore does not work with all components, as most components require a backward scale transformation.
 
12
For an implementation, see the cMelspec component in openSMILE and scale transformation functions in the smileUtil library.
 
13
Band spectra can be computed in openSMILE with the cMelspec component, which—despite the name Melspec—can compute general band spectra for all supported frequency scales from a linear magnitude or power spectrum.
 
14
In openSMILE the cMelspec component implements these filterbanks for various frequency scales (not only Mel).
 
15
In openSMILE the FIR filterbanks with Gabor, gammatone, high- and low-pass filters can be applied with the cFirFilterbank component.
 
16
In openSMILE these spectral descriptors can be extracted with the cSpectral component.
 
17
In openSMILE, this is implemented in the cSpectral component.
 
18
This is the current default in all openSMILE feature sets up to version 2.0. An option for normalisation might appear in later versions.
 
19
In the cSpectral component.
 
20
Enabled by the option normBandEnergies of the cSpectral component of openSMILE.
 
21
ACF according to this equation is implemented in openSMILE in the cAcf component.
 
22
In openSMILE linear predictive coding is supported via the cLpc component.
 
23
As implemented in openSMILE in the cLpc component.
 
24
In openSMILE the cLsp component implements LSP computation based on code from the Speex codec library (www.speex.org).
 
25
In openSMILE formant extraction is implemented via this method in the cFormant component, which processes the AR LP coefficients from the cLpc component.
 
26
PLP via this method is implemented in openSMILE in the cPlp component.
 
27
In openSMILE this Bark scale can be selected in the cMelspec component by setting the specScale option to ‘bark_schroed’.
 
28
openSMILE allows for this flexibility because the PLP procedure builds on a chain of components: cTransformFFT, cFFTmagphase, cMelspec (for the non-linear band spectrum), and cPlp (for equal-loudness weighting, the intensity power law, autoregressive modelling, and cepstral coefficients).
 
29
In openSMILE it is enabled by setting htkcompatible to 1 in the cPlp component.
 
30
Configurable via the option compression in the openSMILE component cPlp.
 
31
In openSMILE MFCC are computed via cMelspec (taking FFT magnitude spectrum from cFFTmagphase as input) and cMfcc.
 
32
In openSMILE the floor value is also \(10^{-8}\) by default, and 1 when htkcompatible=1 in cMfcc.
 
33
Please note that the DCT equation given in Young et al. (2006) and the one given here differ because Young et al. (2006) start the summation at \(b=1\) for the first Mel-spectrum band, while here the first band is set at \(b=0\).
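For clarity, and assuming the standard HTK-style DCT-II over \(B\) Mel bands with the \(\sqrt{2/B}\) normalisation of Young et al. (2006), the two index conventions describe the same transform:

\[
c_i = \sqrt{\tfrac{2}{B}} \sum_{b=1}^{B} m_b \cos\!\left(\tfrac{\pi i}{B}\left(b - \tfrac{1}{2}\right)\right)
\quad\text{or, with the first band indexed as } b=0,\quad
c_i = \sqrt{\tfrac{2}{B}} \sum_{b=0}^{B-1} m_b \cos\!\left(\tfrac{\pi i}{B}\left(b + \tfrac{1}{2}\right)\right).
\]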
 
34
PLP-CC can be computed in openSMILE by creating a chain of cFFTmagphase, cMelspec, and cPlp and setting the appropriate options for cepstral coefficients in the cPlp component.
 
35
In openSMILE this behaviour is implemented in the pitch smoother components and in the cPitchACF component; the final \(F_0\) output contains \(F_0\) values which are forced to 0 in unvoiced regions. See the documentation for more details.
 
36
In the cPitchACF component, which requires combined ACF and Cepstrum input from two instances of the cAcf component.
 
37
The method is implemented in openSMILE in two components: cSpecScale, which performs spectral peak enhancement, smoothing, octave-scale interpolation, and auditory weighting; and cPitchShs, which expects the spectrum produced by cSpecScale and performs the shifting, compression, and summation, as well as pitch candidate estimation by peak picking.
 
38
\(\gamma \) can be changed in openSMILE via the compressionFactor option of the cPitchShs component.
 
39
The greedy peak picking algorithm behaviour is achieved in openSMILE when the greedyPeakAlgo option is set to 1. The old (non-greedy) version of the algorithm searched through the peaks from lowest to highest frequency and considered the first peak found as the first candidate. Another candidate was only added if its magnitude was higher than that of the previous first candidate. This behaviour was sub-optimal for Viterbi-based smoothing, which requires multiple candidates in order to evaluate the best path among them.
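A sketch of one plausible reading of the old, non-greedy selection described above (the first peak from the low-frequency end becomes the first candidate; a further peak is only accepted if it is stronger than the previously accepted candidate); this paraphrases the footnote and is not the actual cPitchShs code.

```python
def nongreedy_candidates(peaks):
    """peaks: list of (frequency, magnitude) pairs sorted by ascending frequency."""
    candidates = []
    for freq, mag in peaks:
        # first peak found becomes the first candidate; later peaks are only
        # accepted as additional candidates if their magnitude is higher
        if not candidates or mag > candidates[-1][1]:
            candidates.append((freq, mag))
    return candidates
```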
 
40
In openSMILE this behaviour is not implemented in the cPitchShs component, but rather via the configuration, e.g., in the smileF0_base.conf and IS13_ComParE.conf configurations. There, the cValbasedSelector component is used to force F0 values to 0 (indicating unvoiced parts) if the energy falls below the threshold.
 
41
Available in openSMILE via the cPitchSmoother component.
 
42
In openSMILE the Viterbi-based pitch smoothing is implemented in the cPitchSmootherViterbi component.
 
43
In openSMILE version 2.0 and above, these parameters are implemented by the cHarmonics component.
 
44
This definition of Jitter is implemented in openSMILE in the cPitchJitter component. It can be enabled via the jitterLocal option.
 
45
This definition of delta Jitter is implemented in openSMILE in the cPitchJitter component. It can be enabled via the jitterDDP option.
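The two footnotes above refer to the Jitter equations in the chapter body. As a rough, hedged sketch, the widely used local and DDP formulations over consecutive pitch period lengths \(T_i\) are shown below; the exact definition and normalisation used in the thesis may differ.

```python
import numpy as np

def jitter_local(periods):
    """Mean absolute difference of consecutive period lengths, normalised by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def jitter_ddp(periods):
    """Mean absolute difference of consecutive period differences, normalised by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods, n=2))) / np.mean(periods)
```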
 
46
searchRangeRel option of the cPitchJitter component in openSMILE.
 
47
minCC option in openSMILE.
 
48
sourceQualityMean and sourceQualityRange options in cPitchJitter of openSMILE.
 
49
In openSMILE CHROMA features are supported by the cChroma component, which requires a semi-tone band spectrum as input; this spectrum can be generated by the cTonespec component (preferred) or by the (more general) cMelspec component.
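As a generic illustration of the chroma idea (not the cChroma implementation): given a semi-tone band spectrum, the 12 chroma bins are typically obtained by folding all octaves of the same pitch class together.

```python
import numpy as np

def chroma_from_semitone_bands(semitone_spec, first_band_pitch_class=0):
    """Fold a semi-tone band spectrum (one value per semitone band) into 12 pitch classes."""
    chroma = np.zeros(12)
    for b, value in enumerate(semitone_spec):
        chroma[(first_band_pitch_class + b) % 12] += value
    return chroma
```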
 
50
In openSMILE CENS features can be computed from CHROMA (PCP) features with the cCens component.
 
51
In openSMILE the simple difference function can be applied with the cDeltaRegression component with the delta window size set to 0 (option deltaWin \(=\) 0).
 
52
In openSMILE these delta regression coefficients can be computed with the cDeltaRegression component.
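A sketch of the commonly used delta-regression formula with half-window size W (cf. the deltaWin option in footnote 53); W = 0 reduces to the simple difference of footnote 51. The border handling chosen here (repeating edge values) is an arbitrary assumption, not necessarily what cDeltaRegression does.

```python
import numpy as np

def delta(contour, W=2):
    """d_t = sum_{n=1..W} n * (x_{t+n} - x_{t-n}) / (2 * sum_{n=1..W} n^2)."""
    x = np.asarray(contour, dtype=float)
    if W == 0:                           # simple difference x_t - x_{t-1}
        return np.diff(x, prepend=x[0])
    pad = np.pad(x, W, mode='edge')      # repeat border values
    denom = 2.0 * sum(n * n for n in range(1, W + 1))
    return np.array([
        sum(n * (pad[t + W + n] - pad[t + W - n]) for n in range(1, W + 1)) / denom
        for t in range(len(x))
    ])
```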
 
53
Option deltaWin in openSMILE component cDeltaRegression.
 
54
In openSMILE the smoothing via a moving average window is implemented in the cContourSmoother component. Feature names often carry the suffix _sma, which stands for ‘smoothed (with) moving average (filtering)’.
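A minimal sketch of such smoothing; the window length and border handling are arbitrary choices here, not the cContourSmoother defaults.

```python
import numpy as np

def sma(contour, win=3):
    """Smooth a feature contour with a symmetric moving average of length `win`."""
    kernel = np.ones(win) / win
    return np.convolve(contour, kernel, mode='same')
```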
 
55
In openSMILE univariate functionals are accessible via the cFunctionals component.
 
56
Implementations of mean value related functionals are contained in the cFunctionalMeans component in openSMILE, which can be activated by setting functionalsEnabled = Means in the configuration of cFunctionals.
 
57
And is the implementation used in openSMILE.
 
58
And also implemented in the cFunctionalMeans component.
 
59
In openSMILE the norm option of cFunctionalMeans can be set to segment to normalise counts and times etc. by N.
 
60
Implemented in openSMILE in the cFunctionalMoments component.
 
61
In openSMILE extreme values can be extracted with the cFunctionalExtremes component.
 
62
Percentiles are implemented in openSMILE in the cFunctionalPercentiles component.
 
63
In openSMILE the temporal centroid is implemented by the cFunctionalRegression component, as the sums are shared with the regression equations; computing both descriptors in the same component thus increases efficiency.
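As a generic illustration of this sharing (a least-squares sketch over the frame index; the normalisation and time-scaling options of cFunctionalRegression are not modelled here):

```python
import numpy as np

def centroid_and_linear_regression(x):
    """Temporal centroid plus slope and offset of a least-squares line, from shared sums."""
    x = np.asarray(x, dtype=float)
    N = len(x)                       # assumes N > 1 and a non-zero sum of x
    t = np.arange(N)
    S_t, S_tt = t.sum(), (t * t).sum()
    S_x, S_tx = x.sum(), (t * x).sum()
    centroid = S_tx / S_x                                   # temporal centroid (in frames)
    slope = (N * S_tx - S_t * S_x) / (N * S_tt - S_t ** 2)  # linear regression slope
    offset = (S_x - slope * S_t) / N                        # linear regression offset
    return centroid, slope, offset
```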
 
64
In openSMILE the cFunctionalRegression component computes linear and quadratic regression coefficients.
 
65
As used in this thesis, in order to avoid a name conflict with the quadratic regression coefficients a and b and time t.
 
66
In openSMILE, the time scaling feature is enabled by the normRegCoeff option in the cFunctionalRegression component. Setting it to 1 enables the relative time scale \(g=1/N\) and setting it to 2 enables the absolute time scale in seconds.
 
67
Option normInputs in the openSMILE component cFunctionalRegression—also affects the linear and quadratic error.
 
68
Option normInputs in the openSMILE component cFunctionalRegression—note that this option also affects the regression coefficients as it effectively normalises the input range.
 
69
In openSMILE these functionals are implemented in the component cFunctionalTimes.
 
70
Configurable with the norm option in openSMILE.
 
71
In openSMILE these functionals can be applied with the cFunctionalPeaks2 component; the cFunctionalPeaks component contains an older, obsolete peak picking algorithm.
 
72
In openSMILE, norm=second has to be set in cFunctionalPeaks2 for this behaviour (this is the default).
 
73
norm=frame in openSMILE.
 
74
norm=segment in openSMILE.
 
75
In openSMILE the norm option controls this behaviour (frames, seconds, and segment, respectively).
 
76
See the absThresh and relThresh options in the openSMILE component cFunctionalPeaks2.
 
77
In openSMILE segment-based temporal functionals can be computed with the component cFunctionalSegments.
 
78
Use the ravgLng option of the cFunctionalSegments component in openSMILE.
 
79
This length can be changed via the pauseMinLng option of the cFunctionalSegments component.
 
80
Computed in openSMILE by the cFunctionalOnset component.
 
81
Provided by the cFunctionalCrossings component in openSMILE.
 
82
Sample-based functionals are provided by the cFunctionalSamples component in openSMILE.
 
83
In openSMILE the cFunctionalDCT component computes DCT coefficient functionals.
 
84
In openSMILE the cFunctionalLpc component computes LP-analysis functionals.
 
85
In openSMILE the cFunctionalModulation component computes modulation spectrum functionals.
 
86
In openSMILE, the statistics can be applied to the modulation spectrum with the cSpectral component. Other components that expect magnitude spectra (e.g., ACF in cAcf) can also read from the output of cFunctionalModulation.
 
87
These features are not part of openSMILE (yet). It is planned to include them in future releases. C code is available from the author of this thesis upon request.
 
88
E.g., as is also implemented in the CURRENNT toolkit (http://sourceforge.net/projects/currennt) and the RNNLIB (http://sourceforge.net/projects/rnnl/).
 
Literature
go back to reference R.G. Bachu, S. Kopparthi, B. Adapa, B.D. Barkana, Voiced/unvoiced decision for speech signals based on zero-crossing rate and energy, in Advanced Techniques in Computing Sciences and Software Engineering, ed. by K. Elleithy (Springer, Netherlands, 2010), pp. 279–282. doi:10.1007/978-90-481-3660-5_47. ISBN 978-90-481-3659-9 R.G. Bachu, S. Kopparthi, B. Adapa, B.D. Barkana, Voiced/unvoiced decision for speech signals based on zero-crossing rate and energy, in Advanced Techniques in Computing Sciences and Software Engineering, ed. by K. Elleithy (Springer, Netherlands, 2010), pp. 279–282. doi:10.​1007/​978-90-481-3660-5_​47. ISBN 978-90-481-3659-9
go back to reference A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, L. Devillers, L. Vidrascu, N. Amir, L. Kessous, V. Aharonson, The impact of F0 extraction errors on the classification of prominence and emotion, in Proceedings of 16-th ICPhS (Saarbrücken, Germany, 2007), pp. 2201–2204 A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, L. Devillers, L. Vidrascu, N. Amir, L. Kessous, V. Aharonson, The impact of F0 extraction errors on the classification of prominence and emotion, in Proceedings of 16-th ICPhS (Saarbrücken, Germany, 2007), pp. 2201–2204
go back to reference L.L. Beranek, Acoustic Measurements (Wiley, New York, 1949) L.L. Beranek, Acoustic Measurements (Wiley, New York, 1949)
go back to reference C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, New York, 1995)MATH C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, New York, 1995)MATH
go back to reference R.B. Blackman, J. Tukey, Particular pairs of windows, The Measurement of Power Spectra, from the Point of View of Communications Engineering (Dover, New York, 1959) R.B. Blackman, J. Tukey, Particular pairs of windows, The Measurement of Power Spectra, from the Point of View of Communications Engineering (Dover, New York, 1959)
go back to reference S. Böck, M. Schedl, Polyphonic piano note transcription with recurrent neural networks, in Proceedings of ICASSP 2012 (Kyoto, 2012), pp. 121–124 S. Böck, M. Schedl, Polyphonic piano note transcription with recurrent neural networks, in Proceedings of ICASSP 2012 (Kyoto, 2012), pp. 121–124
go back to reference P. Boersma, Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. IFA Proc. 17, 97–110 (1993) P. Boersma, Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. IFA Proc. 17, 97–110 (1993)
go back to reference P. Boersma, Praat, a system for doing phonetics by computer. Glot Int. 5(9/10), 341–345 (2001) P. Boersma, Praat, a system for doing phonetics by computer. Glot Int. 5(9/10), 341–345 (2001)
go back to reference B.P. Bogert, M.J.R. Healy, J.W. Tukey, The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, in Proceedings of the Symposium on Time Series Analysis, chapter 15, ed. by M. Rosenblatt (Wiley, New York, 1963), pp. 209–243 B.P. Bogert, M.J.R. Healy, J.W. Tukey, The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, in Proceedings of the Symposium on Time Series Analysis, chapter 15, ed. by M. Rosenblatt (Wiley, New York, 1963), pp. 209–243
go back to reference C.H. Chen, Signal Processing Handbook. Electrical Computer Engineering, vol. 51 (CRC Press, New York, 1988), 840 p. ISBN 978-0824779566 C.H. Chen, Signal Processing Handbook. Electrical Computer Engineering, vol. 51 (CRC Press, New York, 1988), 840 p. ISBN 978-0824779566
go back to reference A. Cheveigne, H. Kawahara, YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. (JASA) 111(4), 1917–1930 (2002)CrossRef A. Cheveigne, H. Kawahara, YIN, a fundamental frequency estimator for speech and music. J. Acoust. Soc. Am. (JASA) 111(4), 1917–1930 (2002)CrossRef
go back to reference J. Cooley, P. Lewis, P. Welch, The finite fourier transform. IEEE Trans. Audio Electroacoust. 17(2), 77–85 (1969)MathSciNetCrossRef J. Cooley, P. Lewis, P. Welch, The finite fourier transform. IEEE Trans. Audio Electroacoust. 17(2), 77–85 (1969)MathSciNetCrossRef
go back to reference C. Cortes, V. Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)MATH C. Cortes, V. Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)MATH
go back to reference R. Cowie, E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey, M. Schröder, Feeltrace: an instrument for recording perceived emotion in real time, in Proceedings of the ISCA Workshop on Speech and Emotion (Newcastle, Northern Ireland, 2000), pp. 19–24 R. Cowie, E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey, M. Schröder, Feeltrace: an instrument for recording perceived emotion in real time, in Proceedings of the ISCA Workshop on Speech and Emotion (Newcastle, Northern Ireland, 2000), pp. 19–24
go back to reference G. Dahl, T. Sainath, G. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in Proceedings of ICASSP 2013 (IEEE, Vancouver, 2013), pp. 8609–8613 G. Dahl, T. Sainath, G. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in Proceedings of ICASSP 2013 (IEEE, Vancouver, 2013), pp. 8609–8613
go back to reference G. Dalquist, A. Björk, N. Anderson, Numerical Methods (Prentice Hall, Englewood Cliffs, 1974) G. Dalquist, A. Björk, N. Anderson, Numerical Methods (Prentice Hall, Englewood Cliffs, 1974)
go back to reference S. Damelin, W. Miller, The Mathematics of Signal Processing (Cambridge University Press, Cambridge, 2011). ISBN 978-1107601048CrossRefMATH S. Damelin, W. Miller, The Mathematics of Signal Processing (Cambridge University Press, Cambridge, 2011). ISBN 978-1107601048CrossRefMATH
go back to reference G. de Krom, A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals. J. Speech Hear. Res. 36, 254–266 (1993)CrossRef G. de Krom, A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals. J. Speech Hear. Res. 36, 254–266 (1993)CrossRef
go back to reference J.R. Deller, J.G. Proakis, J.H.L. Hansen, Discrete-Time Processing of Speech Signals, University of Michigan, Macmillan Publishing Company (1993) J.R. Deller, J.G. Proakis, J.H.L. Hansen, Discrete-Time Processing of Speech Signals, University of Michigan, Macmillan Publishing Company (1993)
go back to reference P. Deuflhard, Newton Methods For Nonlinear Problems: Affine Invariance and Adaptive Algorithms. Springer Series in Computational Mathematics, vol. 35 (Springer, Berlin, 2011), 440 p P. Deuflhard, Newton Methods For Nonlinear Problems: Affine Invariance and Adaptive Algorithms. Springer Series in Computational Mathematics, vol. 35 (Springer, Berlin, 2011), 440 p
go back to reference E. Douglas-Cowie, R. Cowie, I. Sneddon, C. Cox, O. Lowry, M. McRorie, J.C. Martin, L. Devillers, S. Abrilian, A. Batliner, N. Amir, K. Karpouzis, The HUMAINE Database. Lecture Notes in Computer Science, vol. 4738 (Springer, Berlin, 2007), pp. 488–500 E. Douglas-Cowie, R. Cowie, I. Sneddon, C. Cox, O. Lowry, M. McRorie, J.C. Martin, L. Devillers, S. Abrilian, A. Batliner, N. Amir, K. Karpouzis, The HUMAINE Database. Lecture Notes in Computer Science, vol. 4738 (Springer, Berlin, 2007), pp. 488–500
go back to reference J. Durbin, The fitting of time series models. Revue de l’Institut International de Statistique (Review of the International Statistical Institute) 28(3), 233–243 (1960)CrossRefMATH J. Durbin, The fitting of time series models. Revue de l’Institut International de Statistique (Review of the International Statistical Institute) 28(3), 233–243 (1960)CrossRefMATH
go back to reference C. Duxbury, M. Sandler, M. Davies, A hybrid approach to musical note onset detection, in Proceedings of the Digital Audio Effect Conference (DAFX’02) (Hamburg, Germany, 2002), pp. 33–38 C. Duxbury, M. Sandler, M. Davies, A hybrid approach to musical note onset detection, in Proceedings of the Digital Audio Effect Conference (DAFX’02) (Hamburg, Germany, 2002), pp. 33–38
go back to reference L.D. Enochson, R.K. Otnes, Programming and Analysis for Digital Time Series Data, 1st edn. U.S. Department of Defense, Shock and Vibration Information Center (1968) L.D. Enochson, R.K. Otnes, Programming and Analysis for Digital Time Series Data, 1st edn. U.S. Department of Defense, Shock and Vibration Information Center (1968)
go back to reference F. Eyben, M. Wöllmer, B. Schuller, openEAR—introducing the Munich open-source emotion and affect recognition toolkit, in Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction (ACII 2009), vol. I (IEEE, Amsterdam, 2009a), pp. 576–581 F. Eyben, M. Wöllmer, B. Schuller, openEAR—introducing the Munich open-source emotion and affect recognition toolkit, in Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction (ACII 2009), vol. I (IEEE, Amsterdam, 2009a), pp. 576–581
go back to reference F. Eyben, M. Wöllmer, B. Schuller, A. Graves, From speech to letters—using a novel neural network architecture for grapheme based ASR, in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2009 (IEEE, Merano, 2009b), pp. 376–380 F. Eyben, M. Wöllmer, B. Schuller, A. Graves, From speech to letters—using a novel neural network architecture for grapheme based ASR, in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2009 (IEEE, Merano, 2009b), pp. 376–380
go back to reference F. Eyben, M. Wöllmer, B. Schuller, openSMILE—The Munich versatile and fast open-source audio feature extractor, in Proceedings of ACM Multimedia 2010 (ACM, Florence, 2010a), pp. 1459–1462 F. Eyben, M. Wöllmer, B. Schuller, openSMILE—The Munich versatile and fast open-source audio feature extractor, in Proceedings of ACM Multimedia 2010 (ACM, Florence, 2010a), pp. 1459–1462
go back to reference F. Eyben, S. Böck, B. Schuller, A. Graves, Universal onset detection with bidirectional long-short term memory neural networks, in Proceedings of ISMIR 2010 (ISMIR, Utrecht, The Netherlands, 2010b), pp. 589–594 F. Eyben, S. Böck, B. Schuller, A. Graves, Universal onset detection with bidirectional long-short term memory neural networks, in Proceedings of ISMIR 2010 (ISMIR, Utrecht, The Netherlands, 2010b), pp. 589–594
go back to reference F. Eyben, M. Wöllmer, A. Graves, B. Schuller, E. Douglas-Cowie, R. Cowie, On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues. J. Multimodal User Interfaces (JMUI) 3(1–2), 7–19 (2010c). doi:10.1007/s12193-009-0032-6 F. Eyben, M. Wöllmer, A. Graves, B. Schuller, E. Douglas-Cowie, R. Cowie, On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues. J. Multimodal User Interfaces (JMUI) 3(1–2), 7–19 (2010c). doi:10.​1007/​s12193-009-0032-6
go back to reference F. Eyben, M. Wöllmer, B. Schuller, A multi-task approach to continuous five-dimensional affect sensing in natural speech, ACM Trans. Interact. Intell. Syst. 2(1), Article No. 6, 29 p. Special Issue on Affective Interaction in Natural Environments (2012) F. Eyben, M. Wöllmer, B. Schuller, A multi-task approach to continuous five-dimensional affect sensing in natural speech, ACM Trans. Interact. Intell. Syst. 2(1), Article No. 6, 29 p. Special Issue on Affective Interaction in Natural Environments (2012)
go back to reference G. Fant, Speech Sounds and Features (MIT press, Cambridge, 1973), p. 227 G. Fant, Speech Sounds and Features (MIT press, Cambridge, 1973), p. 227
go back to reference H.G. Feichtinger, T. Strohmer, Gabor Analysis and Algorithms (Birkhäuser, Boston, 1998). ISBN 0-8176-3959-4CrossRefMATH H.G. Feichtinger, T. Strohmer, Gabor Analysis and Algorithms (Birkhäuser, Boston, 1998). ISBN 0-8176-3959-4CrossRefMATH
go back to reference J.-B.-J. Fourier, Théorie analytique de la chaleur, University of Lausanne, Switzerland (1822) J.-B.-J. Fourier, Théorie analytique de la chaleur, University of Lausanne, Switzerland (1822)
go back to reference T. Fujishima, Realtime chord recognition of musical sound: a system using common lisp music, in Proceedings of the International Computer Music Conference (ICMC) 1999 (Bejing, China, 1999), pp. 464–467 T. Fujishima, Realtime chord recognition of musical sound: a system using common lisp music, in Proceedings of the International Computer Music Conference (ICMC) 1999 (Bejing, China, 1999), pp. 464–467
go back to reference S. Furui, Digital Speech Processing: Synthesis, and Recognition. Signal Processing and Communications, 2nd edn. (Marcel Denker Inc., New York, 1996) S. Furui, Digital Speech Processing: Synthesis, and Recognition. Signal Processing and Communications, 2nd edn. (Marcel Denker Inc., New York, 1996)
go back to reference C. Glaser, M. Heckmann, F. Joublin, C. Goerick, Combining auditory preprocessing and bayesian estimation for robust formant tracking. IEEE Trans. Audio Speech Lang. Process. 18(2), 224–236 (2010)CrossRef C. Glaser, M. Heckmann, F. Joublin, C. Goerick, Combining auditory preprocessing and bayesian estimation for robust formant tracking. IEEE Trans. Audio Speech Lang. Process. 18(2), 224–236 (2010)CrossRef
go back to reference F. Gouyon, F. Pachet, O. Delerue. Classifying percussive sounds: a matter of zero-crossing rate? in Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00) (Verona, Italy, 2000) F. Gouyon, F. Pachet, O. Delerue. Classifying percussive sounds: a matter of zero-crossing rate? in Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00) (Verona, Italy, 2000)
go back to reference A. Graves, Supervised sequence labelling with recurrent neural networks. Doctoral thesis, Technische Universität München, Munich, Germany (2008) A. Graves, Supervised sequence labelling with recurrent neural networks. Doctoral thesis, Technische Universität München, Munich, Germany (2008)
go back to reference A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)CrossRef A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)CrossRef
go back to reference W.D. Gregg, Analog & Digital Communication (Wiley, New York, 1977). ISBN 978-0-471-32661-8 W.D. Gregg, Analog & Digital Communication (Wiley, New York, 1977). ISBN 978-0-471-32661-8
go back to reference M. Grimm, K. Kroschel, S. Narayanan, Support vector regression for automatic recognition of spontaneous emotions in speech, in Proceedings of ICASSP 2007, vol. 4 (IEEE, Honolulu, 2007), pp. 1085–1088 M. Grimm, K. Kroschel, S. Narayanan, Support vector regression for automatic recognition of spontaneous emotions in speech, in Proceedings of ICASSP 2007, vol. 4 (IEEE, Honolulu, 2007), pp. 1085–1088
go back to reference B. Hammarberg, B. Fritzell, J. Gauffin, J. Sundberg, L. Wedin, Perceptual and acoustic correlates of abnormal voice qualities. Acta Otolaryngol. 90, 441–451 (1980)CrossRef B. Hammarberg, B. Fritzell, J. Gauffin, J. Sundberg, L. Wedin, Perceptual and acoustic correlates of abnormal voice qualities. Acta Otolaryngol. 90, 441–451 (1980)CrossRef
go back to reference H. Hanson, Glottal characteristics of female speakers: acoustic correlates. J. Acoust. Soc. Am. (JASA) 101, 466–481 (1997)CrossRef H. Hanson, Glottal characteristics of female speakers: acoustic correlates. J. Acoust. Soc. Am. (JASA) 101, 466–481 (1997)CrossRef
go back to reference H. Hanson, E.S. Chuang, Glottal characteristics of male speakers: acoustic correlates and comparison with female data. J. Acoust. Soc. Am. (JASA) 106, 1064–1077 (1999)CrossRef H. Hanson, E.S. Chuang, Glottal characteristics of male speakers: acoustic correlates and comparison with female data. J. Acoust. Soc. Am. (JASA) 106, 1064–1077 (1999)CrossRef
go back to reference F.J. Harris, On the use of windows for harmonic analysis with the discrete fourier transform. Proc. IEEE 66, 51–83 (1978)CrossRef F.J. Harris, On the use of windows for harmonic analysis with the discrete fourier transform. Proc. IEEE 66, 51–83 (1978)CrossRef
go back to reference H. Hermansky, Perceptual linear predictive (PLP) analysis for speech. J. Acoust. Soc. Am. (JASA) 87, 1738–1752 (1990)CrossRef H. Hermansky, Perceptual linear predictive (PLP) analysis for speech. J. Acoust. Soc. Am. (JASA) 87, 1738–1752 (1990)CrossRef
go back to reference H. Hermansky, N. Morgan, A. Bayya, P. Kohn, RASTA-PLP speech analysis technique, in Proceedings of ICASSP 1992, vol. 1 (IEEE, San Francisco, 1992), pp. 121–124 H. Hermansky, N. Morgan, A. Bayya, P. Kohn, RASTA-PLP speech analysis technique, in Proceedings of ICASSP 1992, vol. 1 (IEEE, San Francisco, 1992), pp. 121–124
go back to reference D.J. Hermes, Measurement of pitch by subharmonic summation. J. Acoust. Soc. Am. (JASA) 83(1), 257–264 (1988)CrossRef D.J. Hermes, Measurement of pitch by subharmonic summation. J. Acoust. Soc. Am. (JASA) 83(1), 257–264 (1988)CrossRef
go back to reference W. Hess, Pitch Determination of Speech Signals: Algorithms and Devices (Springer, Berlin, 1983)CrossRef W. Hess, Pitch Determination of Speech Signals: Algorithms and Devices (Springer, Berlin, 1983)CrossRef
go back to reference S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef
go back to reference S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks, ed. by S.C. Kremer, J.F. Kolen (IEEE Press, New York, 2001) S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks, ed. by S.C. Kremer, J.F. Kolen (IEEE Press, New York, 2001)
go back to reference ISO16:1975. ISO Standard 16:1975 Acoustics: Standard tuning frequency (Standard musical pitch). International Organization for Standardization (ISO) (1975) ISO16:1975. ISO Standard 16:1975 Acoustics: Standard tuning frequency (Standard musical pitch). International Organization for Standardization (ISO) (1975)
go back to reference T. Joachims, Text categorization with support vector machines: learning with many relevant features, in Proceedings of the 10th European Conference on Machine Learning (ECML-98), ed. by C. Nédellec, C. Rouveirol (Springer, Chemnitz, 1998), pp. 137–142 T. Joachims, Text categorization with support vector machines: learning with many relevant features, in Proceedings of the 10th European Conference on Machine Learning (ECML-98), ed. by C. Nédellec, C. Rouveirol (Springer, Chemnitz, 1998), pp. 137–142
go back to reference J.D. Johnston, Transform coding of audio signals using perceptual noise criteria. IEEE J. Sel. Areas Commun. 6(2), 314–332 (1988)CrossRef J.D. Johnston, Transform coding of audio signals using perceptual noise criteria. IEEE J. Sel. Areas Commun. 6(2), 314–332 (1988)CrossRef
go back to reference P. Kabal, R.P. Ramachandran, The computation of line spectral frequencies using Chebyshev polynomials. IEEE Trans. Acoust. Speech Signal Process. 34(6), 1419–1426 (1986)CrossRef P. Kabal, R.P. Ramachandran, The computation of line spectral frequencies using Chebyshev polynomials. IEEE Trans. Acoust. Speech Signal Process. 34(6), 1419–1426 (1986)CrossRef
go back to reference R. Kendall, E. Carterette, Difference thresholds for timbre related to spectral centroid, in Proceedings of the 4-th International Conference on Music Perception and Cognition (ICMPC) (Montreal, Canada, 1996), pp. 91–95 R. Kendall, E. Carterette, Difference thresholds for timbre related to spectral centroid, in Proceedings of the 4-th International Conference on Music Perception and Cognition (ICMPC) (Montreal, Canada, 1996), pp. 91–95
go back to reference J.F. Kenney, E.S. Keeping, Root mean square, Mathematics of Statistics, vol. 1, 3rd edn. (Van Nostrand, Princeton, 1962), pp. 59–60 J.F. Kenney, E.S. Keeping, Root mean square, Mathematics of Statistics, vol. 1, 3rd edn. (Van Nostrand, Princeton, 1962), pp. 59–60
go back to reference A. Kießling, Extraktion und Klassifikation prosodischer Merkmale in der automatischen Sprachverarbeitung (Shaker, Aachen, 1997). ISBN 978-3-8265-2245-1 A. Kießling, Extraktion und Klassifikation prosodischer Merkmale in der automatischen Sprachverarbeitung (Shaker, Aachen, 1997). ISBN 978-3-8265-2245-1
go back to reference A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, vol. 25, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Curran Associates, Inc., 2012), pp. 1097–1105 A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, vol. 25, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Curran Associates, Inc., 2012), pp. 1097–1105
go back to reference K. Kroschel, G. Rigoll, B. Schuller, Statistische Informationstechnik, 5th edn. (Springer, Berlin, 2011)CrossRefMATH K. Kroschel, G. Rigoll, B. Schuller, Statistische Informationstechnik, 5th edn. (Springer, Berlin, 2011)CrossRefMATH
go back to reference K. Lee, M. Slaney, Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio. IEEE Trans. Audio Speech Lang. Process. 16(2), 291–301 (2008). doi:10.1109/TASL.2007.914399. ISSN 1558-7916CrossRef K. Lee, M. Slaney, Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio. IEEE Trans. Audio Speech Lang. Process. 16(2), 291–301 (2008). doi:10.​1109/​TASL.​2007.​914399. ISSN 1558-7916CrossRef
go back to reference P. Lejeune-Dirichlet, Sur la convergence des séries trigonométriques qui servent à représenter une fonction arbitraire entre des limites données. Journal für die reine und angewandte Mathematik 4, 157–169 (1829)MathSciNetCrossRef P. Lejeune-Dirichlet, Sur la convergence des séries trigonométriques qui servent à représenter une fonction arbitraire entre des limites données. Journal für die reine und angewandte Mathematik 4, 157–169 (1829)MathSciNetCrossRef
go back to reference N. Levinson, A heuristic exposition of wiener’s mathematical theory of prediction and filtering. J. Math. Phys. 25, 110–119 (1947a) N. Levinson, A heuristic exposition of wiener’s mathematical theory of prediction and filtering. J. Math. Phys. 25, 110–119 (1947a)
go back to reference N. Levinson, The Wiener RMS error criterion in filter design and prediction. J. Math. Phys. 25(4), 261–278 (1947b) N. Levinson, The Wiener RMS error criterion in filter design and prediction. J. Math. Phys. 25(4), 261–278 (1947b)
go back to reference P.I. Lizorkin, Fourier transform, in Encyclopaedia of Mathematics, ed. by M. Hazewinkel (Springer, Berlin, 2002). ISBN 1-4020-0609-8 P.I. Lizorkin, Fourier transform, in Encyclopaedia of Mathematics, ed. by M. Hazewinkel (Springer, Berlin, 2002). ISBN 1-4020-0609-8
go back to reference I. Luengo, Evaluation of pitch detection algorithms under real conditions, in Proceedings of ICASSP 2007, vol. 4 (IEEE, Honolulu, 2007), pp. 1057–1060 I. Luengo, Evaluation of pitch detection algorithms under real conditions, in Proceedings of ICASSP 2007, vol. 4 (IEEE, Honolulu, 2007), pp. 1057–1060
go back to reference J. Makhoul, Linear prediction: a tutorial review. Proc. IEEE 63(5), 561–580 (1975)CrossRef J. Makhoul, Linear prediction: a tutorial review. Proc. IEEE 63(5), 561–580 (1975)CrossRef
go back to reference J. Makhoul, L. Cosell, LPCW: an LPC vocoder with linear predictive spectral warping, in Proceedings of ICASSP 1976 (IEEE, Philadelphia, 1976), pp. 466–469 J. Makhoul, L. Cosell, LPCW: an LPC vocoder with linear predictive spectral warping, in Proceedings of ICASSP 1976 (IEEE, Philadelphia, 1976), pp. 466–469
go back to reference B.S. Manjunath, P. Salembier, T. Sikoraa (eds.), Introduction to MPEG-7: Multimedia Content Description Interface (Wiley, Berlin, 2002), 396 p. ISBN 978-0-471-48678-7 B.S. Manjunath, P. Salembier, T. Sikoraa (eds.), Introduction to MPEG-7: Multimedia Content Description Interface (Wiley, Berlin, 2002), 396 p. ISBN 978-0-471-48678-7
go back to reference P. Martin, Détection de \(f_0\) par intercorrelation avec une fonction peigne. J. Etude Parole 12, 221–232 (1981) P. Martin, Détection de \(f_0\) par intercorrelation avec une fonction peigne. J. Etude Parole 12, 221–232 (1981)
go back to reference P. Martin, Comparison of pitch detection by cepstrum and spectral comb analysis, in Proceedings of ICASSP 1982 (IEEE, Paris, 1982), pp. 180–183 P. Martin, Comparison of pitch detection by cepstrum and spectral comb analysis, in Proceedings of ICASSP 1982 (IEEE, Paris, 1982), pp. 180–183
go back to reference J. Martinez, H. Perez, E. Escamilla, M.M. Suzuki, Speaker recognition using mel frequency cepstral coefficients (MFCC) and vector quantization (VQ) techniques, in Proceedings of the 22nd International Conference on Electrical Communications and Computers (CONIELECOMP) (Cholula, Puebla, 2012), pp. 248–251. doi:10.1109/CONIELECOMP.2012.6189918 J. Martinez, H. Perez, E. Escamilla, M.M. Suzuki, Speaker recognition using mel frequency cepstral coefficients (MFCC) and vector quantization (VQ) techniques, in Proceedings of the 22nd International Conference on Electrical Communications and Computers (CONIELECOMP) (Cholula, Puebla, 2012), pp. 248–251. doi:10.​1109/​CONIELECOMP.​2012.​6189918
go back to reference P. Masri, Computer modelling of sound for transformation and synthesis of musical signal. Doctoral thesis, University of Bristol, Bristol (1996) P. Masri, Computer modelling of sound for transformation and synthesis of musical signal. Doctoral thesis, University of Bristol, Bristol (1996)
go back to reference S. McCandless, An algorithm for automatic formant extraction using linear prediction spectra. IEEE Trans. Acoust. Speech Signal Process. 22, 134–141 (1974)CrossRef S. McCandless, An algorithm for automatic formant extraction using linear prediction spectra. IEEE Trans. Acoust. Speech Signal Process. 22, 134–141 (1974)CrossRef
go back to reference D.D. Mehta, D. Rudoy, P.K. Wolfe, Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking. J. Acoust. Soc. Am. (JASA) 132(3), 1732–1746 (2012)CrossRef D.D. Mehta, D. Rudoy, P.K. Wolfe, Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking. J. Acoust. Soc. Am. (JASA) 132(3), 1732–1746 (2012)CrossRef
go back to reference H. Misra, S. Ikbal, H. Bourlard, H. Hermansky, Spectral entropy based feature for robust ASR, in Proceedings of ICASSP 2004, vol. 1 (IEEE, Montreal, Canada, 2004), pp. I–193–6. doi:10.1109/ICASSP.2004.1325955 H. Misra, S. Ikbal, H. Bourlard, H. Hermansky, Spectral entropy based feature for robust ASR, in Proceedings of ICASSP 2004, vol. 1 (IEEE, Montreal, Canada, 2004), pp. I–193–6. doi:10.​1109/​ICASSP.​2004.​1325955
go back to reference O. Mubarak, E. Ambikairajah, J. Epps, T. Gunawan, Modulation features for speech and music classification, in Proceedings of the 10th IEEE Singapore International Conference on Communication systems (ICCS) 2006 (IEEE, 2006), pp. 1–5. doi:10.1109/ICCS.2006.301515 O. Mubarak, E. Ambikairajah, J. Epps, T. Gunawan, Modulation features for speech and music classification, in Proceedings of the 10th IEEE Singapore International Conference on Communication systems (ICCS) 2006 (IEEE, 2006), pp. 1–5. doi:10.​1109/​ICCS.​2006.​301515
go back to reference M. Müller, Information Retrieval for Music and Motion (Springer, Berlin, 2007)CrossRef M. Müller, Information Retrieval for Music and Motion (Springer, Berlin, 2007)CrossRef
go back to reference M. Müller, F. Kurth, M. Clausen, Audio matching via chroma-based statistical features, in Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR) (London, 2005a), pp. 288–295 M. Müller, F. Kurth, M. Clausen, Audio matching via chroma-based statistical features, in Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR) (London, 2005a), pp. 288–295
go back to reference M. Müller, F. Kurth, M. Clausen, Chroma-based statistical audio features for audio matching, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE, 2005b), pp. 275–278 M. Müller, F. Kurth, M. Clausen, Chroma-based statistical audio features for audio matching, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE, 2005b), pp. 275–278
go back to reference N.J. Nalini, S. Palanivel, Emotion recognition in music signal using AANN and SVM. Int. J. Comput. Appl. 77(2), 7–14 (2013) N.J. Nalini, S. Palanivel, Emotion recognition in music signal using AANN and SVM. Int. J. Comput. Appl. 77(2), 7–14 (2013)
go back to reference A.M. Noll, Cepstrum pitch determination. J. Acoust. Soc. Am. (JASA) 41(2), 293–309 (1967)CrossRef A.M. Noll, Cepstrum pitch determination. J. Acoust. Soc. Am. (JASA) 41(2), 293–309 (1967)CrossRef
go back to reference A.M. Noll, Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate, in Symposium on Computer Processing in Communication, vol. 19 (University of Brooklyn, New York, 1970), pp. 779–797, edited by the Microwave Institute A.M. Noll, Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate, in Symposium on Computer Processing in Communication, vol. 19 (University of Brooklyn, New York, 1970), pp. 779–797, edited by the Microwave Institute
go back to reference A.H. Nuttal, Some windows with very good sidelobe behavior. IEEE Trans. Acoust. Speech Signal Process. ASSP 29, 84–91 (1981)CrossRef A.H. Nuttal, Some windows with very good sidelobe behavior. IEEE Trans. Acoust. Speech Signal Process. ASSP 29, 84–91 (1981)CrossRef
go back to reference A.V. Oppenheim, R.W. Schafer, Digital Signal Processing (Prentice-Hall, Englewood Cliffs, 1975)MATH A.V. Oppenheim, R.W. Schafer, Digital Signal Processing (Prentice-Hall, Englewood Cliffs, 1975)MATH
go back to reference A.V. Oppenheim, A.S. Willsky, S. Hamid, Signals and Systems, 2nd edn. (Prentice Hall, Upper Saddle River, 1996) A.V. Oppenheim, A.S. Willsky, S. Hamid, Signals and Systems, 2nd edn. (Prentice Hall, Upper Saddle River, 1996)
go back to reference A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing (Prentice Hall, Upper Saddle River, 1999) A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing (Prentice Hall, Upper Saddle River, 1999)
go back to reference T.W. Parsons, Voice and Speech Processing. Electrical and Computer Engineering (University of Michigan, McGraw-Hill, 1987) T.W. Parsons, Voice and Speech Processing. Electrical and Computer Engineering (University of Michigan, McGraw-Hill, 1987)
go back to reference S. Patel, K.R. Scherer, J. Sundberg, E. Björkner, Acoustic markers of emotions based on voice physiology, in Proceedings of Speech Prosody 2010 (ISCA, Chicago, 2010), pp. 100865:1–4 S. Patel, K.R. Scherer, J. Sundberg, E. Björkner, Acoustic markers of emotions based on voice physiology, in Proceedings of Speech Prosody 2010 (ISCA, Chicago, 2010), pp. 100865:1–4
go back to reference V. Pham, C. Kermorvant, J. Louradour, Dropout improves recurrent neural networks for handwriting recognition, in CoRR (2013) (online), arXiv:1312.4569 V. Pham, C. Kermorvant, J. Louradour, Dropout improves recurrent neural networks for handwriting recognition, in CoRR (2013) (online), arXiv:​1312.​4569
go back to reference J. Platt, Sequential minimal optimization: a fast algorithm for training support vector machines, Technical report MSR-98-14, Microsoft Research (1998) J. Platt, Sequential minimal optimization: a fast algorithm for training support vector machines, Technical report MSR-98-14, Microsoft Research (1998)
go back to reference L.R. Rabiner, A tutorial on hidden markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)CrossRef L.R. Rabiner, A tutorial on hidden markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)CrossRef
go back to reference L.R. Rabiner, B.H. Juang, An introduction to hidden markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)CrossRef L.R. Rabiner, B.H. Juang, An introduction to hidden markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)CrossRef
go back to reference L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition, 1st edn. (Prentice Hall, Englewood Cliffs, 1993)MATH L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition, 1st edn. (Prentice Hall, Englewood Cliffs, 1993)MATH
go back to reference L. Rade, B. Westergren, Springers Mathematische Formeln (German translation by P. Vachenauer), 3rd edn. (Springer, Berlin, 2000). ISBN 3-540-67505-1CrossRef L. Rade, B. Westergren, Springers Mathematische Formeln (German translation by P. Vachenauer), 3rd edn. (Springer, Berlin, 2000). ISBN 3-540-67505-1CrossRef
go back to reference J.F. Reed, F. Lynn, B.D. Meade, Use of coefficient of variation in assessing variability of quantitative assays. Clin. Diagn. Lab. Immunol. 9(6), 1235–1239 (2002) J.F. Reed, F. Lynn, B.D. Meade, Use of coefficient of variation in assessing variability of quantitative assays. Clin. Diagn. Lab. Immunol. 9(6), 1235–1239 (2002)
go back to reference M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, in Proceedings of the IEEE International Conference on Neural Networks, vol. 1 (IEEE, San Francisco, 1993), pp. 586–591. doi:10.1109/icnn.1993.298623 M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, in Proceedings of the IEEE International Conference on Neural Networks, vol. 1 (IEEE, San Francisco, 1993), pp. 586–591. doi:10.​1109/​icnn.​1993.​298623
go back to reference F. Ringeval, A. Sonderegger, J. Sauer, D. Lalanne, Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions, in Proceedings of the 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE), held in conjunction with FG 2013 (IEEE, Shanghai, 2013), pp. 1–8 F. Ringeval, A. Sonderegger, J. Sauer, D. Lalanne, Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions, in Proceedings of the 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE), held in conjunction with FG 2013 (IEEE, Shanghai, 2013), pp. 1–8
go back to reference S. Rosen, P. Howell, The vocal tract as a linear system, Signals and Systems for Speech and Hearing, 1st edn. (Emerald Group, 1991), pp. 92–99. ISBN 978-0125972314 S. Rosen, P. Howell, The vocal tract as a linear system, Signals and Systems for Speech and Hearing, 1st edn. (Emerald Group, 1991), pp. 92–99. ISBN 978-0125972314
go back to reference G. Ruske, Automatische Spracherkennung. Methoden der Klassifikation und Merkmalsextraktion, 2nd edn. (Oldenbourg, Munich, 1993) G. Ruske, Automatische Spracherkennung. Methoden der Klassifikation und Merkmalsextraktion, 2nd edn. (Oldenbourg, Munich, 1993)
go back to reference B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning) (MIT Press, Cambridge, 2002) B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning) (MIT Press, Cambridge, 2002)
go back to reference M. Schröder, E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, M. ter Maat, G. McKeown, S. Pammi, M. Pantic, C. Pelachaud, B. Schuller, E. de Sevin, M. Valstar, M. Wöllmer, Building autonomous sensitive artificial listeners. IEEE Trans. Affect. Comput. 3(2), 165–183 (2012)CrossRef M. Schröder, E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, M. ter Maat, G. McKeown, S. Pammi, M. Pantic, C. Pelachaud, B. Schuller, E. de Sevin, M. Valstar, M. Wöllmer, Building autonomous sensitive artificial listeners. IEEE Trans. Affect. Comput. 3(2), 165–183 (2012)CrossRef
go back to reference M.R. Schroeder, Period histogram and product spectrum: new methods for fundamental-frequency measurement. J. Acoust. Soc. Am. (JASA) 43, 829–834 (1968)CrossRef M.R. Schroeder, Period histogram and product spectrum: new methods for fundamental-frequency measurement. J. Acoust. Soc. Am. (JASA) 43, 829–834 (1968)CrossRef
go back to reference M.R. Schroeder, Recognition of complex acoustic signals, in Life Sciences Research Reports, vol. 5, ed. by T.H. Bullock (Abakon Verlag, Berlin, 1977), 324 p M.R. Schroeder, Recognition of complex acoustic signals, in Life Sciences Research Reports, vol. 5, ed. by T.H. Bullock (Abakon Verlag, Berlin, 1977), 324 p
go back to reference B. Schuller, Automatische Emotionserkennung aus sprachlicher und manueller Interaktion. Doctoral thesis, Technische Universität München, Munich, Germany (2006) B. Schuller, Automatische Emotionserkennung aus sprachlicher und manueller Interaktion. Doctoral thesis, Technische Universität München, Munich, Germany (2006)
go back to reference B. Schuller, Intelligent Audio Analysis. Signals and Communication Technology (Springer, Berlin, 2013) B. Schuller, Intelligent Audio Analysis. Signals and Communication Technology (Springer, Berlin, 2013)
go back to reference B. Schuller, A. Batliner, Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing (Wiley, Hoboken, 2013), 344 p. ISBN 978-1119971368 B. Schuller, A. Batliner, Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing (Wiley, Hoboken, 2013), 344 p. ISBN 978-1119971368
go back to reference B. Schuller, G. Rigoll, M. Lang, Hidden Markov model-based speech emotion recognition, in Proceedings of ICASSP 2003, vol. 2 (IEEE, Hong Kong, 2003), pp. II 1–4 B. Schuller, G. Rigoll, M. Lang, Hidden Markov model-based speech emotion recognition, in Proceedings of ICASSP 2003, vol. 2 (IEEE, Hong Kong, 2003), pp. II 1–4
go back to reference B. Schuller, D. Arsić, F. Wallhoff, G. Rigoll, Emotion recognition in the noise applying large acoustic feature sets, in Proceedings of the 3rd International Conference on Speech Prosody (SP) 2006 (ISCA, Dresden, 2006), pp. 276–289 B. Schuller, D. Arsić, F. Wallhoff, G. Rigoll, Emotion recognition in the noise applying large acoustic feature sets, in Proceedings of the 3rd International Conference on Speech Prosody (SP) 2006 (ISCA, Dresden, 2006), pp. 276–289
go back to reference B. Schuller, F. Eyben, G. Rigoll, Fast and robust meter and tempo recognition for the automatic discrimination of ballroom dance styles, in Proceedings of ICASSP 2007, vol. I (IEEE, Honolulu, 2007), pp. 217–220 B. Schuller, F. Eyben, G. Rigoll, Fast and robust meter and tempo recognition for the automatic discrimination of ballroom dance styles, in Proceedings of ICASSP 2007, vol. I (IEEE, Honolulu, 2007), pp. 217–220
go back to reference B. Schuller, F. Eyben, G. Rigoll, Beat-synchronous data-driven automatic chord labeling, in Proceedings 34. Jahrestagung für Akustik (DAGA) 2008 (DEGA, Dresden, 2008), pp. 555–556 B. Schuller, F. Eyben, G. Rigoll, Beat-synchronous data-driven automatic chord labeling, in Proceedings 34. Jahrestagung für Akustik (DAGA) 2008 (DEGA, Dresden, 2008), pp. 555–556
go back to reference B. Schuller, S. Steidl, A. Batliner, F. Jurcicek, The INTERSPEECH 2009 emotion challenge, in Proceedings of INTERSPEECH 2009 (Brighton, 2009a), pp. 312–315 B. Schuller, S. Steidl, A. Batliner, F. Jurcicek, The INTERSPEECH 2009 emotion challenge, in Proceedings of INTERSPEECH 2009 (Brighton, 2009a), pp. 312–315
go back to reference B. Schuller, B. Vlasenko, F. Eyben, G. Rigoll, A. Wendemuth, Acoustic emotion recognition: A benchmark comparison of performances, in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2009 (IEEE, Merano, 2009b), pp. 552–557 B. Schuller, B. Vlasenko, F. Eyben, G. Rigoll, A. Wendemuth, Acoustic emotion recognition: A benchmark comparison of performances, in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2009 (IEEE, Merano, 2009b), pp. 552–557
go back to reference B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Müller, S. Narayanan, The INTERSPEECH 2010 paralinguistic challenge, in Proceedings of INTERSPEECH 2010 (ISCA, Makuhari, 2010), pp. 2794–2797 B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Müller, S. Narayanan, The INTERSPEECH 2010 paralinguistic challenge, in Proceedings of INTERSPEECH 2010 (ISCA, Makuhari, 2010), pp. 2794–2797
go back to reference B. Schuller, A. Batliner, S. Steidl, F. Schiel, J. Krajewski, The INTERSPEECH 2011 speaker state challenge, in Proceedings of INTERSPEECH 2011 (ISCA, Florence, 2011), pp. 3201–3204 B. Schuller, A. Batliner, S. Steidl, F. Schiel, J. Krajewski, The INTERSPEECH 2011 speaker state challenge, in Proceedings of INTERSPEECH 2011 (ISCA, Florence, 2011), pp. 3201–3204
go back to reference B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, B. Weiss, The INTERSPEECH 2012 speaker trait challenge, in Proceedings of INTERSPEECH 2012 (ISCA, Portland, 2012a) B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, B. Weiss, The INTERSPEECH 2012 speaker trait challenge, in Proceedings of INTERSPEECH 2012 (ISCA, Portland, 2012a)
go back to reference B. Schuller, M. Valstar, R. Cowie, M. Pantic, AVEC 2012: the continuous audio/visual emotion challenge—an introduction, in Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI) 2012, ed. by L.-P. Morency, D. Bohus, H.K. Aghajan, J. Cassell, A. Nijholt, J. Epps (ACM, Santa Monica, 2012b), pp. 361–362. October B. Schuller, M. Valstar, R. Cowie, M. Pantic, AVEC 2012: the continuous audio/visual emotion challenge—an introduction, in Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI) 2012, ed. by L.-P. Morency, D. Bohus, H.K. Aghajan, J. Cassell, A. Nijholt, J. Epps (ACM, Santa Monica, 2012b), pp. 361–362. October
go back to reference B. Schuller, F. Pokorny, S. Ladstätter, M. Fellner, F. Graf, L. Paletta. Acoustic geo-sensing: recognising cyclists’ route, route direction, and route progress from cell-phone audio, in Proceedings of ICASSP 2013 (IEEE, Vancouver, 2013a), pp. 453–457 B. Schuller, F. Pokorny, S. Ladstätter, M. Fellner, F. Graf, L. Paletta. Acoustic geo-sensing: recognising cyclists’ route, route direction, and route progress from cell-phone audio, in Proceedings of ICASSP 2013 (IEEE, Vancouver, 2013a), pp. 453–457
B. Schuller, S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer, F. Ringeval, M. Chetouani, et al., The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism, in Proceedings of INTERSPEECH 2013 (ISCA, Lyon, 2013b), pp. 148–152
M. Schuster, K.K. Paliwal, Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948). (Reprint with corrections in: ACM SIGMOBILE Mobile Computing and Communications Review 5(1), 3–55 (2001))
M. Slaney, An efficient implementation of the Patterson-Holdsworth auditory filter bank. Technical Report 35, Apple Computer Inc. (1993)
M. Soleymani, M.N. Caro, E.M. Schmidt, Y.-H. Yang, The MediaEval 2013 brave new task: emotion in music, in Proceedings of the MediaEval 2013 Workshop (CEUR-WS.org, Barcelona, 2013)
F.K. Soong, B.-W. Juang, Line spectrum pair (LSP) and speech data compression, in Proceedings of ICASSP 1984 (IEEE, San Diego, 1984), pp. 1.10.1–1.10.4
A. Spanias, T. Painter, V. Atti, Audio Signal Processing and Coding (Wiley, Hoboken, 2007), 464 p. ISBN 978-0-471-79147-8
J. Stadermann, G. Rigoll, A hybrid SVM/HMM acoustic modeling approach to automatic speech recognition, in Proceedings of INTERSPEECH 2004 (ISCA, Jeju, 2004), pp. 661–664
J. Stadermann, G. Rigoll, Hybrid NN/HMM acoustic modeling techniques for distributed speech recognition. Speech Commun. 48(8), 1037–1046 (2006)
J.F. Steffensen, Interpolation, 2nd edn. (Dover Publications, New York, 2012), 256 p. ISBN 978-0486154831
P. Suman, S. Karan, V. Singh, R. Maringanti, Algorithm for gunshot detection using mel-frequency cepstrum coefficients (MFCC), in Proceedings of the Ninth International Conference on Wireless Communication and Sensor Networks, ed. by R. Maringanti, M. Tiwari, A. Arora. Lecture Notes in Electrical Engineering, vol. 299 (Springer, India, 2014), pp. 155–166. doi:10.1007/978-81-322-1823-4_15. ISBN 978-81-322-1822-7
J. Sundberg, The Science of the Singing Voice (Northern Illinois University Press, Dekalb, 1987), 226 p. ISBN 978-0-87580-542-9
D. Talkin, A robust algorithm for pitch tracking (RAPT), in Speech Coding and Synthesis, ed. by W.B. Kleijn, K.K. Paliwal (Elsevier, New York, 1995), pp. 495–518. ISBN 0444821694
L. Tamarit, M. Goudbeek, K.R. Scherer, Spectral slope measurements in emotionally expressive speech, in Proceedings of SPKD-2008 (ISCA, 2008), paper 007
H.M. Teager, S.M. Teager, Evidence for nonlinear sound production mechanisms in the vocal tract, in Proceedings of Speech Production and Speech Modelling, Bonas, France, ed. by W.J. Hardcastle, A. Marchal. NATO Advanced Study Institute Series D, vol. 55 (Kluwer Academic Publishers, Boston, 1990), pp. 241–261
E. Terhardt, Pitch, consonance, and harmony. J. Acoust. Soc. Am. (JASA) 55, 1061–1069 (1974)
H. Traunmueller, Analytical expressions for the tonotopic sensory scale. J. Acoust. Soc. Am. (JASA) 88, 97–100 (1990)
K. Turkowski, S. Gabriel, Filters for common resampling tasks, in Graphics Gems, ed. by A.S. Glassner (Academic Press, New York, 1990), pp. 147–165. ISBN 978-0-12-286165-9
P.-F. Verhulst, Recherches mathématiques sur la loi d’accroissement de la population (mathematical researches into the law of population growth). Nouveaux Mémoires de l’Académie Royale des Sciences et Belles-Lettres de Bruxelles 18, 1–42 (1845)
D. Ververidis, C. Kotropoulos, Emotional speech recognition: resources, features, and methods. Speech Commun. 48(9), 1162–1181 (2006)
A.J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)
B. Vlasenko, B. Schuller, A. Wendemuth, G. Rigoll, Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing, in Proceedings of the 2nd International Conference on Affective Computing and Intelligent Interaction (ACII) 2007, ed. by A. Paiva, R. Prada, R.W. Picard. Lecture Notes in Computer Science, Lisbon, Portugal, vol. 4738 (Springer, Berlin, 2007), pp. 139–147
A.L. Wang, An industrial-strength audio search algorithm, in Proceedings of ISMIR (Baltimore, 2003)
F. Weninger, F. Eyben, B. Schuller, The TUM approach to the MediaEval music emotion task using generic affective audio features, in Proceedings of the MediaEval 2013 Workshop (CEUR-WS.org, Barcelona, 2013)
F. Weninger, F. Eyben, B. Schuller, On-line continuous-time music mood regression with deep recurrent neural networks, in Proceedings of ICASSP 2014 (IEEE, Florence, 2014), pp. 5449–5453
P. Werbos, Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 1550–1560 (1990)
N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, M.I.T. Press Paperback Series (Book 9) (MIT Press, Cambridge, 1964), 163 p.
M. Wöllmer, F. Eyben, S. Reiter, B. Schuller, C. Cox, E. Douglas-Cowie, R. Cowie, Abandoning emotion classes—towards continuous emotion recognition with modelling of long-range dependencies, in Proceedings of INTERSPEECH 2008 (ISCA, Brisbane, 2008), pp. 597–600
M. Wöllmer, F. Eyben, A. Graves, B. Schuller, G. Rigoll, Improving keyword spotting with a tandem BLSTM-DBN architecture, in Advances in Non-linear Speech Processing: Revised Selected Papers of the International Conference on Nonlinear Speech Processing (NOLISP) 2009, ed. by J. Sole-Casals, V. Zaiats. Lecture Notes in Computer Science (LNCS), vol. 5933/2010 (Springer, Vic, 2010), pp. 68–75
M. Wöllmer, M. Kaiser, F. Eyben, B. Schuller, G. Rigoll, LSTM-modeling of continuous emotions in an audiovisual affect recognition framework. Image Vis. Comput. (IMAVIS) 31(2), 153–163 (2013). Special Issue on Affect Analysis in Continuous Input
Q. Yan, S. Vaseghi, E. Zavarehei, B. Milner, J. Darch, P. White, I. Andrianakis, Formant-tracking linear prediction model using HMMs and Kalman filters for noisy speech processing. Comput. Speech Lang. 21(3), 543–561 (2007). doi:10.1016/j.csl.2006.11.001
S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, P. Woodland, The HTK Book, Cambridge University Engineering Department, for HTK version 3.4 edition (2006)
E. Yumoto, W.J. Gould, Harmonics-to-noise ratio as an index of the degree of hoarseness. J. Acoust. Soc. Am. (JASA) 71(6), 1544–1549 (1981)
G. Zhou, J.H.L. Hansen, J.F. Kaiser, Nonlinear feature based classification of speech under stress. IEEE Trans. Speech Audio Process. 9(3), 201–216 (2001). doi:10.1109/89.905995
X. Zuo, P. Fung, A cross gender and cross lingual study of stress recognition in speech without linguistic features, in Proceedings of the 17th ICPhS (Hong Kong, China, 2011)
E. Zwicker, Subdivision of the audible frequency range into critical bands. J. Acoust. Soc. Am. (JASA) 33(2), 248–248 (1961)
E. Zwicker, Masking and psychological excitation as consequences of the ear’s frequency analysis, in Frequency Analysis and Periodicity Detection in Hearing, ed. by R. Plomp, G.F. Smoorenburg (Sijthoff, Leyden, 1970)
E. Zwicker, E. Terhardt, Analytical expressions for critical-band rate and critical bandwidth as a function of frequency. J. Acoust. Soc. Am. (JASA) 68, 1523–1525 (1980)
E. Zwicker, H. Fastl, Psychoacoustics—Facts and Models, 2nd edn. (Springer, Berlin, 1999), 417 p. ISBN 978-3540650638
Metadata
Title: Acoustic Features and Modelling
Author: Florian Eyben
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-27299-3_2