2008 | OriginalPaper | Chapter

10. Pitch and Voicing Determination of Speech with an Extension Toward Music Signals

Author: Prof. Wolfgang J. Hess

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg

Abstract

This chapter reviews selected methods for pitch determination of speech and music signals. As both of these signals are time-variant, we first define what is subsumed under the term pitch. We then subdivide pitch determination algorithms (PDAs) into two groups: short-term analysis algorithms, which apply some spectral transform and derive pitch from a frequency-domain or lag-domain representation, and time-domain algorithms, which analyze the signal directly and apply structural analysis, determine individual periods from the first partial, or compute the instant of glottal closure in speech. In the 1970s, when many of these algorithms were developed, the main application in speech technology was the vocoder, whereas nowadays the emphasis is on prosody recognition in speech understanding systems and on high-accuracy pitch period determination for speech synthesis corpora. In musical acoustics, pitch determination is applied in melody recognition and automatic musical transcription, where there is the additional problem that several pitches can exist simultaneously.
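To make the short-term analysis category concrete, the following is a minimal sketch of one common lag-domain approach: picking the peak of a frame's autocorrelation function. It is an illustration under stated assumptions, not the chapter's own algorithm. The function name `autocorr_pitch`, the search range of 50 to 500 Hz, and the voicing threshold of 0.3 are all hypothetical choices made for this example.

```python
import math


def autocorr_pitch(frame, sample_rate, f0_min=50.0, f0_max=500.0,
                   voicing_threshold=0.3):
    """Estimate F0 in Hz for one frame, or return None if judged unvoiced.

    Hypothetical helper: the search range and threshold are assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset

    energy = sum(s * s for s in x)         # autocorrelation at lag 0
    if energy == 0.0:
        return None                        # silence: no pitch

    # Convert the F0 search range into a lag range (in samples).
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased short-term autocorrelation: fewer terms at larger lags,
        # which favours the first period over its multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r

    # Crude voicing decision: normalised peak must exceed the threshold.
    if best_lag is None or best_r / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


# Synthetic test frame: 200 Hz fundamental plus a weaker second harmonic,
# 50 ms long at an 8 kHz sampling rate.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
         + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
         for t in range(400)]
f0 = autocorr_pitch(frame, SAMPLE_RATE)
print(f0)
```

For this synthetic frame the estimate should come out close to 200 Hz, and an all-zero frame returns `None`. Real speech would typically need preprocessing and postprocessing (for example, smoothing of the pitch contour), which this sketch omits.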

Literature
10.1.
go back to reference W.J. Hess: Pitch Determination of Speech Signals - Algorithms and Devices (Springer, Berlin, Heidelberg 1983)CrossRef W.J. Hess: Pitch Determination of Speech Signals - Algorithms and Devices (Springer, Berlin, Heidelberg 1983)CrossRef
10.2.
go back to reference R.J. McAulay, T.F. Quatieri: Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoust. Speech Signal Process. 34, 744-754 (1986)CrossRef R.J. McAulay, T.F. Quatieri: Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoust. Speech Signal Process. 34, 744-754 (1986)CrossRef
10.3.
go back to reference E. Zwicker, W.J. Hess, E. Terhardt: Erkennung gesprochener Zahlworte mit Funktionsmodell und Rechenanlage, Kybernetik 3, 267-272 (1967), (in German)CrossRef E. Zwicker, W.J. Hess, E. Terhardt: Erkennung gesprochener Zahlworte mit Funktionsmodell und Rechenanlage, Kybernetik 3, 267-272 (1967), (in German)CrossRef
10.4.
10.5.
go back to reference R. Plomp: Aspects of Tone Sensation (Academic, London 1976) R. Plomp: Aspects of Tone Sensation (Academic, London 1976)
10.6.
go back to reference R. Meddis, L. OʼMard: A unitary model for pitch perception, J. Acoust. Soc. Am. 102, 1811-1820 (1997)CrossRef R. Meddis, L. OʼMard: A unitary model for pitch perception, J. Acoust. Soc. Am. 102, 1811-1820 (1997)CrossRef
10.7.
go back to reference K.J. Kohler: 25 Years of Phonetica: Preface to the special issue on pitch analysis, Phonetica 39, 185-187 (1992) K.J. Kohler: 25 Years of Phonetica: Preface to the special issue on pitch analysis, Phonetica 39, 185-187 (1992)
10.8.
go back to reference W.J. Hess, H. Indefrey: Accurate time-domain pitch determination of speech signals by means of a laryngograph, Speech Commun. 6, 55-68 (1987)CrossRef W.J. Hess, H. Indefrey: Accurate time-domain pitch determination of speech signals by means of a laryngograph, Speech Commun. 6, 55-68 (1987)CrossRef
10.9.
go back to reference W.J. Hess: Pitch and voicing determination. In: Advances in Speech Signal Processing, ed. by M.M. Sondhi, S. Furui (Dekker, New York 1992), p.3-48 W.J. Hess: Pitch and voicing determination. In: Advances in Speech Signal Processing, ed. by M.M. Sondhi, S. Furui (Dekker, New York 1992), p.3-48
10.10.
go back to reference A.M. Noll: Cepstrum pitch determination, J. Acoust. Soc. Am. 41, 293-309 (1967)CrossRef A.M. Noll: Cepstrum pitch determination, J. Acoust. Soc. Am. 41, 293-309 (1967)CrossRef
10.11.
go back to reference L.R. Rabiner: On the use of autocorrelation analysis for pitch detection, IEEE Trans. Acoust. Speech Signal Process. 25, 24-33 (1977)CrossRef L.R. Rabiner: On the use of autocorrelation analysis for pitch detection, IEEE Trans. Acoust. Speech Signal Process. 25, 24-33 (1977)CrossRef
10.12.
go back to reference E. Terhardt, G. Stoll, M. Seewann: Algorithm for extraction of pitch and pitch salience from complex tonal signals, J. Acoust. Soc. Am. 71, 679-688 (1982)CrossRef E. Terhardt, G. Stoll, M. Seewann: Algorithm for extraction of pitch and pitch salience from complex tonal signals, J. Acoust. Soc. Am. 71, 679-688 (1982)CrossRef
10.13.
go back to reference M.S. Harris, N. Umeda: Difference limens for fundamental frequency contours in sentences, J. Acoust. Soc. Am. 81, 1139-1145 (1987)CrossRef M.S. Harris, N. Umeda: Difference limens for fundamental frequency contours in sentences, J. Acoust. Soc. Am. 81, 1139-1145 (1987)CrossRef
10.14.
go back to reference J. ʼt Hart: Differential sensitivity to pitch distance, particularly in speech, J. Acoust. Soc. Am. 69, 811-822 (1981)CrossRef J. ʼt Hart: Differential sensitivity to pitch distance, particularly in speech, J. Acoust. Soc. Am. 69, 811-822 (1981)CrossRef
10.15.
go back to reference H. Duifhuis, L.F. Willems, R.J. Sluyter: Measurement of pitch in speech: an implementation of Goldsteinʼs theory of pitch perception, J. Acoust. Soc. Am. 71, 1568-1580 (1982)CrossRef H. Duifhuis, L.F. Willems, R.J. Sluyter: Measurement of pitch in speech: an implementation of Goldsteinʼs theory of pitch perception, J. Acoust. Soc. Am. 71, 1568-1580 (1982)CrossRef
10.16.
go back to reference D.J. Hermes: Measurement of pitch by subharmonic summation, J. Acoust. Soc. Am. 83, 257-264 (1988)CrossRef D.J. Hermes: Measurement of pitch by subharmonic summation, J. Acoust. Soc. Am. 83, 257-264 (1988)CrossRef
10.17.
go back to reference D. Talkin: A robust algorithm for pitch tracking (RAPT). In: Speech Coding and Synthesis, ed. by B. Kleijn, K. Paliwal (Elsevier, Amsterdam 1995), p.-495-518 D. Talkin: A robust algorithm for pitch tracking (RAPT). In: Speech Coding and Synthesis, ed. by B. Kleijn, K. Paliwal (Elsevier, Amsterdam 1995), p.-495-518
10.18.
go back to reference P. Hedelin, D. Huber: Pitch period determination of aperiodic speech signals, Proc. IEEE ICASSP (1990) pp. 361-364 P. Hedelin, D. Huber: Pitch period determination of aperiodic speech signals, Proc. IEEE ICASSP (1990) pp. 361-364
10.19.
go back to reference H. Hollien: On vocal registers, J. Phonet. 2, 225-243 (1974) H. Hollien: On vocal registers, J. Phonet. 2, 225-243 (1974)
10.20.
go back to reference N.P. McKinney: Laryngeal Frequency Analysis for Linguistic Research (Univ. Michigan, Ann Arbor 1965), Res. Rept. No. 14 N.P. McKinney: Laryngeal Frequency Analysis for Linguistic Research (Univ. Michigan, Ann Arbor 1965), Res. Rept. No. 14
10.21.
go back to reference H. Fujisaki, K. Hirose, K. Shimizu: A new system for reliable pitch extraction of speech, Proc. IEEE ICASSP (1986), paper 34.16 H. Fujisaki, K. Hirose, K. Shimizu: A new system for reliable pitch extraction of speech, Proc. IEEE ICASSP (1986), paper 34.16
10.22.
go back to reference M.M. Sondhi: New methods of pitch extraction, IEEE Trans. Acoust. Speech Signal Process. 26, 262-266 (1968) M.M. Sondhi: New methods of pitch extraction, IEEE Trans. Acoust. Speech Signal Process. 26, 262-266 (1968)
10.23.
go back to reference J.D. Markel: The SIFT algorithm for fundamental frequency estimation, IEEE Trans. Acoust. Speech Signal Process. 20, 149-153 (1972) J.D. Markel: The SIFT algorithm for fundamental frequency estimation, IEEE Trans. Acoust. Speech Signal Process. 20, 149-153 (1972)
10.24.
go back to reference V.N. Sobolev, S.P. Baronin: Investigation of the shift method for pitch determination, Elektrosvyaz 12, 30-36 (1968), in Russian V.N. Sobolev, S.P. Baronin: Investigation of the shift method for pitch determination, Elektrosvyaz 12, 30-36 (1968), in Russian
10.25.
go back to reference J.A. Moorer: The optimum comb method of pitch period analysis of continuous digitized speech, IEEE Trans. Acoust. Speech Signal Process. 22, 330-338 (1974)CrossRef J.A. Moorer: The optimum comb method of pitch period analysis of continuous digitized speech, IEEE Trans. Acoust. Speech Signal Process. 22, 330-338 (1974)CrossRef
10.26.
go back to reference T. Shimamura, H. Kobayashi: Weighted autocorrelation for pitch extraction of noisy speech, IEEE Trans. Speech Audio Process. 9, 727-730 (2001)CrossRef T. Shimamura, H. Kobayashi: Weighted autocorrelation for pitch extraction of noisy speech, IEEE Trans. Speech Audio Process. 9, 727-730 (2001)CrossRef
10.27.
go back to reference A. de Cheveigné, H. Kawahara: YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am. 111, 1917-1930 (2002)CrossRef A. de Cheveigné, H. Kawahara: YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am. 111, 1917-1930 (2002)CrossRef
10.28.
go back to reference K. Hirose, H. Fujisaki, S. Seto: A scheme for pitch extraction of speech using autocorrelation function with frame length proportional to the time lag, Proc. IEEE ICASSP (1992) pp. 149-152 K. Hirose, H. Fujisaki, S. Seto: A scheme for pitch extraction of speech using autocorrelation function with frame length proportional to the time lag, Proc. IEEE ICASSP (1992) pp. 149-152
10.29.
go back to reference D.E. Terez: Robust pitch determination using nonlinear state-space embedding, Proc. IEEE ICASSP (2002) D.E. Terez: Robust pitch determination using nonlinear state-space embedding, Proc. IEEE ICASSP (2002)
10.30.
go back to reference C.M. Rader: Vector pitch detection, J. Acoust Soc. Am. 36(C), 1463 (1964) C.M. Rader: Vector pitch detection, J. Acoust Soc. Am. 36(C), 1463 (1964)
10.31.
go back to reference L.A. Yaggi: Full Duplex Digital Vocoder (Texas Instruments, Dallas 1962), Scientific Report No.1, SP14-A62; DDC-AD-282986 L.A. Yaggi: Full Duplex Digital Vocoder (Texas Instruments, Dallas 1962), Scientific Report No.1, SP14-A62; DDC-AD-282986
10.32.
go back to reference Y. Medan, E. Yair, D. Chazan: Super resolution pitch determination of speech signals, IEEE Trans. Signal Process. 39, 40-48 (1991)CrossRef Y. Medan, E. Yair, D. Chazan: Super resolution pitch determination of speech signals, IEEE Trans. Signal Process. 39, 40-48 (1991)CrossRef
10.33.
go back to reference M.R. Weiss, R.P. Vogel, C.M. Harris: Implementation of a pitch-extractor of the double spectrum analysis type, J. Acoust. Soc. Am. 40, 657-662 (1966)CrossRef M.R. Weiss, R.P. Vogel, C.M. Harris: Implementation of a pitch-extractor of the double spectrum analysis type, J. Acoust. Soc. Am. 40, 657-662 (1966)CrossRef
10.34.
go back to reference H. Indefrey, W.J. Hess, G. Seeser: Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain, Proc. IEEE ICASSP, Vol. 2 (1985), paper 11.12 H. Indefrey, W.J. Hess, G. Seeser: Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain, Proc. IEEE ICASSP, Vol. 2 (1985), paper 11.12
10.35.
go back to reference P. Martin: Comparison of pitch detection by cepstrum and spectral comb analysis, Proc. IEEE ICASSP (1982) pp. 180-183 P. Martin: Comparison of pitch detection by cepstrum and spectral comb analysis, Proc. IEEE ICASSP (1982) pp. 180-183
10.36.
go back to reference V.T. Sreenivas: Pitch estimation of aperiodic and noisy speech signals (Indian Institute of Technology, Bombay 1982), Diss., Department of Electrical Engineering, Indian Institute of Technology V.T. Sreenivas: Pitch estimation of aperiodic and noisy speech signals (Indian Institute of Technology, Bombay 1982), Diss., Department of Electrical Engineering, Indian Institute of Technology
10.37.
go back to reference M.R. Schroeder: Period histogram and product spectrum: new methods for fundamental-frequency measurement, J. Acoust. Soc. Am. 43, 819-834 (1968) M.R. Schroeder: Period histogram and product spectrum: new methods for fundamental-frequency measurement, J. Acoust. Soc. Am. 43, 819-834 (1968)
10.38.
go back to reference P. Martin: A logarithmic spectral comb method for fundamental frequency analysis, Proc. 11th Int. Congr. on Phonetic Sciences Tallinn (1987), paper 59.2 P. Martin: A logarithmic spectral comb method for fundamental frequency analysis, Proc. 11th Int. Congr. on Phonetic Sciences Tallinn (1987), paper 59.2
10.39.
go back to reference P. Martin: WinPitchPro - a tool for text to speech alignment and prosodic analysis, Proc. Speech Prosody 2004 (2004) pp. 545-548, http://www.isca-speech.org/archive/sp2004 and http://www.winpitch.com P. Martin: WinPitchPro - a tool for text to speech alignment and prosodic analysis, Proc. Speech Prosody 2004 (2004) pp. 545-548, http://​www.​isca-speech.​org/​archive/​sp2004 and http://www.winpitch.com
10.40.
go back to reference J.C. Brown, M. Puckette: A high-resolution fundamental frequency determination based on phase changes of the Fourier transform, J. Acoust. Soc. Am. 94, 662-667 (1993)CrossRef J.C. Brown, M. Puckette: A high-resolution fundamental frequency determination based on phase changes of the Fourier transform, J. Acoust. Soc. Am. 94, 662-667 (1993)CrossRef
10.41.
go back to reference J.C. Brown: Musical fundamental frequency tracking using a pattern recognition method, J. Acoust. Soc. Am. 92, 1394-1402 (1992)CrossRef J.C. Brown: Musical fundamental frequency tracking using a pattern recognition method, J. Acoust. Soc. Am. 92, 1394-1402 (1992)CrossRef
10.42.
go back to reference F. Charpentier: Pitch detection using the short-term phase spectrum, Proc. IEEE ICASSP (1986) pp. 113-116 F. Charpentier: Pitch detection using the short-term phase spectrum, Proc. IEEE ICASSP (1986) pp. 113-116
10.43.
go back to reference M. Lahat, R.J. Niederjohn, D.A. Krubsack: A spectral autocorrelation method for measurement of the fundamental frequency of noise-corrupted speech, IEEE Trans. Acoust. Speech Signal Process. 35, 741-750 (1987)CrossRef M. Lahat, R.J. Niederjohn, D.A. Krubsack: A spectral autocorrelation method for measurement of the fundamental frequency of noise-corrupted speech, IEEE Trans. Acoust. Speech Signal Process. 35, 741-750 (1987)CrossRef
10.44.
go back to reference B. Doval, X. Rodet: Estimation of fundamental frequency of musical sound signals, Proc. IEEE ICASSP (1991) pp. 3657-3660 B. Doval, X. Rodet: Estimation of fundamental frequency of musical sound signals, Proc. IEEE ICASSP (1991) pp. 3657-3660
10.45.
go back to reference T. Abe, K. Kobayashi, S. Imai: Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency, Proc. ICSLPʼ96 (1996) pp. 1277-1280, http://www.isca-speech.org/archive/icslp_1996 T. Abe, K. Kobayashi, S. Imai: Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency, Proc. ICSLPʼ96 (1996) pp. 1277-1280, http://​www.​isca-speech.​org/​archive/​icslp_​1996
10.46.
go back to reference T. Nakatani, T. Irino: Robust and accurate fundamental frequency estimation based on dominant harmonic components, J. Acoust. Soc. Am. 116, 3690-3700 (2004)CrossRef T. Nakatani, T. Irino: Robust and accurate fundamental frequency estimation based on dominant harmonic components, J. Acoust. Soc. Am. 116, 3690-3700 (2004)CrossRef
10.47.
go back to reference L.R. Rabiner, M.J. Cheng, A.E. Rosenberg, C.A. McGonegal: A comparative study of several pitch detection algorithms, IEEE Trans. Acoust. Speech 24, 399-423 (1976)CrossRef L.R. Rabiner, M.J. Cheng, A.E. Rosenberg, C.A. McGonegal: A comparative study of several pitch detection algorithms, IEEE Trans. Acoust. Speech 24, 399-423 (1976)CrossRef
10.48.
go back to reference L. Arévalo: Beiträge zur Schätzung der Frequenz gestörter Schwingungen kurzer Dauer und eine Anwendung auf die Analyse von Sprachsignalen (Ruhr-Universität, Bochum 1991), Diss. in German L. Arévalo: Beiträge zur Schätzung der Frequenz gestörter Schwingungen kurzer Dauer und eine Anwendung auf die Analyse von Sprachsignalen (Ruhr-Universität, Bochum 1991), Diss. in German
10.49.
go back to reference A.M. Noll, A. Michael: Pitch determination of human speech by the harmonic product spectrum the harmonic sum spectrum and a maximum likelihood estimate, Symp. Comput. Process. Commun. 19, 779-797 (1970), ed. by the Microwave Inst., New York: Univ. of Brooklyn Press A.M. Noll, A. Michael: Pitch determination of human speech by the harmonic product spectrum the harmonic sum spectrum and a maximum likelihood estimate, Symp. Comput. Process. Commun. 19, 779-797 (1970), ed. by the Microwave Inst., New York: Univ. of Brooklyn Press
10.50.
go back to reference D.H. Friedman: Pseudo-maximum-likelihood speech pitch extraction, IEEE Trans. Acoust. Speech Signal Process. 25, 213-221 (1977)CrossRef D.H. Friedman: Pseudo-maximum-likelihood speech pitch extraction, IEEE Trans. Acoust. Speech Signal Process. 25, 213-221 (1977)CrossRef
10.51.
go back to reference R.J. McAulay, T.F. Quatieri: Pitch estimation and voicing detection based on a sinusoidal speech model, Proc. IEEE ICASSP (1990) pp. 249-252 R.J. McAulay, T.F. Quatieri: Pitch estimation and voicing detection based on a sinusoidal speech model, Proc. IEEE ICASSP (1990) pp. 249-252
10.52.
go back to reference A. Moreno, J.A.R. Fonollosa: Pitch determination of noisy speech using higher order statistics, Proc. IEEE ICASSP (1992) pp. 133-136 A. Moreno, J.A.R. Fonollosa: Pitch determination of noisy speech using higher order statistics, Proc. IEEE ICASSP (1992) pp. 133-136
10.53.
go back to reference B.B. Wells: Voiced/Unvoiced decision based on the bispectrum, Proc. IEEE ICASSP (1985) pp. 1589-1592 B.B. Wells: Voiced/Unvoiced decision based on the bispectrum, Proc. IEEE ICASSP (1985) pp. 1589-1592
10.54.
go back to reference J. Tabrikian, S. Dubnov, Y. Dickalov: Speech enhancement by harmonic modeling via MAP pitch tracking, Proc. IEEE ICASSP (2002) pp. 3316-3319 J. Tabrikian, S. Dubnov, Y. Dickalov: Speech enhancement by harmonic modeling via MAP pitch tracking, Proc. IEEE ICASSP (2002) pp. 3316-3319
10.55.
go back to reference S. Godsill, M. Davy: Bayesian harmonic models for musical pitch estimation and analysis, Proc. IEEE ICASSP (2002) pp. 1769-1772 S. Godsill, M. Davy: Bayesian harmonic models for musical pitch estimation and analysis, Proc. IEEE ICASSP (2002) pp. 1769-1772
10.56.
go back to reference C.A. McGonegal, L.R. Rabiner, A.E. Rosenberg: A subjective evaluation of pitch detection methods using LPC synthesized speech, IEEE Trans. Acoust. Speech Signal Process. 25, 221-229 (1977)CrossRef C.A. McGonegal, L.R. Rabiner, A.E. Rosenberg: A subjective evaluation of pitch detection methods using LPC synthesized speech, IEEE Trans. Acoust. Speech Signal Process. 25, 221-229 (1977)CrossRef
10.57.
go back to reference C. Hamon, E. Moulines, F. Charpentier: A diphone synthesis system based on time-domain prosodic modifications of speech, Proc. IEEE ICASSP (1989) pp. 238-241 C. Hamon, E. Moulines, F. Charpentier: A diphone synthesis system based on time-domain prosodic modifications of speech, Proc. IEEE ICASSP (1989) pp. 238-241
10.58.
go back to reference D.M. Howard: Peak-picking fundamental period estimation for hearing prostheses, J. Acoust. Soc. Am. 86, 902-910 (1989)CrossRef D.M. Howard: Peak-picking fundamental period estimation for hearing prostheses, J. Acoust. Soc. Am. 86, 902-910 (1989)CrossRef
10.59.
go back to reference I. Dologlou, G. Carayannis: Pitch detection based on zero-phase filtering, Speech Commun. 8, 309-318 (1989)CrossRef I. Dologlou, G. Carayannis: Pitch detection based on zero-phase filtering, Speech Commun. 8, 309-318 (1989)CrossRef
10.60.
go back to reference W.J. Hess: An algorithm for digital time-domain pitch period determination of speech signals and its application to detect F0 dynamics in VCV utterances, Proc. IEEE ICASSP (1976) pp. 322-325 W.J. Hess: An algorithm for digital time-domain pitch period determination of speech signals and its application to detect F0 dynamics in VCV utterances, Proc. IEEE ICASSP (1976) pp. 322-325
10.61.
go back to reference T.V. Ananthapadmanabha, B. Yegnanarayana: Epoch extraction of voiced speech, IEEE Trans. Acoust. Speech Signal Process. 23, 562-569 (1975)CrossRef T.V. Ananthapadmanabha, B. Yegnanarayana: Epoch extraction of voiced speech, IEEE Trans. Acoust. Speech Signal Process. 23, 562-569 (1975)CrossRef
10.62.
go back to reference L.O. Dolanský: An instantaneous pitch-period indicator, J. Acoust. Soc. Am. 27, 67-72 (1955)CrossRef L.O. Dolanský: An instantaneous pitch-period indicator, J. Acoust. Soc. Am. 27, 67-72 (1955)CrossRef
10.63.
go back to reference I.S. Howard, J.R. Walliker: The implementation of a portable real-time multilayer-perceptron speech fundamental period estimator, Proc. EUROSPEECH-89 (1989) pp. 206-209, http://www.isca-speech.org/archive/eurospeech_1989 I.S. Howard, J.R. Walliker: The implementation of a portable real-time multilayer-perceptron speech fundamental period estimator, Proc. EUROSPEECH-89 (1989) pp. 206-209, http://​www.​isca-speech.​org/​archive/​eurospeech_​1989
10.64.
go back to reference W.J. Hess: A pitch-synchronous digital feature extraction system for phonemic recognition of speech, IEEE Trans. Acoust. Speech Signal Process. 24, 14-25 (1976)CrossRef W.J. Hess: A pitch-synchronous digital feature extraction system for phonemic recognition of speech, IEEE Trans. Acoust. Speech Signal Process. 24, 14-25 (1976)CrossRef
10.65.
go back to reference A. Davis, S. Nordholm, R. Togneri: Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold, IEEE Trans. Audio Speech Lang. Process. 14, 412-424 (2006)CrossRef A. Davis, S. Nordholm, R. Togneri: Statistical voice activity detection using low-variance spectrum estimation and an adaptive threshold, IEEE Trans. Audio Speech Lang. Process. 14, 412-424 (2006)CrossRef
10.66.
go back to reference L.J. Siegel, A.C. Bessey: Voiced/unvoiced/mixed excitation classification of speech, IEEE Trans. Acoust. Speech Signal Process. 30, 451-461 (1982)CrossRef L.J. Siegel, A.C. Bessey: Voiced/unvoiced/mixed excitation classification of speech, IEEE Trans. Acoust. Speech Signal Process. 30, 451-461 (1982)CrossRef
10.67.
go back to reference S. Ahmadi, A.S. Spanias: Cepstrum-based pitch detection using a new statistical V/UV classification algorithm, IEEE Trans. Speech Audio Process. 7, 333-338 (1999)CrossRef S. Ahmadi, A.S. Spanias: Cepstrum-based pitch detection using a new statistical V/UV classification algorithm, IEEE Trans. Speech Audio Process. 7, 333-338 (1999)CrossRef
10.68.
go back to reference B.M. Lobanov, M. Boris: Automatic discrimination of noisy and quasi periodic speech sounds by the phase plane method, Soviet Physics - Acoustics 16, 353-356 (1970) Original (in Russian) in Akusticheskiy Zhurnal 16, 425-428 (1970) B.M. Lobanov, M. Boris: Automatic discrimination of noisy and quasi periodic speech sounds by the phase plane method, Soviet Physics - Acoustics 16, 353-356 (1970) Original (in Russian) in Akusticheskiy Zhurnal 16, 425-428 (1970)
10.69.
go back to reference E. Fisher, J. Tabrikian, S. Dubnov: Generalized likelihood ratio test for voiced-unvoiced decision in noisy speech using the harmonic model, IEEE Trans. Audio Speech Lang. Process. 14, 502-510 (2006)CrossRef E. Fisher, J. Tabrikian, S. Dubnov: Generalized likelihood ratio test for voiced-unvoiced decision in noisy speech using the harmonic model, IEEE Trans. Audio Speech Lang. Process. 14, 502-510 (2006)CrossRef
10.70.
go back to reference B.S. Atal, L.R. Rabiner: A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition, IEEE Trans. Acoust. Speech Signal Process. 24, 201-212 (1976)CrossRef B.S. Atal, L.R. Rabiner: A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition, IEEE Trans. Acoust. Speech Signal Process. 24, 201-212 (1976)CrossRef
10.71.
go back to reference O. Fujimura: An approximation to voice aperiodicity, IEEE Trans. Acoust. Speech Signal Process. 16, 68-72 (1968) O. Fujimura: An approximation to voice aperiodicity, IEEE Trans. Acoust. Speech Signal Process. 16, 68-72 (1968)
10.72.
go back to reference A.K. Krishnamurthy, D.G. Childers: Two-channel speech analysis, IEEE Trans. Acoust Speech Signal Process. 34, 730-743 (1986)CrossRef A.K. Krishnamurthy, D.G. Childers: Two-channel speech analysis, IEEE Trans. Acoust Speech Signal Process. 34, 730-743 (1986)CrossRef
10.73.
go back to reference K.N. Stevens, D.N. Kalikow, T.R. Willemain: A miniature accelerometer for detecting glottal waveforms and nasalization, J. Speech Hear. Res. 18, 594-599 (1975)CrossRef K.N. Stevens, D.N. Kalikow, T.R. Willemain: A miniature accelerometer for detecting glottal waveforms and nasalization, J. Speech Hear. Res. 18, 594-599 (1975)CrossRef
10.74.
go back to reference V.R. Viswanathan, W.H. Russell: Subjective and objective evaluation of pitch extractors for LPC and harmonic-deviations vocoders (Bolt Beranek and Newman, Cambridge 1984), MA: Report No. 5726 V.R. Viswanathan, W.H. Russell: Subjective and objective evaluation of pitch extractors for LPC and harmonic-deviations vocoders (Bolt Beranek and Newman, Cambridge 1984), MA: Report No. 5726
10.75.
go back to reference A.J. Fourcin, E. Abberton: First applications of a new laryngograph, Med Biol Illust 21, 172-182 (1971) A.J. Fourcin, E. Abberton: First applications of a new laryngograph, Med Biol Illust 21, 172-182 (1971)
10.76.
go back to reference D.G. Childers, M. Hahn, J.N. Larar: Silent and voiced/Unvoiced/Mixed excitation (four-way) classification of speech, IEEE Trans. Acoust. Speech Signal Process. 37, 1771-1774 (1989)CrossRef D.G. Childers, M. Hahn, J.N. Larar: Silent and voiced/Unvoiced/Mixed excitation (four-way) classification of speech, IEEE Trans. Acoust. Speech Signal Process. 37, 1771-1774 (1989)CrossRef
10.77.
go back to reference E. Mousset, W.A. Ainsworth, J.A.R. Fonollosa: A comparison of several recent methods of fundamental frequency and voicing decision estimation, Proc. ICSLPʼ96 (1996) pp. 1273-1276, http://www.isca-speech.org/archive/icslp_1996 E. Mousset, W.A. Ainsworth, J.A.R. Fonollosa: A comparison of several recent methods of fundamental frequency and voicing decision estimation, Proc. ICSLPʼ96 (1996) pp. 1273-1276, http://​www.​isca-speech.​org/​archive/​icslp_​1996
10.78.
go back to reference D.A. Krubsack, R.J. Niederjohn: An autocorrelation pitch detector and voicing decision with confidence measures developed for noise-corrupted speech, IEEE Trans. Signal Process. 39, 319-329 (1991)CrossRef D.A. Krubsack, R.J. Niederjohn: An autocorrelation pitch detector and voicing decision with confidence measures developed for noise-corrupted speech, IEEE Trans. Signal Process. 39, 319-329 (1991)CrossRef
10.79.
go back to reference Y. Xu, X. Sun: Maximum speed of pitch change and how it may relate to speech, J. Acoust. Soc. Am. 111, 1399-1413 (2002)CrossRef Y. Xu, X. Sun: Maximum speed of pitch change and how it may relate to speech, J. Acoust. Soc. Am. 111, 1399-1413 (2002)CrossRef
10.80.
go back to reference B.G. Secrest, G.R. Doddington: Postprocessing techniques for voice pitch trackers, Proc. IEEE ICASSP (1982) pp. 172-175 B.G. Secrest, G.R. Doddington: Postprocessing techniques for voice pitch trackers, Proc. IEEE ICASSP (1982) pp. 172-175
10.81.
go back to reference F. Plante, G.F. Meyer, W.A. Ainsworth: A pitch extraction reference database, Proc. EUROSPEECHʼ95 (1995) pp. 837-840, http://www.isca-speech.org/archive/eurospeech_1995 F. Plante, G.F. Meyer, W.A. Ainsworth: A pitch extraction reference database, Proc. EUROSPEECHʼ95 (1995) pp. 837-840, http://​www.​isca-speech.​org/​archive/​eurospeech_​1995
10.82.
go back to reference H. Kawahara, H. Katayose, A. de Cheveigné, R.D. Patterson: Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, Proc. EUROSPEECHʼ99 (1999) pp. 2781-2784, http://www.isca-speech.org/archive/eurospeech_1999 H. Kawahara, H. Katayose, A. de Cheveigné, R.D. Patterson: Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, Proc. EUROSPEECHʼ99 (1999) pp. 2781-2784, http://​www.​isca-speech.​org/​archive/​eurospeech_​1999
10.83.
go back to reference L.R. Rabiner, M.R. Sambur, C.E. Schmidt: Applications of nonlinear smoothing algorithm to speech processing, IEEE Trans. Acoust. Speech Signal Process. 23, 552-557 (1975)CrossRef L.R. Rabiner, M.R. Sambur, C.E. Schmidt: Applications of nonlinear smoothing algorithm to speech processing, IEEE Trans. Acoust. Speech Signal Process. 23, 552-557 (1975)CrossRef
10.84.
go back to reference P. Specker: A powerful postprocessing algorithm for time-domain pitch trackers, Proc. IEEE ICASSP (1984), paper 28B.2 P. Specker: A powerful postprocessing algorithm for time-domain pitch trackers, Proc. IEEE ICASSP (1984), paper 28B.2
10.85.
go back to reference F. Itakura: Minimum prediction residual applied to speech recognition, IEEE Trans. Acoust. Speech Signal Process. 23, 67-72 (1975)CrossRef F. Itakura: Minimum prediction residual applied to speech recognition, IEEE Trans. Acoust. Speech Signal Process. 23, 67-72 (1975)CrossRef
10.86.
go back to reference Y.R. Wang, I.J. Wong, T.C. Tsao: A statistical pitch detection algorithm, Proc. IEEE ICASSP (2002) pp. 357-360 Y.R. Wang, I.J. Wong, T.C. Tsao: A statistical pitch detection algorithm, Proc. IEEE ICASSP (2002) pp. 357-360
10.87.
go back to reference Y. Sagisaka, N. Campbell, N. Higuchi (eds.): Computing prosody. Computational models for processing spontaneous speech (Springer, New York 1996)MATH Y. Sagisaka, N. Campbell, N. Higuchi (eds.): Computing prosody. Computational models for processing spontaneous speech (Springer, New York 1996)MATH
10.88.
go back to reference P. Bagshaw: Automatic prosodic analysis for computer aided pronunciation teaching (Univ. of Edinburgh, Edinburgh 1993), PhD Thesis http://www.cstr.ed.ac.uk/projects/fda/Bagshaw_PhDThesis.pdf P. Bagshaw: Automatic prosodic analysis for computer aided pronunciation teaching (Univ. of Edinburgh, Edinburgh 1993), PhD Thesis http://​www.​cstr.​ed.​ac.​uk/​projects/​fda/​Bagshaw_​PhDThesis.​pdf
10.89.
go back to reference R.J. Baken: Clinical Measurement of Speech and Voice (Taylor Francis, London 1987) R.J. Baken: Clinical Measurement of Speech and Voice (Taylor Francis, London 1987)
10.90.
go back to reference A. Askenfelt: Automatic notation of played music: The Visa project, Fontes Artis Musicae 26, 109-120 (1979) A. Askenfelt: Automatic notation of played music: The Visa project, Fontes Artis Musicae 26, 109-120 (1979)
10.91.
Y.M. Cheng, D. OʼShaughnessy: Automatic and reliable estimation of glottal closure instant and period, IEEE Trans. Acoust. Speech Signal Process. 37, 1805-1815 (1989)
10.92.
W.J. Hess: Determination of glottal excitation cycles in running speech, Phonetica 52, 196-204 (1995)
10.93.
W.J. Hess: Pitch determination of acoustic signals - an old problem and new challenges, Proc. 18th Intern. Congress on Acoustics, Kyoto (2004), paper Tu2.H.1
10.94.
B. Yegnanarayana, R. Smits: A robust method for determining instants of major excitations in voiced speech, Proc. IEEE ICASSP (1995) pp. 776-779
10.95.
M. Brookes, P.A. Naylor, J. Gudnason: A quantitative assessment of group delay methods for identifying glottal closures in voiced speech, IEEE Trans. Audio Speech Language Process. 14, 456-466 (2006)
10.96.
C.X. Ma, Y. Kamp, L.F. Willems: A Frobenius norm approach to glottal closure detection from the speech signal, IEEE Trans. Speech Audio Process. 2, 258-265 (1994)
10.98.
K.E. Barner: Colored L-ℓ filters and their application in speech pitch detection, IEEE Trans. Signal Process. 48, 2601-2606 (2000)
10.99.
J.L. Navarro-Mesa, I. Esquerra-Llucià: A time-frequency approach to epoch detection, Proc. EUROSPEECHʼ95 (1995) pp. 405-408, http://www.isca-speech.org/archive/eurospeech_1995
10.100.
A. de Cheveigné: Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancellation model of auditory processing, J. Acoust. Soc. Am. 93, 3279-3290 (1993)
10.101.
A.P. Klapuri: Signal processing methods for the automatic transcription of music (Tampere Univ. Technol., Tampere 2004), Ph.D. diss. http://www.cs.tut.fi/sgn/arg/klap/klap_phd.pdf
10.102.
M. Goto: A predominant F0-estimation method for polyphonic musical audio signals, Proc. 18th Intern. Congress on Acoustics, Kyoto (2004), paper Tu2.H.4
10.103.
T. Tolonen, M. Karjalainen: A computationally efficient multipitch analysis model, IEEE Trans. Speech Audio Process. 8, 708-716 (2000)
10.104.
H. Kameoka, T. Nishimoto, S. Sagayama: Separation of harmonic structures based on tied Gaussian mixture model and information criterion for concurrent sounds, Proc. IEEE ICASSP (2004), paper AE-P5.9
10.105.
A.P. Dempster, N.M. Laird, D.B. Rubin: Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B 39, 1-38 (1977)
10.106.
L. Yoo, I. Fujinaga: A comparative latency study of hardware and software pitch-trackers, Proc. 1999 Int. Computer Music Conf. (1999) pp. 36-40
Metadata
Title
Pitch and Voicing Determination of Speech with an Extension Toward Music Signals
Author
Wolfgang J. Hess, Prof.
Copyright Year
2008
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-49127-9_10