2011 | Original Paper | Book chapter
Published in:
Speech Spectrum Analysis
Up to this point, I have presented speech analysis methods which obtain spectral or time–frequency information from the signal data directly by means of some kind of transformation. It has been shown how different processing schemes can extract such information from the signal in different ways, but none has relied on any special assumptions about a speech signal beyond the most general and widely held sort. In this chapter, I introduce an entirely different approach to what is often called “spectral estimation,” in which the signal is explicitly assumed to conform to the outlines of a model. The parameters of the assumed model are then estimated from the signal data, and the values of the parameters are used as a kind of proxy estimate of corresponding signal properties.
When a sound object in the list is selected, you will see a button for “Formants & LPC” which accesses all LP analysis features of Praat. Pressing this button brings up a pop-up menu with the LP options. Praat can compute a raw LPC object, using any one of four algorithms which are all essentially equivalent for pitch-asynchronous analysis. Praat is only useful for pitch-asynchronous LP analysis, since it forces the use of a Gaussian taper on the analysis window. Also on the menu is a “To Formant” function, which further automates some of the steps in LP formant estimation.
It may be necessary to downsample the sound object before using an LPC function. As mentioned above, for a good LP model of sonorant speech sounds, the sampling rate should be set to 10 kHz for males and around 11 kHz for female speakers. Resampling is available in Praat when a sound object is selected, as one of the functions accessible by the “Convert” button. Upon selecting one of the functions “To LPC,” a dialog opens which allows you to set the prediction order (number of poles), the analysis window length (automatically doubled with a Gaussian taper applied), the time step between successive LP coefficient computations, and the frequency at which preemphasis begins. Setting the latter to some frequency higher than the Nyquist frequency will turn off preemphasis (e.g. 6,000 Hz for a sound sampled at 10 kHz).
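The effect of the preemphasis frequency setting can be sketched in a few lines of Python. This is only an illustration, not Praat's code: it assumes the first-order filter y[n] = x[n] - a·x[n-1] with a = exp(-2πF/Fs), which is the relation I understand the Praat manual to give, and the function names are hypothetical.

```python
import math

def preemphasis_coefficient(from_frequency_hz, sampling_rate_hz):
    """Coefficient a of y[n] = x[n] - a*x[n-1] (assumed relation a = exp(-2*pi*F/Fs))."""
    return math.exp(-2.0 * math.pi * from_frequency_hz / sampling_rate_hz)

def preemphasize(samples, from_frequency_hz, sampling_rate_hz):
    """Apply the first-order preemphasis filter to a list of samples."""
    a = preemphasis_coefficient(from_frequency_hz, sampling_rate_hz)
    out = [samples[0]] if samples else []
    for n in range(1, len(samples)):
        out.append(samples[n] - a * samples[n - 1])
    return out

if __name__ == "__main__":
    fs = 10000.0
    # A low starting frequency gives a coefficient close to 1 (strong preemphasis).
    print(preemphasis_coefficient(50.0, fs))
    # A starting frequency above the Nyquist frequency gives a coefficient close to 0.
    print(preemphasis_coefficient(6000.0, fs))
```

Under this assumption, 6,000 Hz at a 10 kHz sampling rate gives a coefficient of roughly 0.02, so the filter leaves the signal almost unchanged, which matches the advice above.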
The result of the above LPC function will be an LPC object in the list, which represents filter coefficients as a function of time, stored in successive frames with a constant sampling period [4]. It is important to remember the actual window length (double what was entered above), the time step, and preemphasis frequency in order to be able to work with an LPC object effectively. The LPC object can be acted upon in a number of ways, not all of which I can address here. The actual coefficients can be accessed for each analysis window using the “Inspect” button below the object list. The function “To Spectrum” will compute a spectrum object from the LPC object; it is important to enter the deemphasis frequency equal to whatever preemphasis frequency was entered before. The time point of the analysis window within the LPC object that is desired for the spectral slice must also be entered. Leaving the value set to zero will choose the first analysis frame. Once a spectrum object has been created, it can be viewed using “Edit” or drawn as a graph in the picture area, just as with a Fourier power spectrum object.
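A spectral slice of this kind can be pictured as the all-pole model evaluated on a frequency grid. The Python sketch below is a hedged illustration and not Praat's implementation: it assumes the sign convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p, assumes that deemphasis undoes a first-order filter with coefficient exp(-2πF/Fs), and uses a hypothetical function name.

```python
import cmath
import math

def lp_power_spectrum_db(coeffs, gain, fs, n_points=256, deemph_hz=None):
    """Power spectrum (dB) of gain / A(z), with A(z) = 1 + sum_k coeffs[k-1] * z^-k.

    If deemph_hz is given, an assumed first-order preemphasis 1 - a*z^-1 is undone.
    Returns (frequencies_hz, power_db) on n_points from 0 Hz up to fs / 2.
    """
    a_pre = None
    if deemph_hz is not None:
        a_pre = math.exp(-2.0 * math.pi * deemph_hz / fs)
    freqs, power_db = [], []
    for i in range(n_points):
        f = (fs / 2.0) * i / (n_points - 1)
        z_inv = cmath.exp(-2j * math.pi * f / fs)
        denom = 1.0 + sum(c * z_inv ** (k + 1) for k, c in enumerate(coeffs))
        power = gain ** 2 / abs(denom) ** 2
        if a_pre is not None:
            power /= abs(1.0 - a_pre * z_inv) ** 2
        freqs.append(f)
        power_db.append(10.0 * math.log10(power))
    return freqs, power_db

if __name__ == "__main__":
    fs = 10000.0
    # One illustrative resonance at 500 Hz with an 80 Hz bandwidth.
    r = math.exp(-math.pi * 80.0 / fs)
    theta = 2.0 * math.pi * 500.0 / fs
    freqs, p_db = lp_power_spectrum_db([-2.0 * r * math.cos(theta), r * r], 1.0, fs)
    print(freqs[p_db.index(max(p_db))])  # grid frequency nearest the spectral peak
```

Passing the same frequency for deemphasis as was used for preemphasis removes the spectral tilt that the preemphasis filter introduced, which is why the two values should match.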
Starting from an LPC object, there is also the function “To Formant,” which is useful for getting the frequency and bandwidth values of the various resonances determined by the LP coefficients. The function yields a formant object added to the list, which provides frequency and bandwidth information (by root solving) about discovered formants for each analysis frame in the corresponding LPC object. The formant values can be read for each successive analysis frame by first computing the Formant object and then using the “Inspect” button. The formant function tries to heuristically eliminate certain resonances which are not peaks; if every resonance and its bandwidth is desired (which will include any shaping resonances that are not peaks), use “To Formant (keep all)” instead.
Starting from a sound object which has not been downsampled, it is also possible to bypass some of the details of LP analysis by selecting “To Formant” from the “Formants & LPC” functions. This function brings up a dialog requiring you to enter the number of formants sought or expected; the number entered will simply be doubled to determine the number of poles for the analysis. From the discussion in this chapter, it can be gleaned that this is not generally the optimal number of poles, since something like two additional poles should be included in order to model the lip radiation and glottal production filters. Therefore, you might try increasing the number of formants you enter by one, to get a better LP model. The dialog also asks for the maximum frequency of the formants, which should be 5,500 Hz for a female and 5,000 Hz for a male. The function will then automatically resample the sound before computing the LP coefficients and formant properties therefrom. Once again, the analysis window length (automatically doubled by Praat) and the desired starting frequency for any preemphasis should also be entered in this dialog. The result of this function will be a formant object, as described above.
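The arithmetic behind these dialog entries is simple enough to write down. The helper below is hypothetical and only restates the rule given above (poles = twice the number of formants); the resampling rate of twice the maximum formant is my assumption, chosen because it reproduces the 10 kHz and 11 kHz rates recommended earlier.

```python
def to_formant_settings(n_formants, max_formant_hz):
    """Pole count and resampling rate implied by the "To Formant" dialog entries.

    The factor of two for the resampling rate is an assumption, not taken from the text.
    """
    return {"poles": 2 * n_formants, "resample_rate_hz": 2.0 * max_formant_hz}

if __name__ == "__main__":
    print(to_formant_settings(5, 5000.0))  # five formants expected, male speaker
    print(to_formant_settings(6, 5000.0))  # one extra "formant" adds two poles
```

Entering six instead of five formants therefore gives twelve poles, leaving two poles beyond the five resonance pairs for the lip radiation and glottal filters.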
Readers who wish to use the supplied m-files for this chapter will need to have installed the Signal Processing Toolbox for Matlab, which costs extra. Some of the routines (in particular, all of the ARMA routines) also rely on two free toolboxes, Higher Order Spectral Analysis and ARMASA, both available through the Matlab Central file exchange service. Rather than using the functions in these toolboxes directly, I have written “wrapper” functions which in turn call functions from the toolboxes, and in this way the output and plots precisely suit the purposes to which they have been put here.
The three functions to be described are intended for LP or ARMA modeling of a “single slice” of a speech signal. The signal provided to the functions for pitch-asynchronous analysis should be around 20–60 ms in length, not windowed. Praat is useful for cutting out the desired slice of a speech sound. For closed-phase analysis, the provided signal should instead comprise a single glottal cycle from beginning to end. Praat can also be used to downsample the signal in preparation for the parametric procedures provided here. For modeling of the vocal tract resonances in sonorant speech sounds, male speakers should be resampled at 10 kHz, and females at around 11 kHz.
The main function for computing a linear prediction model from a signal and plotting its power spectrum is lpcspectrum.m, which is called using the following template:
$$ {\texttt{[peaks, amps] = lpcspectrum(signal, order, Fs, taper, alg, linestyle)}} $$
The argument signal is a vectorized signal that need not have had a tapered window applied, order is the number of LP coefficients in the model, and Fs is the signal sampling rate. The argument taper should be entered as 'yes' or 'no' (including the single quotes, per Matlab syntax). The argument alg should be entered as one of 'autocor', 'cov', 'burg', or 'rcest'. These values will compute the LP model using, respectively, the autocorrelation method, the covariance method, the Burg method, and the third-order cumulant-based method of [16]. For computing an LP model using the closed phase of a glottal cycle, set alg to 'cov' (the covariance method) and taper to 'no'. All pitch-asynchronous models should have tapering applied, unless the provided signal was already tapered by other software. The argument linestyle is useful for overlaying plots with different lines; it should be one of the Matlab LineStyle property specifiers, such as '-' for a solid line or ':' for a dotted line.
When lpcspectrum is called, a set of LP coefficients is computed using the specified algorithm, the resulting model power spectrum is plotted, and a number of peak frequencies are automatically picked using a routine from the HOSA toolbox and reported in the Matlab command window. The desired number of peaks to be sought is set in the code, as the value of npeaks, which equals 6 by default. The discovered peaks and their power amplitudes can also be returned using the so-named output variables in the template, which are optional. At the conclusion of the routine, the user is asked whether to overlay the Fourier power spectrum, where the answer 'h' will not overlay but will “hold” the plot so that future runs through the routine will overlay the next LP spectrum. This is useful for comparing spectra of LP models with differing numbers of poles, for instance.
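For readers who want to see what the autocorrelation method involves, here is a minimal Python sketch of it using the Levinson-Durbin recursion. It is not the code of lpcspectrum.m: the function names are hypothetical, the Hamming taper is an assumed choice (the text does not say which taper the m-file applies), the sign convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p is also assumed, and the signal is assumed to be non-zero.

```python
import math

def hamming(n):
    """Hamming taper of length n (an assumed choice of taper)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def lpc_autocorrelation(signal, order, taper=True):
    """Return (a[1..order], residual_energy) for A(z) = 1 + sum_k a[k] * z^-k."""
    x = list(signal)
    if taper:
        x = [xi * wi for xi, wi in zip(x, hamming(len(x)))]
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m of the Levinson-Durbin recursion.
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + k_m * a[m - k]
        new_a[m] = k_m
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err

if __name__ == "__main__":
    # Impulse response of a known two-pole filter, used as a synthetic test signal.
    h = []
    for n in range(400):
        value = 1.0 if n == 0 else 0.0
        if n >= 1:
            value += 1.6 * h[n - 1]
        if n >= 2:
            value -= 0.8 * h[n - 2]
        h.append(value)
    print(lpc_autocorrelation(h, 2, taper=False))
```

On the synthetic impulse response the untapered estimate recovers the filter coefficients almost exactly; on real speech slices the taper matters, as discussed above.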
The function lpcroots.m is used for computing an LP model and getting the resonances by solving the polynomial roots, instead of peak-picking a spectrum. It is invoked using the following template:
$$ {\texttt{[output, amps] = lpcroots(signal, order, Fs, taper, alg)}} $$
where the arguments have the same meaning as with the lpcspectrum routine above. Running the function automatically prints all solved roots (positive and negative) as frequency values together with bandwidths of the resonances. The negative-frequency roots simply repeat the information contained in the positive roots. If desired, a two-column matrix of resonance frequencies and bandwidths can be returned in the variable output. The variable named amps will return a vector of the power amplitudes of the resonances; as usual the routine can be called without using output variables.
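Why the negative-frequency roots carry no new information can be seen on a single second-order section. The Python sketch below is an illustration only, with hypothetical function names; it uses the standard mapping from a z-plane pole to a frequency (pole angle times Fs/2π) and a bandwidth (minus the log of the pole radius times Fs/π), and it is not the code of lpcroots.m.

```python
import cmath
import math

def second_order_roots(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. the poles of 1 / (1 + a1*z^-1 + a2*z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

def describe_roots(a1, a2, fs):
    """Return (frequency_hz, bandwidth_hz) for both roots, keeping the frequency sign."""
    out = []
    for z in second_order_roots(a1, a2):
        freq = cmath.phase(z) * fs / (2.0 * math.pi)
        bandwidth = -math.log(abs(z)) * fs / math.pi
        out.append((freq, bandwidth))
    return out

if __name__ == "__main__":
    for freq, bandwidth in describe_roots(-1.6, 0.8, 10000.0):
        print(freq, bandwidth)
```

Because the LP coefficients are real, complex roots come in conjugate pairs, so the two printed frequencies differ only in sign and the bandwidths are identical.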
The main function for ARMA modeling provided is armaspectrum.m, and it combines the features of the above LP routines. The function is invoked using the following template:
$$ {\texttt{[formants, amps, zeros] = armaspectrum(signal, arorder, maorder, Fs, alg)}} $$
in which the arguments have the same meaning as in the LP routines above, except that now two orders (for the autoregressive part and the moving average part of the model) have to be specified, giving the number of poles and zeros respectively. The argument alg should be either 'rts' (for the HOSA toolbox algorithm) or 'ARMASA'. Running the function produces a plot of the model power spectrum, although the gain has been estimated using the LP gain factor which is known to be incorrect. The frequencies of all poles (with their bandwidths) and zeros found by root-solving are automatically printed to the command window. The output variable formants returns a two-column matrix of resonance frequencies and bandwidths, amps returns a vector of the resonance amplitudes, and zeros returns a vector of the frequencies of spectral zeros.
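The difference between an all-pole and a pole-zero model spectrum can be illustrated by evaluating a small ARMA model directly. The following Python sketch is not the code of armaspectrum.m; the function names, the unit default gain, and the example frequencies are all illustrative assumptions.

```python
import cmath
import math

def arma_power_db(ar, ma, fs, freq_hz, gain=1.0):
    """Power (dB) at freq_hz of gain * B(z) / A(z).

    A(z) = 1 + sum_k ar[k-1] * z^-k holds the poles,
    B(z) = 1 + sum_k ma[k-1] * z^-k holds the zeros.
    """
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    a_val = 1.0 + sum(c * z_inv ** (k + 1) for k, c in enumerate(ar))
    b_val = 1.0 + sum(c * z_inv ** (k + 1) for k, c in enumerate(ma))
    return 10.0 * math.log10((gain * abs(b_val) / abs(a_val)) ** 2)

def pair_coeffs(freq_hz, bandwidth_hz, fs):
    """Second-order coefficients [c1, c2] for a conjugate pair of roots."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return [-2.0 * r * math.cos(theta), r * r]

if __name__ == "__main__":
    fs = 10000.0
    ar = pair_coeffs(500.0, 80.0, fs)    # one resonance (pole pair)
    ma = pair_coeffs(1500.0, 100.0, fs)  # one antiresonance (zero pair)
    for f in (500.0, 1000.0, 1500.0):
        print(f, arma_power_db(ar, ma, fs, f))
```

The pole pair produces a peak near 500 Hz and the zero pair a dip near 1,500 Hz, which is the kind of spectral valley an all-pole LP model cannot represent directly.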
Call templates for the three functions described above:
$$ {\texttt{[peaks, amps] = lpcspectrum(signal, order, Fs, taper, alg, linestyle)}} $$
$$ {\texttt{[output, amps] = lpcroots(signal, order, Fs, taper, alg)}} $$
$$ {\texttt{[formants, amps, zeros] = armaspectrum(signal, arorder, maorder, Fs, alg)}} $$
Footnote 1: The expressions in the numerator and denominator are not strictly polynomials, but are actually a more general sort of creature called a Laurent polynomial, in which the indeterminate \(z\) is allowed to take negative powers.
Footnote 2: The view of LP analysis as a measurement tool is fraught with difficulty; should a scientist accept a measurement from a tool that would provide “wrong” measurements if it were not specially set up using partial foreknowledge of the desired outcome?
Footnote 3: It was Paul Boersma who pointed out this important fact to me.
Footnote 4: The discussion in the previous section clarifies how this systematic difference is not really an error.
Footnote 5: A spurious peak in the spectrum has been introduced at 5 kHz, which is perhaps a result of resampling to 10 kHz.
1. R.L. Allen, D.W. Mills, Signal Analysis: Time, Frequency, Scale, and Structure (Wiley, New York, 2004)
2. N. Andersen, On the calculation of filter coefficients for maximum entropy spectral analysis. Geophysics 39, 69–72 (1974)
3. B.S. Atal, S.L. Hanauer, Speech analysis and synthesis by linear prediction of the speech wave. J. Acoust. Soc. Am. 50(2 part 2), 637–655 (1971)
4. P. Boersma, D. Weenink, Praat: Doing phonetics by computer. Comp. Softw. (2009)
5. A. van den Bos, Alternative interpretation of maximum entropy spectral analysis. IEEE Trans. Inform. Theory 17, 493–494 (1971)
6. G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, rev. edn. (Holden-Day, San Francisco, 1976)
7. P.M.T. Broersen, Automatic Autocorrelation and Spectral Analysis (Springer, Berlin, 2006)
8. H.P. Bucker, Comparison of FFT and Prony algorithms for bearing estimation of narrowband signals in a realistic ocean environment. J. Acoust. Soc. Am. 61(3), 756–762 (1977)
9. J.P. Burg, A new analysis technique for time series data, in Modern Spectrum Analysis, ed. by D.G. Childers (IEEE Press, New York, 1978), pp. 42–49. Reprint of a paper presented at the NATO Advanced Study Institute on Signal Processing with Emphasis on Underwater Acoustics (1968)
10. R.L. Christensen, W.J. Strong, E.P. Palmer, A comparison of three methods of extracting resonance information from predictor-coefficient coded speech. IEEE Trans. Acoust. Speech Sig. Proc. 24(1), 8–14 (1976)
11. M.G. Di Benedetto, Vowel representation: some observations on temporal and spectral properties of the first formant frequency. J. Acoust. Soc. Am. 86(1), 55–66 (1989)
12.
13. G. Fant, Acoustic Theory of Speech Production (Mouton, The Hague, 1960). Reissued 1970
14. J.L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edn. (Springer, Berlin, 1972)
15. S.A. Fulop, Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction. J. Acoust. Soc. Am. 127(4), 2114–2117 (2010)
16. G.B. Giannakis, J.M. Mendel, Identification of nonminimum phase systems using higher order statistics. IEEE Trans. Acoust. Speech Sig. Proc. 37(3), 360–377 (1989)
17. P.R. Gutowski, E.A. Robinson, S. Treitel, Spectral estimation: fact or fiction. IEEE Trans. Geosci. Elect. 16(2), 80–84 (1978)
18. R.W. Hamming, Digital Filters, 3rd edn. (Prentice-Hall, Englewood Cliffs, 1989)
19. J. Hillenbrand, L.A. Getty, M.J. Clark, K. Wheeler, Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97(5), 3099–3111 (1995)
20. S.M. Kay, S.L. Marple Jr., Spectrum analysis—a modern perspective. Proc. IEEE 69(11), 1380–1419 (1981)
21. D. Kewley-Port, C.S. Watson, Formant-frequency discrimination for isolated English vowels. J. Acoust. Soc. Am. 95(1), 485–496 (1994)
22. J. Makhoul, Spectral analysis of speech by linear prediction. IEEE Trans. Audio Electroacoust. 21(3), 140–148 (1973)
23. J. Makhoul, Spectral linear prediction: properties and applications. IEEE Trans. Acoust. Speech Sig. Proc. 23(3), 283–296 (1975)
24. J.D. Markel, A.H. Gray Jr., Linear Prediction of Speech (Springer, Berlin, 1976)
25. S.S. McCandless, An algorithm for automatic formant extraction using linear prediction spectra. IEEE Trans. Acoust. Speech Sig. Proc. 22(2), 135–141 (1974)
26. R.B. Monsen, A.M. Engebretson, The accuracy of formant frequency measurements: a comparison of spectrographic analysis and linear prediction. J. Speech Hearing Res. 26(3), 89–97 (1983)
27. H. Morikawa, H. Fujisaki, System identification of the speech production process based on a state-space representation. IEEE Trans. Acoust. Speech Sig. Proc. 32(2), 252–262 (1984)
28. National Instruments: LabVIEW 8.6 Advanced Signal Processing Toolkit Help (2008). Available online
29. M.B. Priestley, Spectral Analysis and Time Series, vol. 1 (Academic Press, London, 1981)
30. J.G. Proakis, D.G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 2nd edn. (Macmillan, New York, 1992)
31. H.R. Radoski, E.J. Zawalick, P.F. Fougere, The superiority of maximum entropy power spectrum techniques applied to geomagnetic micropulsations. Phys. Earth Planet. Interiors 12, 208–216 (1976)
32. J.R. Ragazzini, L.A. Zadeh, Analysis of sampled-data systems. Trans. Am. Inst. Elec. Eng. 71, 225–234 (1952)
33. K. Steiglitz, On the simultaneous estimation of poles and zeros in speech analysis. IEEE Trans. Acoust. Speech Sig. Proc. 25(3), 229–234 (1977)
34.
35. W.W.S. Wei, Time Series Analysis: Univariate and Multivariate Methods (Addison-Wesley, Redwood City, 1990)
36. Wikipedia: atan2. http://www.wikipedia.org (2010)
Title: Linear Prediction and ARMA Spectrum Estimation
DOI: https://doi.org/10.1007/978-3-642-17478-0_7
Author: Sean A. Fulop
Publisher: Springer Berlin Heidelberg
Sequence number: 7
Chapter number: Chapter 7