2019 | OriginalPaper | Chapter

# 4. The Voice Signal and Its Information Content—1

Author: Rita Singh

Publisher: Springer Singapore

## Abstract

The voice signal, like all sounds, is a pressure wave. The actions of the speaker’s vocal tract produce continuous variations of pressure in the air around the speaker’s mouth. These pressure waves radiate outward from the mouth and are sensed by the listener’s ear. The information in the voice signal is encoded in these variations of air pressure over time. Any computer-based analysis of voice must first convert these variations into a sequence of numbers that the computer can operate on. This requires transducing the pressure wave into a sequence of numbers in a way that retains most of its information with minimal distortion. From the perspective of the computer, this sequence of numbers now represents the voice signal. We refer to the sequence of numbers representing the voice signal as a “digital” signal, and to the process of converting the pressure wave into it as “digitization.” Subsequent computational procedures must be performed on this digitized signal to derive information from it. The sequence of procedures followed for computer analysis of sounds is illustrated in Fig. 4.1a.
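
The digitization step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the chapter's own procedure: the function `digitize`, the stand-in signal `tone`, the 16 kHz sampling rate, and the 16-bit resolution are all hypothetical choices made for the example.

```python
import math

def digitize(pressure, duration_s, sample_rate_hz, n_bits):
    """Sample a continuous-time signal uniformly and quantize each sample.

    pressure: hypothetical function mapping time (seconds) to a value in [-1, 1].
    Returns a list of signed integers, i.e. the "digital" signal.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    max_level = 2 ** (n_bits - 1) - 1          # 32767 for 16 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # uniform sample spacing
        x = max(-1.0, min(1.0, pressure(t)))   # clip to the representable range
        samples.append(int(round(x * max_level)))  # quantize to an integer level
    return samples

def tone(t):
    # Stand-in for a pressure wave: a 440 Hz sinusoid at half of full scale.
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)

# Assumed settings: 10 ms of signal, 16 kHz sampling, 16-bit samples.
digital = digitize(tone, duration_s=0.01, sample_rate_hz=16000, n_bits=16)
```

Here `digital` is the sequence of numbers that later analysis would operate on. A real system would obtain the samples from a microphone and an analog-to-digital converter rather than from a formula.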

## Footnotes

1. While non-uniform sample spacing is also possible, it is not general practice in sampling audio signals.

2. We can also obtain unambiguous and perfect reconstruction of signals with frequency components greater than half the sampling frequency, provided the signal is bandlimited, the limits on the frequencies of the signal are known, and the sampling frequency is carefully chosen to enable such reconstruction. Audio signals do not generally satisfy these conditions, however.

3. Note that although “Hz” is used here as a unit for sampling rate, in its strictest definition “Hz” refers to cycles-per-second with reference to waves of any kind.

4. In fact FFTs exist for signals of any length. Power-of-2 FFTs are preferred due to their simplicity.

5. The term “white” is analogous to the definition of white light, which has equal contribution from all wavelengths.

6. By “square integrable” we mean that $$\int_{-\infty}^{\infty} \psi^2(t)\,dt$$ is finite.
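
The caveat in footnote 2 can be illustrated with a small numerical sketch. The values here are assumptions chosen for the example (an 8 kHz sampling rate and tones at 1 kHz and 7 kHz), and `sample_cosine` is a hypothetical helper. A component above half the sampling frequency yields exactly the same samples as a lower-frequency one, so the samples alone are ambiguous unless the band occupied by the signal is known in advance.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Return n_samples uniformly spaced samples of a unit-amplitude cosine."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

fs = 8000                             # assumed sampling rate; fs / 2 = 4000 Hz
low = sample_cosine(1000, fs, 16)     # component below fs / 2
high = sample_cosine(7000, fs, 16)    # component above fs / 2 (7000 = fs - 1000)

# The two sample sequences agree up to floating-point rounding error.
max_diff = max(abs(a - b) for a, b in zip(low, high))
```

If it were known beforehand that the signal occupied only the 4 to 8 kHz band, the samples could be attributed to the 7 kHz tone without ambiguity, which is the situation the footnote describes.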
