2019 | OriginalPaper | Chapter

4. The Voice Signal and Its Information Content—1

Author: Rita Singh

Published in: Profiling Humans from their Voice

Publisher: Springer Singapore


Abstract

The voice signal, like all sounds, is a pressure wave. The actions of the speaker’s vocal tract result in continuous variations of pressure in the air surrounding the speaker’s mouth. These pressure waves radiate outward from the speaker’s mouth and are sensed by the listener’s ear. The information in the voice signal is encoded in these time variations of air pressure. Any computer-based analysis of voice must first convert these variations into a sequence of numbers that the computer can operate upon. This requires transduction of the pressure wave into a sequence of numbers in a manner that assuredly retains most of the information in it with minimal distortion. From the perspective of the computer, this sequence of numbers now represents the voice signal. We refer to the sequence of numbers representing the voice signal as a “digital” signal, and the process of converting the pressure wave into it as “digitization.” Subsequent computational procedures must be performed on this digitized signal in order to derive information from it. The sequence of procedures followed for computer analysis of sounds is illustrated in Fig. 4.1a.
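The digitization step described above can be illustrated with a minimal sketch. It assumes uniform sample spacing, a 16 kHz sampling rate, and 16-bit quantization; these values, the function name `digitize`, and the toy 220 Hz tone standing in for a pressure wave are illustrative assumptions, not details taken from the chapter.

```python
import numpy as np

def digitize(pressure, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous-time pressure function uniformly and quantize it.

    `pressure` is any callable mapping time in seconds to a value in [-1, 1]
    (a hypothetical stand-in for the transducer output).
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    t = np.arange(n_samples) / sample_rate_hz   # uniformly spaced sample instants
    x = np.clip(pressure(t), -1.0, 1.0)         # keep values inside the quantizer range
    max_level = 2 ** (bits - 1) - 1             # largest positive integer level
    return np.round(x * max_level).astype(np.int64)

def tone(t):
    """Toy 'pressure wave': a 220 Hz sinusoid at half of full scale."""
    return 0.5 * np.sin(2 * np.pi * 220.0 * t)

samples = digitize(tone, duration_s=0.01)       # 10 ms of signal
```

Here `samples` is the sequence of numbers that, from the computer's perspective, represents the signal. Rounding to the nearest integer level bounds the error introduced at each sample to half a quantization step, which is one sense in which the conversion can retain the information with small distortion.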
Footnotes
1
While non-uniform sample spacing is also possible, it is not general practice in sampling audio signals.
 
2
We can also obtain unambiguous and perfect reconstruction of signals with frequency components greater than half the sampling frequency, provided the signal is bandlimited, the limits on the frequencies of the signal are known, and the sampling frequency is carefully chosen to enable such reconstruction. Audio signals do not generally satisfy these conditions, however.
 
3
Note that although “Hz” is used here as a unit for sampling rate, in its strictest definition “Hz” refers to cycles-per-second with reference to waves of any kind.
 
4
In fact, FFTs exist for signals of any length; power-of-2 FFTs are preferred due to their simplicity.
 
5
The term “white” is analogous to the definition of white light, which has equal contribution from all wavelengths.
 
6
By “square integrable” we mean that \(\int ^\infty _{-\infty } \psi ^2(t)dt\) is finite.
 
DOI
https://doi.org/10.1007/978-981-13-8403-5_4