Speech Spectrum Analysis

Author: Sean A. Fulop

Publisher: Springer Berlin Heidelberg

Book Series: Signals and Communication Technology

Part of: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

The accurate determination of the speech spectrum, particularly for short frames, is a common goal in diverse areas, including speech processing, speech recognition, and acoustic phonetics. With this book the author makes the subject of spectrum analysis understandable to a wide audience, including readers with a solid background in general signal processing and readers without such a background. In keeping with these goals, this book does not attempt to replace or cover the material found in a general signal processing textbook. Some essential signal processing concepts are presented in the first chapter, but even there they are explained as accessibly as possible. Throughout the book, the focus is on applications to speech analysis; mathematical theory is provided for completeness, but these developments are set off in boxes for the benefit of readers with sufficient background. Other readers may proceed through the main text, where the key results and applications are presented in general heuristic terms and illustrated with software routines and practical "show-and-tell" discussions of the results.

At some points, the book refers to and uses the implementations in the Praat speech analysis software package, which has the advantages of being used by many scientists around the world and of being free, open-source software. At other points, special software routines have been developed and made available to complement the book; these are provided in the Matlab programming language. Readers who have the basic Matlab package will be able to run the programs on that platform immediately; no extra "toolboxes" are required.

Frontmatter

Chapter 1. Introduction

Abstract

The quote from the composer and musician Vangelis sounds at first like an artist’s mystical musing, but it could not be more true, particularly of speech sound. When the vocal cords vibrate (as they do during “voiced” speech sounds), they contact each other and produce air pressure pulses in a repeating rhythm, between 70 and 250 times per second. This rhythm is so rapid that it yields a sound in the surrounding air whose fundamental frequency equals the vibration rate, and which is heard as a pitched tone (melody). Moreover, the complicated mechanical nature of vocal cord vibration gives rise to a series of harmonic frequencies in the sound, which are integer multiples of this fundamental frequency.

Sean A. Fulop
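To make the harmonic structure described in the Chapter 1 abstract concrete, here is a small sketch in Python using only the standard library. It is not taken from the book (whose routines are in Matlab), and the function names are hypothetical. It idealizes voicing as one unit pressure pulse per glottal cycle, which is far simpler than real vocal cord vibration, and shows that a naive DFT over a whole number of cycles has energy only at integer multiples of the fundamental.

```python
import cmath
import math


def dft_magnitudes(x, max_bin):
    """Naive DFT magnitudes |X[k]| for bins k = 0..max_bin (O(N * max_bin))."""
    n_total = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total)))
        for k in range(max_bin + 1)
    ]


def harmonic_frequencies(f0=100.0, fs=8000.0, n_periods=10, max_freq=600.0):
    """Frequencies (Hz) of nonzero DFT bins for an idealized glottal pulse train.

    Assumes fs / f0 is a whole number of samples, so the analysis frame holds
    exactly n_periods cycles and every harmonic falls exactly on a DFT bin.
    """
    period = int(round(fs / f0))            # samples per glottal cycle
    n_total = period * n_periods            # analyze a whole number of cycles
    # Idealized voicing: one unit pressure pulse per cycle, silence in between.
    x = [1.0 if n % period == 0 else 0.0 for n in range(n_total)]
    bin_hz = fs / n_total                   # spacing between DFT bins
    mags = dft_magnitudes(x, int(max_freq / bin_hz))
    # Skip bin 0 (the DC offset) and ignore floating-point round-off.
    return [k * bin_hz for k, m in enumerate(mags) if k > 0 and m > 1e-6]


print(harmonic_frequencies())            # [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
print(harmonic_frequencies(f0=125.0))    # [125.0, 250.0, 375.0, 500.0]
```

Real glottal pulses are not impulses, so their harmonics differ in amplitude, but the spacing at multiples of the vibration rate is the same.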

Chapter 2. Phonetics and Signal Processing

Abstract

The purpose of this chapter is twofold, so it is divided into two large sections. The first section provides a very quick overview of phonetics, not only for the benefit of scientists with limited knowledge of the subject, but also to fix some concepts and notation for the rest of the book. The second section provides a quick introduction to the fundamentals of digital signal processing on which the discussion of spectrum analysis techniques in later chapters relies. While the most complicated mathematical derivations and digressions have been set within gray boxes, and can thus be skimmed by less mathematically inclined readers, this second section does include a fair amount of mathematics in the main text. I found this necessary to accomplish my main goal, which is to present the fundamentals of signal processing to the uninitiated in a simplified format that will help them learn the concepts essential to understanding the methods presented in later chapters. Readers whose mathematical background is not sufficient to follow the equations will, I hope, still be able to learn many important concepts by focusing on the discussions, examples, and figures.

Sean A. Fulop

Chapter 3. History of Speech Spectrum Analysis

Abstract

This chapter traces the history of sound (and in particular, speech) spectrum analysis from its very beginnings in the theory developed by Fourier in the early 1800s. A particular goal of this historical outline is to describe not just the events and developments through the years, but also the beliefs and attitudes of scientists as these changed with the development of a better understanding. Some of the scientists whose work is discussed here are still widely known and cited, while others’ contributions have been unjustly forgotten. With this chapter, I also hope to straighten out the historical record in this respect, giving all due credit to those pioneers who uncovered many facts about speech spectra that are now taken for granted.

Sean A. Fulop

Chapter 4. The Fourier Power Spectrum and Spectrogram

Abstract

This chapter covers the traditional speech analysis methods which rely on the discrete Fourier transform and its extension to the ubiquitous time–frequency representation known as the spectrogram. The first topic is the power spectrum of a signal window, which is derived from the magnitude of the Fourier transform in the manner explained in Chap. 2. Here, I discuss some of the methods for making power spectra of speech sounds, in an effort to show the best ways of accomplishing the desired imaging. Power spectra may be used to examine the formants of vowels and other resonant sounds, and when treated statistically they may also illuminate aspects of the noise produced during voiceless consonants. A third important application of power spectra is in the analysis and detection of different phonation types such as creaky and breathy voicing. Numerous figures provide examples of power spectra illustrating the points discussed in the text.

Sean A. Fulop
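The pipeline summarized in the Chapter 4 abstract (window a frame, take its DFT, square the magnitude, then repeat frame by frame to build a spectrogram) can be sketched in plain Python. This is an illustrative sketch, not the book's Matlab code or Praat's implementation: the Hann window, 64-sample frames, 32-sample hop, and all function names are assumptions chosen for a small example, and the naive O(N²) DFT stands in for the FFT a real tool would use.

```python
import cmath
import math


def hann(n_win):
    """Periodic Hann window, a common taper for short speech frames."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_win) for n in range(n_win)]


def power_spectrum(frame):
    """|X[k]|^2 of one Hann-windowed frame for bins 0..N/2 (naive DFT)."""
    n_win = len(frame)
    xw = [s * w for s, w in zip(frame, hann(n_win))]
    return [
        abs(sum(xw[n] * cmath.exp(-2j * math.pi * k * n / n_win) for n in range(n_win))) ** 2
        for k in range(n_win // 2 + 1)
    ]


def spectrogram(x, n_win=64, hop=32):
    """One power spectrum per frame; frames start every `hop` samples."""
    return [power_spectrum(x[start:start + n_win])
            for start in range(0, len(x) - n_win + 1, hop)]


def to_db(power, floor=1e-12):
    """Convert power to decibels, clamping tiny values to avoid log(0)."""
    return 10.0 * math.log10(max(power, floor))


fs, n_win = 8000.0, 64
tone = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(512)]
frames = spectrogram(tone, n_win=n_win, hop=32)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
print(len(frames), peak_bin * fs / n_win)   # 15 1000.0
```

Each column of `frames` is the power spectrum of one windowed frame; plotting `to_db` of the columns against time gives the familiar spectrogram image.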

Chapter 5. Alternative Time–Frequency Representations

Abstract

The spectrogram is a well-studied time–frequency representation, but there are numerous others. There has been a rich literature on this subject, and many different time–frequency representations have been devised, studied, and applied to various signal analysis problems (e.g. [1]). Unfortunately, the subject has never to my knowledge been made accessible to speech scientists, with the result that we have rarely availed ourselves of any such representations other than the spectrogram. This chapter is an attempt to rectify this situation somewhat, although the presentation takes on a more advanced mathematical character at certain points.

Sean A. Fulop

Chapter 6. The Reassigned Spectrogram

Abstract

This chapter introduces a relatively new, modified form of the spectrogram which has variously been described by the term reassigned, or by the phrase time-corrected instantaneous frequency. While the latter is more descriptive of the scheme, the former is shorter and seems to have gained supremacy in the (still rather sparse) literature on the subject. The reassignment process yields a modification of a spectrogram which effectively “sharpens” it, concentrating the smeared-out spectrographic points around tighter lines in both the frequency and time dimensions. This approach relies on the understanding of the spectrogram as showing instantaneous frequencies of line components, and sharpens that view of things while doing away with the idea of showing how energy is distributed in the time–frequency plane.

Sean A. Fulop
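The two ingredients named in the Chapter 6 abstract, an instantaneous frequency and a time correction, can be illustrated for a single spectrogram point. The sketch below is an assumption-laden illustration, not the book's routine: it estimates the channelized instantaneous frequency from the phase advance of one STFT bin when the frame moves by one sample, and the local group delay from the phase difference between adjacent bins. This finite-difference, cross-spectral style is one way to compute reassignment; other formulations use auxiliary windows instead. All names and parameters here are hypothetical.

```python
import cmath
import math


def stft_bin(x, center, k, n_win):
    """STFT value for one frame (centered on sample `center`) and one DFT bin k.

    Uses a periodic Hann window and measures phase relative to the frame
    center, so phase slopes read directly as offsets from that center.
    Samples outside the signal are treated as zero.
    """
    half = n_win // 2
    acc = 0j
    for m in range(n_win):
        n = center - half + m
        if 0 <= n < len(x):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * m / n_win)
            acc += w * x[n] * cmath.exp(-2j * math.pi * k * (m - half) / n_win)
    return acc


def reassign(x, center, k, n_win, fs):
    """Move spectrogram point (center, k) to (time in s, frequency in Hz).

    Frequency: phase advance of bin k when the frame shifts by one sample
    (channelized instantaneous frequency).
    Time: minus the phase slope between bins k and k + 1 (local group delay),
    added to the frame center.
    """
    here = stft_bin(x, center, k, n_win)
    later = stft_bin(x, center + 1, k, n_win)
    higher = stft_bin(x, center, k + 1, n_win)
    dphi_time = cmath.phase(later * here.conjugate())    # radians per sample
    dphi_freq = cmath.phase(higher * here.conjugate())   # radians per bin
    freq_hz = dphi_time * fs / (2 * math.pi)
    delay = -dphi_freq * n_win / (2 * math.pi)           # samples from center
    return (center + delay) / fs, freq_hz


fs, n_win = 8000.0, 256
tone = [math.cos(2 * math.pi * 1234.5 * n / fs) for n in range(2048)]
for k in (39, 40):   # bin centers 1218.75 Hz and 1250 Hz
    print(k * fs / n_win, "->", reassign(tone, 1024, k, n_win, fs)[1])  # both near 1234.5

click = [0.0] * 2048
click[1000] = 1.0    # a single pulse at t = 0.125 s
print(reassign(click, 1024, 40, n_win, fs)[0])  # frame centered at 0.128 s; moved to about 0.125
```

Both bins of the tone collapse onto its true frequency, and the pulse seen from a frame centered 24 samples later is moved back to its true time: the "sharpening" in frequency and time that the abstract describes.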

Chapter 7. Linear Prediction and ARMA Spectrum Estimation

Abstract

Up to this point, I have presented speech analysis methods which obtain spectral or time–frequency information from the signal data directly by means of some kind of transformation. It has been shown how different processing schemes can extract such information from the signal in different ways, but none has relied on any special assumptions about a speech signal beyond the most general and widely held sort. In this chapter, I introduce an entirely different approach to what is often called “spectral estimation,” in which the signal is explicitly assumed to conform to the outlines of a model. The parameters of the assumed model are then estimated from the signal data, and the values of the parameters are used as a kind of proxy estimate of corresponding signal properties.

Sean A. Fulop
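As a concrete instance of the model-based approach in the Chapter 7 abstract, here is a sketch of autocorrelation-method linear prediction (an all-pole, or AR, model) solved with the Levinson-Durbin recursion, followed by evaluation of the model spectrum. It covers only the AR part of the chapter's ARMA topic and is not the book's code; the synthetic AR(2) test signal, its coefficients, the random seed, and all function names are assumptions for illustration.

```python
import cmath
import math
import random


def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of the whole signal."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients for x[n] ~ sum_{j=1..order} a[j] * x[n - j].

    Solves the autocorrelation normal equations recursively; returns
    (a[1..order], final prediction-error power on the scale of r).
    """
    a = [0.0] * (order + 1)                 # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_spectrum(a, err, freqs_hz, fs):
    """All-pole model spectrum err / |1 - sum_j a[j] e^{-i w j}|^2."""
    spectrum = []
    for f in freqs_hz:
        w = 2 * math.pi * f / fs
        denom = 1 - sum(aj * cmath.exp(-1j * w * j) for j, aj in enumerate(a, start=1))
        spectrum.append(err / abs(denom) ** 2)
    return spectrum


# Synthetic test signal: white noise driving a known two-pole resonator.
rng = random.Random(0)
true_a = [1.3, -0.8]
x = [0.0, 0.0]
for _ in range(20000):
    x.append(true_a[0] * x[-1] + true_a[1] * x[-2] + rng.gauss(0.0, 1.0))

a_hat, err = levinson_durbin(autocorr(x, 2), 2)
print(a_hat)  # should land close to [1.3, -0.8]

fs = 8000.0
grid = [10.0 * i for i in range(400)]       # 0 to 3990 Hz in 10 Hz steps
est = lpc_spectrum(a_hat, err, grid, fs)
print(grid[est.index(max(est))])            # resonance peak of the fitted model
```

The estimated coefficients, not the raw DFT, determine the spectrum here: this is the "proxy estimate" idea. For real speech a higher order (one resonance per pole pair) and a tapered analysis window would normally be used.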

Backmatter

Title: Speech Spectrum Analysis
Author: Sean A. Fulop
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-17478-0
Print ISBN: 978-3-642-17477-3
DOI: https://doi.org/10.1007/978-3-642-17478-0
