
2013 | Book

Vowel Inherent Spectral Change

Editors: Geoffrey Stewart Morrison, Peter F. Assmann

Publisher: Springer Berlin Heidelberg

Book Series: Modern Acoustics and Signal Processing


About this book

It has been traditional in phonetic research to characterize monophthongs using a set of static formant frequencies, i.e., formant frequencies taken from a single time-point in the vowel or averaged over the time-course of the vowel. However, over the last twenty years a growing body of research has demonstrated that, at least for a number of dialects of North American English, vowels which are traditionally described as monophthongs often have substantial spectral change. Vowel inherent spectral change has been observed in speakers’ productions, and has also been found to have a substantial effect on listeners’ perception. In terms of acoustics, the traditional categorical distinction between monophthongs and diphthongs can be replaced by a gradient description of dynamic spectral patterns. This book includes chapters addressing various aspects of vowel inherent spectral change (VISC), including theoretical and experimental studies of the perceptually relevant aspects of VISC, the relationship between articulation (vocal-tract trajectories) and VISC, historical changes related to VISC, cross-dialect, cross-language, and cross-age-group comparisons of VISC, the effects of VISC on second-language speech learning, and the use of VISC in forensic voice comparison.

Table of Contents

Frontmatter
Introduction
Abstract
The term vowel inherent spectral change (VISC) was coined in Nearey and Assmann (1986). It refers to the changes in spectral properties over the time course of a vowel which are characteristic of vowel-phoneme identity. It refers not only to the widely recognized spectral changes found in diphthongs and triphthongs, but also to the less widely recognized spectral changes which are characteristic of vowel phonemes that have traditionally been called monophthongs in some dialects of some languages, particularly North American English.
Peter F. Assmann, Geoffrey Stewart Morrison

VISC Perception

Frontmatter
Static and Dynamic Approaches to Vowel Perception
Abstract
The goal of this chapter is to provide a broad overview of work leading to the view that vowel inherent spectral change (VISC) plays a significant role in vowel perception. The view that implicitly guided vowel perception research for many years was the idea that nearly all of the information that was needed to specify vowel quality was to be found in a cross section of the vowel spectrum sampled at a reasonably steady portion of the vowel. A good deal of evidence shows that this static view is incomplete, including: (1) measurement data showing that most nominally monophthongal English vowels show significant spectral change throughout the course of the vowel; (2) pattern recognition studies showing that vowel categories are separated with far greater accuracy by models that take spectral change into account than otherwise comparable models using features sampled at steady-state; (3) perceptual experiments with “silent center” vowels showing that vowel steady-states can be removed with little or no effect on vowel intelligibility; and (4) perceptual experiments with both naturally spoken and synthetic speech showing that vowels with stationary spectral patterns are not well identified.
James M. Hillenbrand
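The contrast between static and dynamic descriptions in points (2) and (4) can be illustrated with formant tracks. The sketch below is a minimal illustration, assuming a track is a list of (F1, F2) pairs in Hz sampled at equal time steps; `sample_at` and `flatten_track` are hypothetical helpers, and the two tracks are invented numbers, not data from the chapter. Two tracks that share their midpoint values cannot be told apart from a single steady-state frame, while a sample nearer the onset separates them; `flatten_track` builds the kind of stationary pattern that point (4) reports is poorly identified.

```python
from typing import List, Tuple

Frame = Tuple[float, float]  # (F1, F2) in Hz


def sample_at(track: List[Frame], proportion: float) -> Frame:
    """Linearly interpolate (F1, F2) at a proportion (0 to 1) of vowel duration."""
    if len(track) < 2:
        raise ValueError("need at least two frames")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    pos = proportion * (len(track) - 1)
    i = min(int(pos), len(track) - 2)
    frac = pos - i
    (a1, a2), (b1, b2) = track[i], track[i + 1]
    return (a1 + frac * (b1 - a1), a2 + frac * (b2 - a2))


def flatten_track(track: List[Frame], anchor: float = 0.5) -> List[Frame]:
    """Stationary version of a track: every frame holds the value at `anchor`."""
    steady = sample_at(track, anchor)
    return [steady] * len(track)


# Hypothetical tokens: same midpoint, opposite F2 movement.
rising = [(500.0, 1700.0), (500.0, 1800.0), (500.0, 1900.0)]
falling = [(500.0, 1900.0), (500.0, 1800.0), (500.0, 1700.0)]

print(sample_at(rising, 0.5) == sample_at(falling, 0.5))  # True: one mid frame sees no difference
print(sample_at(rising, 0.2), sample_at(falling, 0.2))    # these differ
print(flatten_track(rising) == flatten_track(falling))    # True
```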
Theories of Vowel Inherent Spectral Change
Abstract
In many dialects of North American English, in addition to vowels which are traditionally described as true and phonetic diphthongs, several vowels traditionally described as monophthongs also have substantial formant movement. Vowel inherent spectral change (VISC) has also been found to be an important factor in the perception of vowel-phoneme identity. This chapter reviews literature pertinent to theories of the perceptually relevant aspects of VISC. Three basic hypotheses have been proposed: onset + offset, onset + slope, and onset + direction. Each takes the position that initial formant values are relevant, but they differ as to which aspect of formant movement is relevant. Of these, the weight of evidence indicates that the onset + offset hypothesis is superior, leading to higher correct-classification rates and higher correlation with listeners’ vowel identification responses. Models which fit curves to whole formant trajectories have not yet been found to outperform simple models based on formant measurements taken at two points (onset and offset) in formant trajectories. A popular curve-fitting model, the first-order discrete cosine transform (DCT), is interpretable as a parameterization of the onset + offset hypothesis.
Geoffrey Stewart Morrison
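The three hypotheses and the DCT model can be compared as feature extractors over one formant track. The sketch below is a hedged illustration, assuming a track is a list of formant values in Hz sampled at equal time steps; the function names are hypothetical. Slope is taken here as the endpoint difference divided by duration, and direction as the sign of the overall change; both are simplifying assumptions, and the chapter's operational definitions may differ. The first-order DCT keeps the mean `a0` and the first cosine coefficient `a1` (a least-squares half-cosine fit), and `dct_endpoints` shows that the fitted curve's start and end values are an invertible linear function of `(a0, a1)`, which is one way to read the claim that it parameterizes onset + offset.

```python
import math
from typing import Sequence, Tuple


def onset_offset(track: Sequence[float]) -> Tuple[float, float]:
    """Onset + offset: the first and last formant values."""
    return track[0], track[-1]


def onset_slope(track: Sequence[float], duration_ms: float) -> Tuple[float, float]:
    """Onset + slope: onset value and mean rate of change in Hz/ms (endpoint difference)."""
    return track[0], (track[-1] - track[0]) / duration_ms


def onset_direction(track: Sequence[float]) -> Tuple[float, int]:
    """Onset + direction: onset value and the sign (+1, 0, -1) of the overall change."""
    delta = track[-1] - track[0]
    return track[0], (delta > 0) - (delta < 0)


def dct_first_order(track: Sequence[float]) -> Tuple[float, float]:
    """Coefficients a0, a1 of a DCT-II fit truncated after the first cosine term.

    With basis cos(pi * (2n + 1) / (2N)), these are the least-squares weights:
    a0 is the mean and a1 the amplitude of the half-cosine.
    """
    n = len(track)
    a0 = sum(track) / n
    a1 = (2.0 / n) * sum(x * math.cos(math.pi * (2 * i + 1) / (2 * n))
                         for i, x in enumerate(track))
    return a0, a1


def dct_endpoints(a0: float, a1: float, n: int) -> Tuple[float, float]:
    """Values of the fitted half-cosine at the first and last sample."""
    c = math.cos(math.pi / (2 * n))
    return a0 + a1 * c, a0 - a1 * c


track = [1800.0, 1850.0, 1950.0, 2050.0, 2100.0]  # hypothetical F2 track, Hz
print(onset_offset(track), onset_slope(track, 200.0), onset_direction(track))
print(dct_endpoints(*dct_first_order(track), len(track)))
```

The fitted endpoints are a smoothed version of the measured onset and offset rather than equal to them, because the half-cosine is fitted to every sample in the track.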
Vowel Inherent Spectral Change in the Vowels of North American English
Abstract
Nearey and Assmann (1986) coined the term ‘vowel inherent spectral change’ (VISC) to refer to change in spectral properties inherent to the phonetic specification of vowels. Although such change includes the relatively large formant changes associated with acknowledged diphthongs, the term was explicitly intended to include reliable (but possibly more subtle) spectral change associated with vowel categories of North American English typically regarded as monophthongs. This chapter reviews statistical and graphical evidence of dynamic formant patterns in vowels of several CV and CVC syllable types in three regional dialects of English: Dallas, Texas (Assmann and Katz, 2000), Western Michigan (Hillenbrand et al., 1995) and Northern Alberta (Thomson, 2007). Evidence is reviewed for the importance of VISC in vowel perception. While certain apparent VISC patterns show up across dialects, both dialect differences and differences in context make it clear that more sophisticated methods will be required to fully separate several factors affecting formant change in vowels. Promising preliminary results are presented using a new non-linear regression method that extends compositional models of Broad and Clermont (1987, 2002, 2010) to include dual vowel targets.
Terrance M. Nearey
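The idea of a dual vowel target can be illustrated with a forward model of a CVC formant track. The sketch below is a toy illustration only, not the regression model described in the chapter: it assumes the vowel target moves linearly from `t1` to `t2` over normalized time, and that consonant effects are additive perturbations decaying exponentially away from each syllable edge with a hypothetical time constant `tau` (a proportion of syllable duration). Fitting such a model to measured tracks would require a non-linear least-squares optimizer, which is omitted here.

```python
import math
from typing import List


def dual_target_track(t1: float, t2: float, onset_pert: float, offset_pert: float,
                      tau: float = 0.15, n: int = 11) -> List[float]:
    """Formant values (Hz) at n equally spaced normalized times in [0, 1].

    Hypothetical toy model: a vowel target moving linearly from t1 to t2, plus
    consonant perturbations that decay exponentially away from the syllable edges.
    """
    track = []
    for i in range(n):
        t = i / (n - 1)
        target = t1 + (t2 - t1) * t
        track.append(target
                     + onset_pert * math.exp(-t / tau)
                     + offset_pert * math.exp(-(1.0 - t) / tau))
    return track


# A single target (t1 == t2) versus a target that itself moves, with the
# same hypothetical consonant perturbations at both edges.
single = dual_target_track(1900.0, 1900.0, -250.0, -150.0)
dual = dual_target_track(1800.0, 2100.0, -250.0, -150.0)
print([round(f) for f in single])
print([round(f) for f in dual])
```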
Dynamic Specification of Coarticulated Vowels
Research Chronology, Theory, and Hypotheses
Abstract
This chapter summarizes research conducted over a 35-year period on the dynamic specification of vowels. A series of experiments comparing vowels in consonant context with vowels produced in isolation failed to support talker normalization theories that predicted higher accuracy through prior exposure to a talker’s “point vowels.” Instead, these studies showed that vowels in consonant context were more accurately identified than isolated vowels, supporting a dynamic specification of vowels theory over static target theories, leading to the proposal that important information is contained in the formant transitions. Consonant–vowel coarticulation is not a source of “noise”; rather, it gives rise to an acoustic array in which the consonants and vowels are cospecified in the time-varying spectral configuration which we call dynamic specification. Subsequent experiments showed high identification accuracy for “silent center vowels” in which the central portion of the CVC syllable was removed by gating. Identification accuracy was not disrupted when the onset and offset portions were produced by different speakers. Vowel identification improved with increasing duration of the onset or offset portions. Onsets were identified more accurately than offsets, but neither was as well identified as the silent center syllables. Collectively, these and other experiments summarized herein support the view that the most important source of information for speaker-invariant vowel identity is carried in dynamic specification of vowel onset and offset spectral patterns, with vowel duration also playing a role. Subsequent experiments with North German vowels, which do not exhibit the degree of vowel diphthongization reported in American English dialects, showed that listeners rely on dynamic spectro-temporal information specified by syllable onsets and offsets, in addition to cues provided by inherent vowel duration.
Cross-language comparisons are presented from the perspective of adaptive dispersion theory. These comparisons support the view that dynamic properties are perceptually more important in differentiating vowels in languages with large vowel inventories.
Winifred Strange, James J. Jenkins
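The silent-center manipulation and the cross-speaker variant can be sketched on a waveform stored as a list of samples. This is a minimal illustration under stated assumptions: `onset_frac` and `offset_frac` are hypothetical parameters giving the retained proportions of the syllable (the abstract does not give the values used), the center is replaced by zeros of the same duration, and both syllables in `cross_speaker` are assumed to have equal length. Real stimuli would normally apply short amplitude ramps at the gates to avoid clicks, which this sketch omits.

```python
from typing import List, Sequence


def silent_center(samples: Sequence[float], onset_frac: float,
                  offset_frac: float) -> List[float]:
    """Keep the first onset_frac and last offset_frac of a syllable; silence the rest."""
    n = len(samples)
    keep_on = int(round(n * onset_frac))
    keep_off = int(round(n * offset_frac))
    if keep_on + keep_off > n:
        raise ValueError("onset and offset portions overlap")
    center = n - keep_on - keep_off
    return list(samples[:keep_on]) + [0.0] * center + list(samples[n - keep_off:])


def cross_speaker(onset_src: Sequence[float], offset_src: Sequence[float],
                  onset_frac: float, offset_frac: float) -> List[float]:
    """Silent-center syllable whose onset comes from one token and offset from another."""
    if len(onset_src) != len(offset_src):
        raise ValueError("this sketch assumes equal-length syllables")
    n = len(onset_src)
    keep_off = int(round(n * offset_frac))
    gated = silent_center(onset_src, onset_frac, offset_frac)
    return gated[:n - keep_off] + list(offset_src[n - keep_off:])


speaker_a = [1.0] * 10  # stand-ins for two speakers' recordings of the same syllable
speaker_b = [2.0] * 10
print(silent_center(speaker_a, 0.2, 0.3))
print(cross_speaker(speaker_a, speaker_b, 0.2, 0.3))
```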
Perception of Vowel Sounds Within a Biologically Realistic Model of Efficient Coding
Abstract
Predicated upon principles of information theory, efficient coding has proven valuable for understanding visual perception. Here, we illustrate how efficient coding provides a powerful explanatory framework for understanding speech perception. This framework dissolves debates about objects of perception, instead focusing on the objective of perception: optimizing information transmission between the environment and perceivers. A simple measure of physiologically significant information is shown to predict intelligibility of variable-rate speech and discriminability of vowel sounds. Reliable covariance between acoustic attributes in complex sounds, both speech and nonspeech, is demonstrated to be amply available in natural sounds and efficiently coded by listeners. An efficient coding framework provides a productive approach to answer questions concerning perception of vowel sounds (including vowel inherent spectral change), perception of speech, and perception most broadly.
Keith R. Kluender, Christian E. Stilp, Michael Kiefte
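The abstract does not spell out its “simple measure of physiologically significant information.” As a hedged stand-in, one generic way to quantify how much a sound’s spectrum changes over time is the Euclidean distance between successive spectral frames, summed over a sliding window. The sketch assumes each frame is a vector of filter-bank levels in dB (for example, channels spaced on an auditory frequency scale); the function names and window length are hypothetical, and the chapter’s actual measure may be defined differently.

```python
import math
from typing import List, Sequence


def spectral_change(frames: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance between each pair of successive spectral frames."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def windowed_change(frames: Sequence[Sequence[float]], window: int) -> List[float]:
    """Total successive-frame distance within each run of `window` consecutive frames."""
    if window < 2:
        raise ValueError("a window must span at least two frames")
    d = spectral_change(frames)
    return [sum(d[i:i + window - 1]) for i in range(len(frames) - window + 1)]


# Hypothetical 3-channel levels (dB): a change followed by a steady stretch.
frames = [[60.0, 50.0, 40.0], [60.0, 54.0, 37.0], [60.0, 54.0, 37.0], [60.0, 54.0, 37.0]]
print(spectral_change(frames))
print(windowed_change(frames, 3))
```

Higher values mark stretches where the spectrum is changing, which is where, on an efficient-coding account, more information is available to the listener.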

VISC Production

Frontmatter
Simulation and Identification of Vowels Based on a Time-Varying Model of the Vocal Tract Area Function
Abstract
In their purest form, vowels can be conceived as being produced with static configurations of the vocal tract shape. Laboratory measurements of both acoustic and articulatory characteristics of vowels are typically performed with this assumption. In the case of natural, connected speech, however, the vocal tract shape undergoes nearly continuous change, so a true “static” configuration is rarely produced. Listeners are able to identify vowels in this time-varying situation, often with greater accuracy than for a vowel deliberately produced without any vocal tract change. This chapter examines the time-varying changes of the vocal tract shape that produce vowel inherent spectral change. Specifically, a model of the vocal tract area function is used to investigate how time-dependent formant frequencies originate from movement of the vocal tract.
Brad H. Story, Kate Bunton
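How formant trajectories follow from a changing area function can be illustrated with a generic textbook model: a lossless tube made of equal-length cylindrical sections, closed at the glottis and open (zero pressure) at the lips. This is not the authors’ area-function model; it ignores wall losses, radiation, and the nasal tract, and the speed of sound, tract length, and area values below are assumptions. Formants are the frequencies where the relevant element of the glottis-to-lips chain matrix crosses zero; interpolating between two hypothetical area functions then yields a time-varying formant pattern.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 35000.0  # cm/s; an assumed round value


def chain_d(areas: Sequence[float], section_len: float, freq: float) -> float:
    """Lower-right element of the glottis-to-lips chain matrix of a lossless tube.

    Each section's matrix [[cos kl, j sin(kl)/A], [j A sin(kl), cos kl]] is stored as
    four reals (a, b, c, d); the common rho*c factor cancels in this element.
    With zero pressure at the lips, U_glottis = d * U_lips, so formants are zeros of d.
    """
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    cs, sn = math.cos(k * section_len), math.sin(k * section_len)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for area in areas:  # sections ordered from glottis to lips
        a2, b2, c2, d2 = cs, sn / area, area * sn, cs
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d


def formants(areas: Sequence[float], tract_len: float = 17.5, n_formants: int = 3,
             fmax: float = 5000.0, step: float = 5.0) -> List[float]:
    """Lowest resonances (Hz): scan for sign changes of chain_d, then bisect."""
    seg = tract_len / len(areas)
    found: List[float] = []
    lo = step
    while lo < fmax and len(found) < n_formants:
        hi = lo + step
        if chain_d(areas, seg, lo) * chain_d(areas, seg, hi) < 0:
            a, b = lo, hi
            for _ in range(40):
                mid = 0.5 * (a + b)
                if chain_d(areas, seg, a) * chain_d(areas, seg, mid) <= 0:
                    b = mid
                else:
                    a = mid
            found.append(0.5 * (a + b))
        lo = hi
    return found


def formant_trajectory(start: Sequence[float], end: Sequence[float],
                       steps: int = 5) -> List[List[float]]:
    """Formants of area functions interpolated linearly from `start` to `end`."""
    out = []
    for s in range(steps):
        w = s / (steps - 1)
        out.append(formants([(1.0 - w) * x + w * y for x, y in zip(start, end)]))
    return out


# Hypothetical two-tube shapes (cm^2, glottis to lips): back constriction to front constriction.
back_narrow = [1.0] * 22 + [7.0] * 22
front_narrow = [7.0] * 22 + [1.0] * 22
for row in formant_trajectory(back_narrow, front_narrow):
    print([round(f) for f in row])
```

As a sanity check, a uniform tube of 17.5 cm gives resonances near 500, 1500, and 2500 Hz, the familiar quarter-wavelength values.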

VISC in Different Populations of Speakers

Cross-Dialectal Differences in Dynamic Formant Patterns in American English Vowels
Abstract
This chapter provides evidence that vowel inherent spectral change (VISC) can vary systematically across dialects of the same language. The nature and use of VISC in selected “monophthongs” is examined in three distinct dialect regions in the United States. In each dialect area, the dynamic formant pattern is analyzed for five different age groups in order to observe cross-generational change in relation to specific vowel shifts and other vowel changes currently active in each dialect. The dialect regions examined included central Ohio (representing the Midland dialect), southeastern Wisconsin (representing the Inland North whose vowel system is affected by the Northern Cities Shift) and western North Carolina (representing the South whose vowel system is affected by the Southern Vowel Shift). Following a description of these dialect areas, we first introduce principles of chain shifting and the transmission problem, originally developed in the fields of sound change and sociolinguistics. Selective acoustic data are then presented for each dialect region and cross-generational patterns of vowel change are discussed. The chapter concludes that variation in formant trajectories produced between vowel onset and offset (VISC) is central to what differentiates regional variants of American English in the United States. Furthermore, a systematic variation in VISC is found in cross-generational change in acoustic characteristics of vowels within each dialect. The perceptual relevance of this acoustic variation needs to be addressed in future research.
Ewa Jacewicz, Robert Allen Fox
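To compare how much a vowel’s formant pattern moves between onset and offset across dialects or age groups, one common approach is to reduce each token to a few summary numbers. The sketch below assumes (F1, F2) values measured at a handful of time points in a token; the summaries (straight-line vector length, summed trajectory length, and trajectory length per millisecond) and their names are assumptions for illustration, since the abstract does not say which summaries the chapter uses.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (F1, F2) in Hz at one measurement time


def vector_length(points: Sequence[Point]) -> float:
    """Straight-line distance in the F1-F2 plane from the first to the last point."""
    return math.dist(points[0], points[-1])


def trajectory_length(points: Sequence[Point]) -> float:
    """Distance summed along successive measurement points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def rate_of_change(points: Sequence[Point], duration_ms: float) -> float:
    """Trajectory length per millisecond of vowel duration (Hz/ms)."""
    return trajectory_length(points) / duration_ms


# Hypothetical token measured at five time points.
token = [(650.0, 1700.0), (680.0, 1750.0), (690.0, 1820.0), (670.0, 1900.0), (620.0, 1980.0)]
print(vector_length(token), trajectory_length(token), rate_of_change(token, 220.0))
```

A trajectory length much greater than the vector length signals a curved path through the vowel space rather than a straight glide.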
Developmental Patterns in Children’s Speech: Patterns of Spectral Change in Vowels
Abstract
The aim of this chapter is to compare the patterns of spectral change in American English vowels spoken by children and adults from the North Texas region. Children’s speech differs from adult speech in several important ways. First, children have smaller larynges and supra-laryngeal vocal tracts than adults, with the result that their formants and fundamental frequencies are higher. Second, the temporal and spectral properties of children’s speech are inherently more variable, a consequence of developmental changes in motor control. Both of these sources of variability raise interesting questions for the representation of vowel inherent spectral change (VISC) and theories of vowel specification. Acoustic analyses of children’s vowels indicate reliable VISC properties as early as age five, the youngest group studied here. Consistent with developmental changes in vocal tract anatomy, the frequencies of vowel formants show an overall systematic decrease with age, and these changes are larger in males than females. The effects of age on formant frequencies vary somewhat from vowel to vowel, but these discrepancies do not appear to interact systematically with VISC. Pattern classification tests indicate that (1) vowels are more accurately recognized when two analysis frames, sampled around 20 and 70 % of the vowel duration, are presented to the classifier, compared to any single frame; (2) adding a third analysis frame does not yield substantially higher recognition scores; and (3) the optimum locations for sampling the formant trajectory are consistent across different age groups of children.
Peter F. Assmann, Terrance M. Nearey, Sneha V. Bharadwaj
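The pattern-classification comparison (one frame versus frames at around 20 % and 70 % of duration) can be sketched as a leave-one-out test in which only the sampling points change. The abstract does not name the classifier, so this sketch substitutes a simple nearest-centroid rule; the labeled tracks are invented toy tokens in which two hypothetical vowels share a midpoint but move in opposite directions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Track = List[Tuple[float, float]]  # (F1, F2) per frame, equally spaced in time


def features(track: Track, proportions: Sequence[float]) -> List[float]:
    """Concatenate (F1, F2) linearly interpolated at each proportion of duration."""
    out: List[float] = []
    for p in proportions:
        pos = p * (len(track) - 1)
        i = min(int(pos), len(track) - 2)
        frac = pos - i
        for k in range(2):
            out.append(track[i][k] + frac * (track[i + 1][k] - track[i][k]))
    return out


def loo_accuracy(data: List[Tuple[str, Track]], proportions: Sequence[float]) -> float:
    """Leave-one-out nearest-centroid accuracy using the given sampling points."""
    correct = 0
    for held in range(len(data)):
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for j, (label, track) in enumerate(data):
            if j == held:
                continue
            x = features(track, proportions)
            if label not in sums:
                sums[label], counts[label] = [0.0] * len(x), 0
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}
        true_label, track = data[held]
        x = features(track, proportions)
        guess = min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
        correct += guess == true_label
    return correct / len(data)


toy = [
    ("a", [(500.0, 1700.0), (500.0, 1800.0), (500.0, 1900.0)]),
    ("a", [(510.0, 1710.0), (510.0, 1800.0), (510.0, 1890.0)]),
    ("b", [(500.0, 1900.0), (500.0, 1800.0), (500.0, 1700.0)]),
    ("b", [(510.0, 1890.0), (510.0, 1800.0), (510.0, 1710.0)]),
]
print(loo_accuracy(toy, [0.5]), loo_accuracy(toy, [0.2, 0.7]))
```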
Vowel Inherent Spectral Change and the Second-Language Learner
Abstract
Because vowel inherent spectral change (VISC) is necessary for optimal identification of vowels by native English speakers, learners of English as a second language must acquire relevant information about VISC in order to achieve native-like levels of performance in both perception and production of vowels in English. This chapter reviews studies of both perception and production of VISC by learners of English as a second language, whose first language is Spanish, with either an earlier or later age of immersion in an English-speaking environment. In perception, later learners of English appeared to rely more heavily on duration cues than monolinguals and early learners and, in some cases, to be less able to use VISC to discriminate near neighbors in the vowel space. In production, acoustic analyses were performed for American English vowels produced by participants in each group. The data are examined in terms of the degree of separation achieved by each talker group across the course of the vowel, as represented by three time points (20, 50 and 80 % of vowel duration). Additional analyses of productions by the most and least intelligible talkers in each group were used to explore individual talkers’ strategies for using VISC to distinguish neighbor vowels from one another.
Catherine L. Rogers, Merete M. Glasbrenner, Teresa M. DeMasi, Michelle Bianchi
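The “degree of separation” between neighboring vowels at 20, 50 and 80 % of duration can be illustrated with a simple index. The sketch below is a hypothetical measure, not necessarily the one used in the chapter: the distance between two categories’ mean (F1, F2) values divided by their pooled root-mean-square within-category spread, computed separately at each time point. Each token is assumed to be a mapping from time proportion to an (F1, F2) pair.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (F1, F2) in Hz


def _mean(points: Sequence[Point]) -> Point:
    return (statistics.fmean(p[0] for p in points), statistics.fmean(p[1] for p in points))


def _spread(points: Sequence[Point], centre: Point) -> float:
    """Root-mean-square distance of tokens from their category mean."""
    return math.sqrt(statistics.fmean(math.dist(p, centre) ** 2 for p in points))


def separation(cat_a: Sequence[Point], cat_b: Sequence[Point]) -> float:
    """Distance between two category means in units of pooled RMS spread."""
    mean_a, mean_b = _mean(cat_a), _mean(cat_b)
    pooled = math.sqrt((_spread(cat_a, mean_a) ** 2 + _spread(cat_b, mean_b) ** 2) / 2.0)
    if pooled == 0.0:
        raise ValueError("tokens have no within-category spread")
    return math.dist(mean_a, mean_b) / pooled


def separation_by_time(tokens_a: Sequence[Dict[float, Point]],
                       tokens_b: Sequence[Dict[float, Point]],
                       times: Sequence[float] = (0.2, 0.5, 0.8)) -> Dict[float, float]:
    """Separation of two vowel categories at each sampled proportion of duration."""
    return {t: separation([tok[t] for tok in tokens_a], [tok[t] for tok in tokens_b])
            for t in times}
```

Comparing the resulting profiles across talker groups would show whether a group separates two neighbors mainly near the onset, the midpoint, or the offset.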

VISC Applied

Frontmatter
Vowel Inherent Spectral Change in Forensic Voice Comparison
Abstract
The onset + offset model of vowel inherent spectral change has been found to be effective for vowel-phoneme identification, and not to be outperformed by more sophisticated parametric-curve models. This suggests that if only simple cues such as initial and final formant values are necessary for signaling phoneme identity, then speakers may have considerable freedom in the exact path taken between the initial and final formant values. If the constraints on formant trajectories are relatively lax with respect to vowel-phoneme identity, then with respect to speaker identity there may be considerable information contained in the details of formant trajectories. Differences in physiology and idiosyncrasies in the use of motor commands may mean that different individuals produce different formant trajectories between the beginning and end of the same vowel phoneme. If within-speaker variability is substantially smaller than between-speaker variability, then formant trajectories may be effective features for forensic voice comparison. This chapter reviews a number of forensic-voice-comparison studies which have used different procedures to extract information from formant trajectories. It concludes that information extracted from formant trajectories can lead to a high degree of validity in forensic voice comparison (at least under controlled conditions), and that a whole-trajectory approach based on parametric curves outperforms an onset + offset model.
Geoffrey Stewart Morrison
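The role of within-speaker versus between-speaker variability can be made concrete with a likelihood ratio for a single trajectory-derived feature (for example, one hypothetical DCT coefficient of an F2 trajectory). The abstract does not describe the statistical models used in the reviewed studies; this sketch uses a deliberately simplified univariate two-level normal model as a stand-in. The numerator is the density of the questioned value under the known speaker’s distribution; the denominator is its density under the relevant population, whose variance combines between-speaker variance of speaker means with within-speaker variance.

```python
import math
import statistics
from typing import Sequence


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood_ratio(questioned: float, known: Sequence[float],
                     population_means: Sequence[float], within_sd: float) -> float:
    """p(x | same speaker) / p(x | different speaker) for one scalar feature.

    Simplified model (an assumption): the known speaker's mean is treated as exact,
    and the population is normal with variance = between-speaker + within-speaker.
    """
    same = normal_pdf(questioned, statistics.fmean(known), within_sd)
    between_var = statistics.variance(population_means)
    diff = normal_pdf(questioned, statistics.fmean(population_means),
                      math.sqrt(between_var + within_sd ** 2))
    return same / diff


# Hypothetical numbers: known-speaker tokens and a small sample of population speaker means.
known = [1500.0, 1510.0, 1490.0]
population = [1300.0, 1400.0, 1500.0, 1600.0, 1700.0]
print(math.log10(likelihood_ratio(1505.0, known, population, within_sd=20.0)))
```

A ratio above 1 supports the same-speaker hypothesis and below 1 the different-speaker hypothesis; the smaller the within-speaker variability relative to the between-speaker variability, the further from 1 the ratios can move.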
Backmatter
Metadata
Title: Vowel Inherent Spectral Change
Editors: Geoffrey Stewart Morrison, Peter F. Assmann
Copyright Year: 2013
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-14209-3
Print ISBN: 978-3-642-14208-6
DOI: https://doi.org/10.1007/978-3-642-14209-3