About this book

This book is about recent research on profiling humans from their voice, which seeks to deduce and describe the speaker's entire persona and their surroundings from voice alone. It covers several key aspects of this technology, describing how the human voice is unique in its ability to both capture and influence the human persona, and how, in some ways, voice is more potent and valuable than DNA and fingerprints as a metric, since it carries information not only about the speaker but also about their current state and their surroundings at the time of speaking. It provides a comprehensive review of advances in the multiple scientific fields that now contribute to its foundations. It describes how artificial intelligence enables mechanisms of discovery that were not previously possible in this context, driving the field forward in unprecedented ways. It also touches upon related challenges posed by voice disguise and other mechanisms of voice manipulation. The book is a resource for academic researchers and for professional agencies in areas such as law enforcement, healthcare, social services, and entertainment.

Table of Contents

Frontmatter

Profiling and the Human Voice

Frontmatter

Chapter 1. Profiling and Its Facets

Abstract
The term profiling from voice refers to the deduction of personal characteristics, and information about the circumstances and environment of a speaker from their voice. At the outset, we note the distinction between the terms voice and speech. “Voice” refers to sound produced in the human vocal tract. “Speech” is the signal produced by modulating voice into meaningful patterns.
Rita Singh

Chapter 2. Production and Perception of Voice

Abstract
The goal of this chapter is to present the human speech production process in sufficient detail for the reader to understand why profiling should be possible, and to provide sufficient information to reason about the effects of different parameters on voice, so that profiling efforts may be better guided. The details are sufficient for this purpose but not complete, since the area is too vast to be covered within one chapter of this book.
Rita Singh

Chapter 3. Relations Between Voice and Profile Parameters

Abstract
In Chap. 2 we saw why voice is susceptible to, and how it can respond to, intricate changes in the mechanisms of its production. In this chapter we will look at empirical evidence of this fact. We will take a closer look at what causes these changes—at the various bio-relevant and environmental parameters that have been observed to affect it. The purpose of this chapter is to give an overview of this topic, based on prior studies. However, the range of scientific studies in this context is vast. This chapter only represents a small sampling of the key findings and current understanding of the subject.
Rita Singh

Chapter 4. The Voice Signal and Its Information Content—1

Abstract
The voice signal, like all sounds, is a pressure wave. The actions of the speaker’s vocal tract result in continuous variations of pressure in the air surrounding the speaker’s mouth. These pressure waves radiate outward from the speaker’s mouth and are sensed by the listener’s ear. The information in the voice signal is encoded in these time variations of air pressure. Any computer-based analysis of voice must first convert these variations into a sequence of numbers that the computer can operate upon. This requires transduction of the pressure wave into a sequence of numbers in a manner that assuredly retains most of the information in it with minimal distortion. From the perspective of the computer, this sequence of numbers now represents the voice signal. We refer to the sequence of numbers representing the voice signal as a “digital” signal, and the process of converting the pressure wave into it as “digitization.” Subsequent computational procedures must be performed on this digitized signal in order to derive information from it. The sequence of procedures followed for computer analysis of sounds is illustrated in Fig. 4.1a.
Rita Singh
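
The digitization step described in the Chapter 4 abstract can be illustrated with a minimal sketch. It is not taken from the book: the function names, the 16 kHz sampling rate, the 16-bit depth, and the 440 Hz test tone are all assumptions chosen for illustration. The sketch samples a continuous "pressure" function at regular time intervals and rounds each sample to an integer, yielding the sequence of numbers that later analysis would operate on.

```python
import math


def digitize(pressure, duration_s, sample_rate_hz=16000, bit_depth=16):
    """Sample a continuous pressure function and quantize it to signed integers.

    `pressure` maps time in seconds to a value nominally in [-1.0, 1.0].
    The sampling rate and bit depth are illustrative assumptions.
    """
    max_code = 2 ** (bit_depth - 1) - 1          # 32767 for 16 bits
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling: discrete time instants
        x = max(-1.0, min(1.0, pressure(t)))     # clip values outside the range
        samples.append(int(round(x * max_code)))  # quantization: round to integer
    return samples


def tone(t):
    """Hypothetical stand-in for a pressure wave: a 440 Hz sine at half scale."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)


# 10 ms of the tone at 16 kHz gives 160 integer samples.
signal = digitize(tone, duration_s=0.01)
```

A higher sampling rate or bit depth retains more of the original pressure variation at the cost of more numbers to store; the values used here are only common choices, not recommendations from the book.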

Chapter 5. The Voice Signal and Its Information Content—2

Abstract
Information in the voice signal is embedded in both its time progression and in its spectral content, i.e. in its time domain and spectrographic domain respectively. Within these domains, information relevant to profiling may be present in the patterns exhibited by specific characteristics of the voice signal. The signal may however also have characteristics that are not evident in these domains, and must be searched for in other (derivative) mathematical domains where the relevant patterns become more tangible for measurement and analysis. This, however, is the subject of feature discovery—a subject that is discussed in Part II of this book. A third domain that reflects the information in the voice signal is that of physical or abstract models that simulate or explain the voice signal and the processes that generate it. We will refer to this as the model domain.
Rita Singh

Chapter 6. Qualitative Aspects of the Voice Signal

Abstract
Of all the studies referenced in Chaps. 1 and 3, the majority have found positive correlations between various profile parameters and voice quality. The word “quality” is very loosely used in the context of audio processing. For example, one may refer to “perceptual quality,” “speech quality,” “audio quality,” “recording quality” etc. These usages must not be confused with the subject at hand: voice quality. From both signal processing and information theoretic perspectives, voice quality is an elusive entity. There is no consensus in the scientific community about its precise definition, whether quantitative or even descriptive. It is in fact a complex entity that comprises a set of many (mostly) subjectively described characteristics, or sub-qualities, that collectively represent it. These characteristics, or attributes of voice, give it its particular auditory flavor, and can also be thought of as comprising the overall quality of someone’s voice in a manner analogous to a sound mixer used in music production. Unfortunately, there is no consensus on even the number of sub-qualities that comprise voice quality. Regardless, some important ones are described in this chapter.
Rita Singh

Computational Profiling

Frontmatter

Chapter 7. Feature Engineering for Profiling

Abstract
Humans are able to perceive many profile parameters from voice. Such perceptual relationships between voice and parameters show that there is information relevant to profiling in the voice signal, but may not tell us what that information is. If a study finds perceptual relations between some parameter and specific features derived from voice, then that indicates the existence of a statistical relationship between those features and the parameter, which may be directly causal, or relate to a common underlying cause. Such features will usually be directly useful in profiling.
Rita Singh

Chapter 8. Mechanisms for Profiling

Abstract
So how is profiling actually done? Most of this book has been dedicated to developing the basic understanding needed for it. We have seen that the knowledge of how a parameter affects the vocal production mechanism can help us identify the most relevant representations from which we may extract the information needed for profiling. We have also seen how such knowledge can help us reason out why certain parameters may exert confusable influences on the voice signal. All of this knowledge can then help us design more targeted methods to discover features that are highly effective for profiling.
Rita Singh

Chapter 9. Reconstruction of the Human Persona in 3D from Voice, and its Reverse

Abstract
It seems magical that we are at a point in time where it is possible to discuss the subject of accurate, in-vacuo generation of a three dimensional image of the human form from the voice signal alone. From the discussion in this book so far, it should be evident that both direct and indirect relationships exist between voice and the human form. For example, voice can be indirectly related to bone structure. It can also at the same time be directly related to the person’s height, weight, age, gender and many other factors. These relationships can be transformed into predictive mechanisms. From predictions of the body dimensions and the weight, the person’s body mass index may be deduced; from predictions of the skull type, and the length of the vocal tract, the person’s likely skeletal proportions can be deduced.
Rita Singh

Chapter 10. Applied Profiling: Uses, Reliability and Ethics

Abstract
There are many uses of profiling. Currently, as represented by this book, the science of profiling is in its nascent stages. As it becomes more accurate, more uses for it will emerge. However, there is a dichotomy associated with this progression. While its increasing accuracy is likely to give rise to more applications, its potential to severely infringe on a person’s privacy through them will also rise. In the context of practical applications, two issues therefore become extremely important: whether the information generated through profiling is accurate or not, and whether it is relevant and ethical or not.
Rita Singh

Backmatter
