
2011 | Book

Fundamentals of Speaker Recognition

Author: Homayoon Beigi

Publisher: Springer US

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

An emerging technology, Speaker Recognition is becoming well-known for providing voice authentication over the telephone for helpdesks, call centres and other enterprise businesses for business process automation.

"Fundamentals of Speaker Recognition" introduces Speaker Identification, Speaker Verification, Speaker (Audio Event) Classification, Speaker Detection, Speaker Tracking and more. The technical problems are rigorously defined, and a complete picture is made of the relevance of the discussed algorithms and their usage in building a comprehensive Speaker Recognition System.

Designed as a textbook with examples and exercises at the end of each chapter, "Fundamentals of Speaker Recognition" is suitable for advanced-level students in computer science and engineering, concentrating on biometrics, speech recognition, pattern recognition, signal processing and, specifically, speaker recognition. It is also a valuable reference for developers of commercial technology and for speech scientists.

Please click on the link under "Additional Information" to view supplemental information including the Table of Contents and Index.

Table of Contents

Frontmatter

Basic Theory

Frontmatter

Chapter 1. Introduction

Abstract

Speaker recognition, sometimes referred to as speaker biometrics, includes identification, verification (authentication), classification, and by extension, segmentation, tracking and detection of speakers. It is a generic term used for any procedure which involves knowledge of the identity of a person based on his/her voice.
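The difference between identification and verification can be illustrated with a minimal sketch. It assumes each enrolled speaker already has a match score for a test utterance; the speaker names, scores, and threshold are hypothetical values chosen for illustration and do not come from the book.

```python
# Hypothetical match scores (e.g. log-likelihoods) of one test utterance
# against each enrolled speaker model. Higher means a better match.
scores = {"alice": -41.2, "bob": -38.7, "carol": -45.0}


def identify(scores):
    """Closed-set identification: return the best-scoring enrolled speaker."""
    return max(scores, key=scores.get)


def verify(scores, claimed, threshold):
    """Verification: accept the claimed identity only if its score
    reaches the decision threshold."""
    return scores[claimed] >= threshold


best_match = identify(scores)
bob_accepted = verify(scores, "bob", threshold=-40.0)
alice_accepted = verify(scores, "alice", threshold=-40.0)
```

Identification answers "who is speaking?" among the enrolled speakers, while verification answers "is this the person they claim to be?" with a yes or no.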

Homayoon Beigi

Chapter 2. The Anatomy of Speech

Abstract

To achieve an understanding of human speech production, first, one should study the anatomy of the vocal system (the speech signal production machinery). It is fair to say that one should grasp the process of speech production, before attempting to model a system that would understand it. Once this mechanism is better understood, we may try to create systems that recognize its distinguishing characteristics and nuances, thus recognizing the individual speaker.

Homayoon Beigi

Chapter 3. Signal Representation of Speech

Abstract

The main focus of this chapter is the signal representation of speech. Hence, before going any further we should define the concept of a signal.

Homayoon Beigi

Chapter 4. Phonetics and Phonology

Abstract

According to the Summer Institute of Linguistics (SIL International) [20], the linguistic hierarchy from one of the leaves to the top is as follows: Phonetics, Phonology, Morphology, Syntax, Semantics, and Pragmatics. In Chapter 2, we reviewed the anatomy of human speech production and perception. In this chapter we will start by exploring the range and limitations imposed by the speech production system, so-called phonetics. Then, we will move to a higher level in the hierarchy by studying how sounds are organized and used in human languages, so-called phonology, along with the rest of the hierarchy, which we will call linguistics as a whole. In the last part of this chapter, we will pay specific attention to the suprasegmental flow of human speech, called prosody. This is to give the reader a basic understanding of the types of sounds produced by the vocal tract. Of course, as with many of the other topics covered in this book, we will only scratch the surface and will concentrate on portions of the discipline that are more directly relevant to the speaker recognition task.

Homayoon Beigi

Chapter 5. Signal Processing of Speech and Feature Extraction

Abstract

In this chapter we will review signal processing techniques for speech. It is very important for the reader to be familiar with the contents of Chapter 3 and Appendix B before continuing to read this chapter. Even if the reader is a seasoned researcher in the field, it is important to at least refer to those chapters for details. To keep the main flow of the book fluid, most of the major technical details have been moved to Appendix B.

Homayoon Beigi

Chapter 6. Probability Theory and Statistics

Abstract

In this chapter we will review some basic Probability Theory and Statistics to the level that applies to speaker recognition. This coverage is by no means complete. For a more complete treatment of these subjects, the avid reader is referred to [27, 37, 39, 42, 31, 22].

Homayoon Beigi

Chapter 7. Information Theory

Abstract

Figure 7.1 presents a generic communication system. In this figure, the message is composed at the source. Then, it undergoes some coding to become suitable for transmission through the channel. Most channels are noisy, and this is signified by the noise that affects the channel from the block on top. At the receiving end of the channel, a decoder must decode the encoded and noisy message into one that the recipient will understand. This is the basis for the development of the topic of Information Theory, which started with the advent of the telegraph and telephone systems. Fisher [7], Nyquist [14, 15], Hartley [9], Shannon [18], Wiener [22], and Kullback [12] were among the early developers of Information Theory. A lot of this work was developed in response to the encryption and decryption needs for sensitive communications during the Second World War.
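The source, encoder, noisy channel, and decoder chain described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 3-fold repetition code, the binary symmetric channel, the flip probability, and all function names are choices made here for the example and are not taken from the book.

```python
import random


def encode(bits, n=3):
    """Encoder (assumed repetition code): send every bit n times."""
    return [b for b in bits for _ in range(n)]


def channel(bits, flip_prob, rng):
    """Noisy channel (assumed binary symmetric): flip each bit
    independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def decode(bits, n=3):
    """Decoder: majority vote over each block of n received bits."""
    return [int(sum(bits[i:i + n]) > n // 2) for i in range(0, len(bits), n)]


message = [1, 0, 1, 1, 0]          # composed at the source
rng = random.Random(0)             # seeded so the run is repeatable
received = channel(encode(message), flip_prob=0.1, rng=rng)
recovered = decode(received)       # what the recipient sees
```

With this code, the decoder recovers a bit correctly whenever at most one of its three copies is flipped; two or more flips in the same block produce an error.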

Homayoon Beigi

Chapter 8. Metrics and Divergences

Abstract

In this chapter, we continue the treatment of distances (metrics) and divergences by introducing the terminology and the notation which will be used throughout this book for these two concepts.

Homayoon Beigi

Chapter 9. Decision Theory

Abstract

Decision theory is one of the most basic underlying theories which is crucial for the creation, understanding and implementation of a successful speaker recognition algorithm. To begin covering this topic, we need to understand the process of formalizing a hypothesis and testing it. Then, we will continue to talk about Bayesian decision theory. We also talk about hypotheses in the development of information theoretic concepts of Chapter 7.

Homayoon Beigi

Chapter 10. Parameter Estimation

Abstract

In Section 9.3, we discussed the definition of hypotheses which are designed to classify data into different categories.

Homayoon Beigi

Chapter 11. Unsupervised Clustering and Learning

Abstract

In Chapter 10 we discussed parameter estimation and model selection. In this chapter, we will review different techniques for partitioning the total sample space

Homayoon Beigi

Chapter 12. Transformation

Abstract

In Chapter 10, we discussed techniques for clustering and the estimation of model parameters.

Homayoon Beigi

Chapter 13. Hidden Markov Modeling (HMM)

Abstract

In speaker recognition, as well as other related topics such as speech recognition, speech synthesis, language modeling, language recognition, language understanding, language translation and so on, we often look for sequences of objects.

Homayoon Beigi

Chapter 14. Neural Networks

Abstract

Neural network (NN) models have been studied for many years with the hope that the superior learning and recognition capability of the human brain could be emulated by man-made machines.

Homayoon Beigi

Chapter 15. Support Vector Machines

Abstract

In Section B.4 we defined a kernel function, K(s, t); see Definition B.56. Recently, quite a lot of attention has been given to kernel methods for their inherent discriminative abilities and their capability of handling nonlinear decision boundaries with good discrimination scalability as the dimension of the observation vectors increases. Of course, using kernel techniques is nothing new. Integral transforms, for example, are some of the oldest techniques which use kernels to transform a problem from one space to another space which is more suitable for a solution. Eventually, the solution is transformed back to the original space. Appendix B is devoted to the details of such techniques, and we have already used different transforms in other parts of the book, especially in feature extraction. The reader is strongly encouraged to review Appendix B in its entirety.

Homayoon Beigi

Advanced Theory

Frontmatter

Chapter 16. Speaker Modeling

Abstract

The basic objective of speaker modeling is to be able to associate an identifier with the speech of an individual speaker which is different from that of all other speakers, if not in the world, at least in the database of interest. Once this is achieved, all the different branches of speaker recognition, discussed in Chapter 1, may come to fruition. In other words, speaker modeling lies at the heart of the speaker recognition task. This may not necessarily be true of many other seemingly similar fields. For example, speech recognition, which is very closely related to speaker recognition, requires many different stages, many of which are of similar importance. For instance, in speech recognition, the phonetic modeling, language modeling, and search are of almost equal importance. In speaker recognition, on the other hand, if a good model of the speaker is built, the rest of the work becomes extremely easy.

Homayoon Beigi

Chapter 17. Speaker Recognition

Abstract

The objective of the enrollment process is to modify (adapt) a speaker-independent model into one that best characterizes the target speaker’s vocal tract characteristics. Depending on whether the task at hand is text-dependent or text-independent, different objectives should be observed while designing the enrollment process.

Homayoon Beigi

Chapter 18. Signal Enhancement and Compensation

Abstract

In Chapter 5 we had to defer the topic of signal enhancement since we had not yet covered speaker modeling. As we shall see, some of the arguments for different techniques refer to the act of recognition. It is recommended that, after reading this chapter, the reader return to Chapter 5 and quickly glance at the topics which were discussed. Some aspects of that chapter may sink in better with accumulated knowledge.

Homayoon Beigi

Practice

Frontmatter

Chapter 19. Evaluation and Representation of Results

Abstract

Given all the discussions in this book, when it is time to present recognition results for the sake of performance evaluation and comparison among different techniques, some quantitative evaluation standards are deemed necessary. This chapter discusses the different evaluation metrics and jargon used in the speaker recognition discipline.

Homayoon Beigi

Chapter 20. Time Lapse Effects (Case Study)

Abstract

The effect of time-lapse has not been studied well in biometrics. Although the literature is full of brief discussions about time-lapse effects in speaker recognition, no proper quantitative study has been done on the subject [9, 8]. There are two main types of time-lapse effects: short-term and long-term (aging). Here, short-term effects are studied for speaker recognition (speaker identification and speaker verification). The RecoMadeEasy speaker recognition engine has been used to obtain baseline results for 22 speakers who have been involved in a persistent (ongoing) study.

Homayoon Beigi

Chapter 21. Adaptation over Time (Case Study)

Abstract

In the previous chapter we noted the degrading effects of time lapse.

Homayoon Beigi

Chapter 22. Overall Design

Abstract

As we have seen, there are quite a number of modeling tools which are available for utilization in different branches of speaker recognition. There are many different variables which help us decide on the type of algorithms or techniques. These are mostly application-dependent. In fact, one of the main purposes of writing this textbook was to bring the many methods together and provide enough information to the reader so that the art of choosing different components for a specific problem would be supported by some a priori knowledge about these available techniques.

Homayoon Beigi

Background Material

Frontmatter

Chapter 23. Linear Algebra

Homayoon Beigi

Chapter 24. Integral Transforms

Abstract

This chapter is a rich section of the book which includes information usually covered in pieces in graduate courses such as Complex Variable Theory, Integral Transforms, Partial Differential Equations, Analog and Digital Signal Processing, and Control. As stated in the Preface, one of the goals of this book is to bring all the fundamental sciences and mathematics needed for doing speaker recognition into one place, with a comprehensive narrative connecting all the dots in the field. The fact that speaker recognition is greatly multi-disciplinary has been the stumbling block for the development of such a textbook. Although it is impossible to be complete, the goal is to include all the necessary information in one place. This makes the chapter ideal for students and professionals and allows for a complete understanding of the subject. It is recommended that it be treated like any other chapter of the book and not skipped. The only reason it is included in this background chapter is to keep the higher-level flow of speaker recognition smoother, but as they say, “the Devil is in the Details.”

Homayoon Beigi

Chapter 25. Nonlinear Optimization

Abstract

Throughout this chapter, we only treat the minimization problem for convex functions (see Definition 24.23). Furthermore, in most cases, we assume that the objective function being minimized is a quadratic function. These minimization assumptions may easily cover the cases where a function needs to be maximized. In the case of concave functions (Definition 24.25), where we are interested in the maxima, the function may be multiplied by -1 which inverts it into a convex function such that the location of the maximum now points to the minimum of the new function. So, the maximization function is changed to a minimization function.
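The sign-flip argument above can be checked numerically with a small sketch. The concave quadratic, the fixed-step gradient descent, the step size, and the function names are all assumptions made for this illustration; the chapter itself covers far more capable optimization methods.

```python
def concave_f(x):
    """Hypothetical concave quadratic with its maximum of 5 at x = 3."""
    return -(x - 3.0) ** 2 + 5.0


def neg_f_grad(x):
    """Gradient of -concave_f(x) = (x - 3)^2 - 5, which is convex."""
    return 2.0 * (x - 3.0)


def minimize(grad, x0, step=0.1, iters=200):
    """Plain fixed-step gradient descent (assumed here for simplicity)."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x


# Minimizing -f locates the same point at which f attains its maximum.
x_star = minimize(neg_f_grad, x0=0.0)
max_value = concave_f(x_star)
```

The minimizer of the negated function is the maximizer of the original one, so `x_star` approaches 3 and `max_value` approaches 5.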

Homayoon Beigi

Chapter 26. Standards

Abstract

In this chapter, we will be reviewing some of the standards and developments of standards, related to speaker recognition. As we will see, there are a few standards bodies which have made certain efforts in developing standard transmission, storage and control schemes for speech-related applications. Most of these standards are not directly developed for speaker recognition. A good portion of them are developments for the telephone communication industry. More recently standards have been developed to be used with audio transmission over the Internet and other telecommunication networks.

Homayoon Beigi

Backmatter

Title: Fundamentals of Speaker Recognition
Author: Homayoon Beigi
Publisher: Springer US
Electronic ISBN: 978-0-387-77592-0
Print ISBN: 978-0-387-77591-3
DOI: https://doi.org/10.1007/978-0-387-77592-0