2010 | Book

Speech Processing in Embedded Systems

About this Book

Speech Processing has rapidly emerged as one of the most widespread and well-understood application areas in the broader discipline of Digital Signal Processing. Besides the telecommunications applications that have hitherto been the largest users of speech processing algorithms, several non-traditional embedded processor applications are enhancing their functionality and user interfaces by utilizing various aspects of speech processing.

"Speech Processing in Embedded Systems" describes several areas of speech processing, and the various algorithms and industry standards that address each of these areas. The topics covered include different types of Speech Compression, Echo Cancellation, Noise Suppression, Speech Recognition and Speech Synthesis. In addition this book explores various issues and considerations related to efficient implementation of these algorithms on real-time embedded systems, including the role played by processor CPU and peripheral functionality.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
The ability to communicate with each other using spoken words is probably one of the most defining characteristics of human beings, one that distinguishes our species from the rest of the living world. Indeed, speech is considered by most people to be the most natural means of transferring thoughts, ideas, directions, and emotions from one person to another. While the written word, in the form of texts and letters, may have been the origin of modern civilization as we know it, talking and listening is a much more interactive medium of communication, as this allows two persons (or a person and a machine, as we will see in this book) to communicate with each other not only instantaneously but also simultaneously.
It is, therefore, not surprising that the recording, playback, and communication of the human voice were the main objectives of several early electrical systems. Microphones, loudspeakers, and telephones emerged out of this desire to capture and transmit information in the form of speech signals. Such primitive “speech processing” systems gradually evolved into more sophisticated electronic products that made extensive use of transistors, diodes, and other discrete components. The development of integrated circuits (ICs) that combined multiple discrete components into individual silicon chips led to a tremendous growth of consumer electronic products and voice communications equipment. The size and reliability of these systems were enhanced to the point where homes and offices could widely use such equipment.
Priyabrata Sinha
Chapter 2. Signal Processing Fundamentals
Abstract
The first stepping stone to understanding the concepts and applications of Speech Processing is to be familiar with the fundamental principles of digital signal processing. Since all real-world signals are essentially analog, these must be converted into a digital format suitable for computations on a microprocessor. Sampling the signal and quantizing it into suitable digital values are critical considerations in being able to represent the signal accurately. Processing the signal often involves evaluating the effect of a predesigned system, which is accomplished using mathematical operations such as convolution. It also requires understanding the similarity or other relationship between two signals, through operations like autocorrelation and cross-correlation. Often, the frequency content of the signal is the parameter of primary importance, and in many cases this frequency content is manipulated through signal filtering techniques. This chapter will explore many of these foundational signal processing techniques and considerations, as well as the algorithmic structures that enable such processing.
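As a concrete illustration of the convolution operation mentioned above, here is a minimal C sketch of direct-form linear convolution; the function name and calling convention are illustrative assumptions, not code from the book.

    #include <stddef.h>

    /* Direct-form linear convolution: y[n] = sum over k of h[k] * x[n-k].
       x holds nx input samples, h holds nh filter taps; the caller must
       provide y with room for nx + nh - 1 output samples. */
    void convolve(const float *x, size_t nx,
                  const float *h, size_t nh,
                  float *y)
    {
        for (size_t n = 0; n < nx + nh - 1; n++) {
            float acc = 0.0f;
            for (size_t k = 0; k < nh; k++) {
                if (n >= k && n - k < nx)      /* stay within x[] */
                    acc += h[k] * x[n - k];
            }
            y[n] = acc;
        }
    }

Cross-correlation, also mentioned in the abstract, shares the same multiply-accumulate inner loop but without the index reversal, which is why hardware that accelerates one operation tends to accelerate the other.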
Priyabrata Sinha
Chapter 3. Basic Speech Processing Concepts
Abstract
Before we explore the algorithms and techniques used to process speech signals to accomplish various objectives in an embedded application, we need to understand some fundamental principles behind the nature of speech signals. Of particular importance are the temporal and spectral characteristics of different types of vocal sounds produced by humans and what role the human speech production system itself plays in determining the properties of these sounds. This knowledge enables us to efficiently model the sounds generated, thereby providing the foundation of sophisticated techniques for compressing speech. Moreover, any spoken language is based on a combination and sequence of such sounds; hence understanding their salient features is useful for the design and implementation of effective speech recognition and synthesis techniques. In this chapter, we will learn how to classify the basic types of sounds generated by the human voice and the underlying time-domain and frequency-domain characteristics of these different types of sounds. Finally, and most importantly, we will explore some popular speech processing building-block techniques that enable us to extract critical pieces of information from the speech signal, such as which category a speech segment belongs to, the pitch of the sound, and the energy contained therein.
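To make the last point concrete, here is a minimal C sketch of two such building-block measurements, short-time energy and zero-crossing rate, which together provide a rough voiced/unvoiced cue; the function name and normalization choices are illustrative assumptions, not the book's code.

    #include <stddef.h>

    /* Short-time energy and zero-crossing rate of one analysis frame
       (assumes n >= 2 samples). High energy with a low ZCR is typical
       of voiced speech; low energy with a high ZCR suggests unvoiced,
       noise-like sounds such as fricatives. */
    void frame_features(const float *frame, size_t n,
                        float *energy, float *zcr)
    {
        float e = 0.0f;
        size_t crossings = 0;
        for (size_t i = 0; i < n; i++) {
            e += frame[i] * frame[i];
            if (i > 0 && (frame[i - 1] >= 0.0f) != (frame[i] >= 0.0f))
                crossings++;
        }
        *energy = e / (float)n;                     /* mean-square energy */
        *zcr = (float)crossings / (float)(n - 1);   /* crossings per sample step */
    }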
Priyabrata Sinha
Chapter 4. CPU Architectures for Speech Processing
Abstract
In the world of embedded applications such as the ones we discussed in the Introduction, the application system is typically implemented as a combination of software running on some kind of microprocessor and external hardware. Each microprocessor available in the marketplace has its own structure, functionality, and capabilities, and ultimately it is the capabilities of the microprocessor that determine what functions may or may not be executed by the software. Therefore, understanding the different types of processor architectures, both in terms of Central Processing Unit (CPU) functionality and on-chip peripheral features, is a key component of making the right system design choices for any given application. This chapter focuses on various types of CPU architectures that are commonly used for speech processing applications as well as general control applications that might include some speech processing functionality. There are several architectural features that serve as enablers for efficient execution of the signal processing and speech processing building-block functions we have discussed in the two previous chapters (not to mention the more complex algorithms we are going to explore in the remaining chapters). These architectural features and considerations are discussed in this chapter in some detail. A discussion of on-chip peripherals is left for the next chapter.
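One example of such an architectural feature is hardware support for saturating fixed-point arithmetic and single-cycle multiply-accumulate (MAC). The sketch below emulates both in portable C; it is an illustrative emulation of what DSP-style hardware does implicitly, not code from the book.

    #include <stdint.h>

    /* Saturating Q15 multiply: (a * b) >> 15, clamped to the int16_t
       range. DSP-capable CPUs do this in hardware in one cycle;
       portable C must spell it out. */
    static int16_t q15_mul(int16_t a, int16_t b)
    {
        int32_t p = ((int32_t)a * (int32_t)b) >> 15;
        if (p >  32767) p =  32767;   /* saturate rather than wrap */
        if (p < -32768) p = -32768;
        return (int16_t)p;
    }

    /* FIR inner loop: on a DSP this is typically one repeated MAC
       instruction with dual operand fetch and a wide accumulator. */
    static int32_t fir_mac(const int16_t *x, const int16_t *h, int n)
    {
        int32_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += (int32_t)x[i] * (int32_t)h[i];
        return acc;
    }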
Priyabrata Sinha
Chapter 5. Peripherals for Speech Processing
Abstract
In the previous chapters, we have seen the importance of CPU architecture in enabling efficient and reliable implementation of speech processing applications. Although signal processing algorithms are executed by the CPU, and some of these algorithms are complex enough to benefit greatly from powerful and flexible architectural features, these features are necessary but not sufficient for implementing a complete system. Sampling and converting real-world analog signals such as speech, communicating control data to other components of the system, and storing and retrieving speech samples and algorithmic parameters are all required as part of a complete application solution. This necessitates a robust set of peripheral modules for data conversion, communications, and storage. The processor-side interface to these peripherals also needs to be efficient for them to be utilized effectively. Therefore, the availability of suitable on-chip peripherals is one of the vital criteria in selecting a processor platform for any embedded application, including those with speech processing functionality. In this chapter, we will investigate the various peripheral features that are necessary or useful for implementing embedded speech processing applications.
Priyabrata Sinha
Chapter 6. Speech Compression Overview
Abstract
Speech compression, also known as speech encoding or simply speech coding, means reducing the amount of data required to represent speech. Speech encoding is becoming increasingly vital for embedded systems that involve speech processing. This chapter will discuss the general principles and types of speech encoding techniques, and briefly address the usage of speech compression techniques in many different types of embedded systems. Advantages and technical considerations associated with incorporating speech compression algorithms into specific embedded applications will be reviewed, which will hopefully provide some insight into the process of selecting the appropriate speech compression technique for a given embedded application.
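To put the data-rate savings in perspective: narrowband telephony speech sampled at 8 kHz with 16 bits per sample occupies 128 kbit/s uncompressed, so a coder operating at 8 kbit/s (such as G.729, covered in Chap. 8) achieves a 16:1 compression ratio.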
Priyabrata Sinha
Chapter 7. Waveform Coders
Abstract
In Chap. 6, we saw that there exists a wide variety of speech encoding/decoding algorithms with varying capabilities in terms of quality of decoded speech, input sampling rate, and output data rate. One way to broadly classify speech coders is based on whether the encoding algorithm attempts to represent the original waveform in some form or whether the encoded data purely consists of specific parameters that try to model the characteristics of human speech for a particular speech segment and speaker. The former category of speech coders is known as Waveform Coders. Waveform coders are immensely popular in embedded applications due to their low cost, low computational resource usage, and high speech quality, even though they do not provide as high a compression ratio as the latter category of speech coders, known as Voice Coders. In this chapter, different types of quantization will be discussed and correlated with various popular Waveform Coders. A special focus will be on those coders that have been standardized by the International Telecommunication Union: G.711, G.722, and G.726A.
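As an illustration of the logarithmic quantization at the heart of G.711, the following C sketch encodes one 16-bit linear PCM sample to 8-bit mu-law using the segmented approximation; it follows the widely circulated reference algorithm rather than anything taken from the book, so treat the constants and naming as assumptions.

    #include <stdint.h>

    #define MULAW_BIAS 0x84     /* 132: bias added before segment search */
    #define MULAW_CLIP 32635

    /* Encode one linear PCM sample to 8-bit mu-law (G.711-style). */
    uint8_t linear_to_mulaw(int16_t pcm)
    {
        int sign = (pcm < 0) ? 0x80 : 0x00;
        int mag  = (pcm < 0) ? -(int)pcm : pcm;
        if (mag > MULAW_CLIP) mag = MULAW_CLIP;
        mag += MULAW_BIAS;

        /* Find the segment (exponent): position of the highest set bit. */
        int exponent = 7;
        for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (mag >> (exponent + 3)) & 0x0F;
        return (uint8_t)~(sign | (exponent << 4) | mantissa);  /* sent inverted */
    }

Large samples land in coarse segments and small samples in fine ones; this companding is how 8 bits can cover roughly the dynamic range of 14-bit linear PCM for speech.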
Priyabrata Sinha
Chapter 8. Voice Coders
Abstract
In Chap. 7, we saw various techniques for representing the waveform of speech signals in such a way that the number of bits used to represent each sample is minimized. Generally, such algorithms exploit the inherent redundancy and spectral characteristics of the speech signal, while still allowing the original waveform to be reproduced at the decoder to a large extent. However, these Waveform Coder algorithms do not provide a very high compression ratio; hence they are not very effective when low output data rates are required in an application, either due to constraints in the memory available for storing the encoded speech or due to limited communication bandwidth. In this chapter, we will shift our focus to speech encoding algorithms that attempt to parameterize each segment of speech by encoding the characteristics of a human speech production model rather than the waveform itself. This class of speech coders, known as Voice Coders or simply Vocoders, provides applications with a greater degree of speech compression, albeit at the cost of not being able to reproduce the speech waveform itself. There is a wide variety of Vocoder standards providing various capabilities, including several standards for mobile communications such as TIA IS54 VSELP and ETSI GSM Enhanced Full Rate ACELP. Only a few representative coding techniques suitable for embedded applications are described in this chapter, including some specific standardized vocoders (G.728, G.729, and G.723.1) and an open-source speech coding algorithm (Speex).
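The speech production model that such vocoders encode is typically a linear predictive (LPC) model of the vocal tract, estimated frame by frame; the CELP-family coders named above build on it. Below is a minimal textbook sketch of the Levinson-Durbin recursion, which converts a frame's autocorrelation sequence into LPC coefficients; the naming and conventions are the usual textbook ones, not necessarily the book's.

    /* Levinson-Durbin recursion: given the autocorrelation lags r[0..p]
       of a speech frame, solve for the LPC coefficients a[1..p] of the
       prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
       Returns 0 on success, -1 if the frame is ill-conditioned. */
    int levinson_durbin(const double *r, double *a, int p)
    {
        double err = r[0];
        a[0] = 1.0;
        for (int i = 1; i <= p; i++) {
            double acc = r[i];
            for (int j = 1; j < i; j++)
                acc += a[j] * r[i - j];
            double k = -acc / err;              /* reflection coefficient */
            for (int j = 1; j <= i / 2; j++) {  /* in-place coefficient update */
                double tmp = a[j] + k * a[i - j];
                a[i - j] += k * a[j];
                a[j] = tmp;
            }
            a[i] = k;
            err *= (1.0 - k * k);               /* remaining prediction error */
            if (err <= 0.0) return -1;
        }
        return 0;
    }

A real coder quantizes these coefficients (usually after converting them to line spectral pairs) and transmits them together with excitation parameters, rather than transmitting the waveform itself.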
Priyabrata Sinha
Chapter 9. Noise and Echo Cancellation
Abstract
One of the key requirements in most real-life applications involving speech signals is that the speech should be devoid of unwanted noise and other interference. Most embedded applications operate in nonideal acoustic environments. Ambient noise can often severely impair the quality of the speech signal, to the point that a person hearing it may not even be able to understand the content of the speech. If the speech is further processed by algorithms such as speech compression, these algorithms may perform suboptimally when the input speech is noisy. Tasks such as speech recognition have even more stringent requirements for the quality of the speech input, and are particularly sensitive to background noise. To some extent such noise can be filtered out using regular digital filtering techniques, but in many cases the noise does not follow a deterministic frequency profile and might even change dynamically. Such situations call for more advanced algorithms that can adapt to the noise and remove it. Besides noise, echo generated due to various electrical and acoustic causes can also significantly corrupt the signal and must be eliminated. This chapter explores various techniques to reduce or eliminate noise and echo from speech signals, and looks at real-life applications where such algorithms are critical.
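The workhorse behind most such adaptive cancellers is some variant of the least-mean-squares (LMS) algorithm. The C sketch below shows a single LMS update step; the function name, the plain (rather than normalized) LMS variant, and the fixed step size are illustrative assumptions.

    #include <stddef.h>

    /* One LMS step. w[] holds the n adaptive filter weights (e.g. a model
       of the echo path), x[] the n most recent reference samples (x[0]
       newest), d the observed echo- or noise-corrupted sample, and mu the
       adaptation step size. Returns the error, i.e. the desired signal
       with the estimated interference removed. */
    float lms_step(float *w, const float *x, size_t n, float d, float mu)
    {
        float y = 0.0f;                 /* filter output: interference estimate */
        for (size_t i = 0; i < n; i++)
            y += w[i] * x[i];
        float e = d - y;                /* residual after cancellation */
        for (size_t i = 0; i < n; i++)
            w[i] += mu * e * x[i];      /* gradient-descent weight update */
        return e;
    }

In an acoustic echo canceller, for instance, x[] would hold far-end loudspeaker samples and d the microphone signal; the returned error is the near-end speech with the estimated echo subtracted.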
Priyabrata Sinha
Chapter 10. Speech Recognition
Abstract
Speech Recognition, or the process of automatically recognizing speech uttered by human users just as a human listener would, is one of the most intriguing areas of speech processing. Indeed, it can be the most natural means for a human to interact with any electronics-based system, as it is similar to how people receive and process a large proportion of the information in their daily lives. The related tasks of Speaker Recognition and Speaker Identification are also vital elements of many embedded systems, especially those in which some kind of secure access is required. This chapter explains the benefits and various applications of speech and speaker recognition, and provides a broad overview of the algorithms and techniques that are employed for these purposes. While the subtle details of such algorithms are an extensive area of research and tend to be mathematical, the core concepts are described here with a view to efficient implementation in real-time embedded systems.
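As one concrete example of a matching technique that fits small devices, the sketch below computes a dynamic time warping (DTW) distance between a stored word template and an utterance, a classical approach to isolated-word recognition. Real recognizers compare frames of multidimensional features (e.g., cepstral coefficients); the scalar features, fixed length cap, and naming here are illustrative simplifications, not the book's implementation.

    #include <float.h>
    #include <math.h>

    #define DTW_MAXLEN 64   /* illustrative cap on sequence length */

    /* DTW distance between feature sequences s[0..ns-1] and t[0..nt-1]
       (both lengths must be <= DTW_MAXLEN). Smaller means more similar;
       classify an utterance by picking the template with minimum distance. */
    float dtw_distance(const float *s, int ns, const float *t, int nt)
    {
        static float D[DTW_MAXLEN + 1][DTW_MAXLEN + 1];

        for (int i = 0; i <= ns; i++)
            for (int j = 0; j <= nt; j++)
                D[i][j] = FLT_MAX;
        D[0][0] = 0.0f;

        for (int i = 1; i <= ns; i++) {
            for (int j = 1; j <= nt; j++) {
                float cost = fabsf(s[i - 1] - t[j - 1]);
                float best = D[i - 1][j];                           /* insertion */
                if (D[i][j - 1]     < best) best = D[i][j - 1];     /* deletion  */
                if (D[i - 1][j - 1] < best) best = D[i - 1][j - 1]; /* match     */
                D[i][j] = cost + best;
            }
        }
        return D[ns][nt];
    }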
Priyabrata Sinha
Chapter 11. Speech Synthesis
Abstract
In the previous chapter, we saw mechanisms and algorithms by which a processor-based electronic system can receive and recognize words uttered by a human speaker. The converse of Speech Recognition, in which a processor-based electronic system actually produces speech that can be heard and understood by a human listener, is called Speech Synthesis. Like Speech Recognition, such algorithms have a wide range of uses in daily life, some well-established and others yet to emerge to their fullest potential. Indeed, Speech Synthesis is the most natural user interface through which a product can give its user usage instructions, report system status, or simply carry out true man–machine communication. Speech Synthesis is also closely related to the subjects of Linguistics and Dialog Management. Although quite a few Speech Synthesis techniques are mature and well-understood in the research community, and some of these are available as software solutions on Personal Computers, there is tremendous potential for Speech Synthesis algorithms to be optimized and refined so that they gain wide acceptability in the world of embedded control.
Priyabrata Sinha
Chapter 12. Conclusion
Abstract
The road ahead for the usage of speech processing techniques in embedded applications is indeed extremely promising. As product designers continually look for ways to differentiate their products from those of their competitors, enhancements to the user interface become a critical area of differentiation. Production and interpretation of human speech will be one of the most important components of the user interface of many such systems. As we have seen in various chapters of this book, in many cases the speech processing aspect may be integral to the core functionality of the product. In other cases, it may simply be part of add-on features that enhance the overall user experience.
It is very likely that speech-based data input and output will become as commonplace in embedded applications as, say, entering data through a keyboard or switches and displaying data on an LCD or a bank of LEDs. Because speech is a natural means of communication between human beings, speech-based user interfaces will provide a greater degree of comfort for human operators, thereby enabling greater acceptance of digital implementations of products and instruments that were traditionally analog or even nonelectronic. Whether in the industrial, consumer, telecom, automotive, medical, military, or other market segments, the incorporation of speech-based controls and status reporting will add tremendous value to an application's functionality and ease of use.
Of course, applications that are inherently speech based, such as telecommunication applications, will continue to benefit from a wider adoption of speech processing in embedded systems. For example, advances in speech compression techniques and the optimization of such algorithms for efficient execution on low-cost embedded processors will enable more optimal usage of communication bandwidth. This would also often reduce the cost of the product of interest, in turn increasing its adoption in the general population.
Continuing research and advancement in the various areas of speech processing will expand the scope of speech processing algorithms. Speech recognition in embedded systems will gradually no longer be limited to isolated word recognition but will encompass the recognition and interpretation of continuous speech, e.g., in portable devices that can be used by healthcare professionals for automated transcription of patient records. Research efforts will also enable more effective recognition of multiple languages and dialects, as well as multiple accents of each language. This will, in turn, widen the employment of speech recognition in real-life products of daily use. Similar advances can be expected in Speech Synthesis, wherein the enunciation of individual isolated words connected in a simplistic fashion will gradually be replaced by more natural-sounding generation of complete sentences with the proper semantic emphases and expression of emotions as needed. These advances will find their way, for example, into a wider range of toys and dolls for children. Noise and Echo Cancellation algorithms will also develop further to produce greater Echo Return Loss Enhancement and finer control over the amount of noise reduction. The robustness of these algorithms over a wide variety of acoustic environments and different types of noise will also be an area of increasing research focus. Speech Compression research will continue to improve the quality of decoded speech while simultaneously reducing the data rate of encoded speech, i.e., increasing the compression ratio. As speech processing becomes more and more popular in real-time embedded systems, an increasingly important research area will be optimizing the computational requirements of all the above classes of speech processing algorithms.
Priyabrata Sinha
Backmatter
Metadata
Title
Speech Processing in Embedded Systems
Written by
Priyabrata Sinha
Copyright Year
2010
Publisher
Springer US
Electronic ISBN
978-0-387-75581-6
Print ISBN
978-0-387-75580-9
DOI
https://doi.org/10.1007/978-0-387-75581-6
