
2019 | Book

Deep Learning for NLP and Speech Recognition


About this book

This textbook explains deep learning architectures, with applications to various NLP tasks, including document classification, machine translation, language modeling, and speech recognition. With the widespread adoption of deep learning, natural language processing (NLP), and speech applications in many areas (including finance, healthcare, and government), there is a growing need for one comprehensive resource that maps deep learning techniques to NLP and speech and provides insights into using the tools and libraries for real-world applications. Deep Learning for NLP and Speech Recognition explains recent deep learning methods applicable to NLP and speech, provides state-of-the-art approaches, and offers real-world case studies with code to provide hands-on experience.

Many books focus on deep learning theory or on deep learning for NLP-specific tasks, while others are cookbooks for tools and libraries; the constant flux of new algorithms, tools, frameworks, and libraries in a rapidly evolving landscape means that few available texts offer the material covered in this book.

The book is organized into three parts, aligned with different groups of readers and their expertise. The three parts are:

Machine Learning, NLP, and Speech Introduction

The first part has three chapters that introduce readers to the fields of NLP, speech recognition, deep learning, and machine learning, with basic theory and hands-on case studies using Python-based tools and libraries.

Deep Learning Basics

The five chapters in the second part introduce deep learning and various topics that are crucial for speech and text processing, including word embeddings, convolutional neural networks, recurrent neural networks, and speech recognition basics. Theory is accompanied by practical tips, state-of-the-art methods, and experiments and analysis that apply the methods discussed to real-world tasks.

Advanced Deep Learning Techniques for Text and Speech

The third part has five chapters that discuss the latest cutting-edge research in the areas of deep learning that intersect with NLP and speech. Topics including attention mechanisms, memory augmented networks, transfer learning, multi-task learning, domain adaptation, reinforcement learning, and end-to-end deep learning for speech recognition are covered using case studies.

Table of Contents

Frontmatter

Machine Learning, NLP, and Speech Introduction

Frontmatter
Chapter 1. Introduction
Abstract
In recent years, advances in machine learning have led to significant and widespread improvements in how we interact with our world. One of the most portentous of these advances is the field of deep learning. Based on artificial neural networks that resemble those in the human brain, deep learning is a set of methods that permits computers to learn from data without human supervision and intervention. Furthermore, these methods can adapt to changing environments and provide continuous improvement to learned abilities. Today, deep learning is prevalent in our everyday life in the form of Google’s search, Apple’s Siri, and Amazon’s and Netflix’s recommendation engines to name but a few examples. When we interact with our email systems, online chatbots, and voice or image recognition systems deployed at businesses ranging from healthcare to financial services, we see robust applications of deep learning in action.
Uday Kamath, John Liu, James Whitaker
Chapter 2. Basics of Machine Learning
Abstract
The goal of this chapter is to review basic concepts in machine learning that are applicable to or relate to deep learning. As it is not possible to cover every aspect of machine learning in this chapter, we refer readers who wish to get a more in-depth overview to textbooks such as Learning from Data [AMMIL12] and Elements of Statistical Learning Theory [HTF09].
Uday Kamath, John Liu, James Whitaker
Chapter 3. Text and Speech Basics
Abstract
This chapter introduces the major topics in text and speech analytics and machine learning approaches. Neural network approaches are deferred to later chapters.
Uday Kamath, John Liu, James Whitaker

Deep Learning Basics

Frontmatter
Chapter 4. Basics of Deep Learning
Abstract
One of the most talked-about concepts in machine learning, both in the academic community and in the media, is the evolving field of deep learning. The idea of neural networks, and subsequently deep learning, draws its inspiration from the biological structure of the human brain (or the brain of any other creature, for that matter).
Uday Kamath, John Liu, James Whitaker
Chapter 5. Distributed Representations
Abstract
In this chapter, we introduce the notion of word embeddings that serve as core representations of text in deep learning approaches. We start with the distributional hypothesis and explain how it can be leveraged to form semantic representations of words. We discuss the common distributional semantic models including word2vec and GloVe and their variants. We address the shortcomings of embedding models and their extension to document and concept representation. Finally, we discuss several applications to natural language processing tasks and present a case study focused on language modeling.
Uday Kamath, John Liu, James Whitaker
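To make the chapter's notion of distributed representations concrete, here is a minimal sketch (an illustrative toy example, not one of the book's case studies) that compares words by the cosine similarity of their embedding vectors; the tiny hand-made vectors stand in for trained word2vec or GloVe embeddings.

import numpy as np

# Toy 4-dimensional embeddings; real word2vec/GloVe vectors typically have 100-300
# dimensions and are learned from co-occurrence statistics (the distributional hypothesis).
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.75, 0.70, 0.15, 0.10]),
    "apple": np.array([0.05, 0.10, 0.90, 0.70]),
}

def cosine(u, v):
    # Cosine similarity: close to 1.0 for similar directions, near 0 for unrelated words.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine(embeddings["king"], embeddings["apple"]))  # low: unrelated words

In a trained embedding space, this same similarity computation underlies common uses of word embeddings such as nearest-neighbor word lookups.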
Chapter 6. Convolutional Neural Networks
Abstract
In the last few years, convolutional neural networks (CNNs), along with recurrent neural networks (RNNs), have become a basic building block for constructing complex deep learning solutions for various NLP, speech, and time series tasks. LeCun first introduced basic components of the CNN framework as a general neural network approach for high-dimensional data problems in computer vision, speech, and time series. Applying convolutions to object recognition on ImageNet improved substantially on the state of the art and revived interest in deep learning and CNNs. Collobert et al. pioneered the application of CNNs to NLP tasks such as POS tagging, chunking, named entity recognition, and semantic role labeling. Many aspects of CNNs, from input representation, number of layers, and types of pooling to optimization techniques and applications to various NLP tasks, have been active subjects of research in the last decade.
Uday Kamath, John Liu, James Whitaker
Chapter 7. Recurrent Neural Networks
Abstract
In the previous chapter, CNNs provided a way for neural networks to learn a hierarchy of weights, resembling n-gram-based classification of text. This approach proved to be very effective for sentiment analysis or, more broadly, text classification. One of the disadvantages of CNNs, however, is their inability to model contextual information over long sequences.
Uday Kamath, John Liu, James Whitaker
Chapter 8. Automatic Speech Recognition
Abstract
Automatic speech recognition (ASR) has grown tremendously in recent years, with deep learning playing a key role. Simply put, ASR is the task of converting spoken language into computer-readable text (Fig. 8.1). It has quickly become ubiquitous today as a useful way to interact with technology, significantly bridging the gap in human–computer interaction and making it more natural.
Uday Kamath, John Liu, James Whitaker

Advanced Deep Learning Techniques for Text and Speech

Frontmatter
Chapter 9. Attention and Memory Augmented Networks
Abstract
In deep learning networks, as we have seen in the previous chapters, there are good architectures for handling spatial and temporal data using various forms of convolutional and recurrent networks, respectively. When the data has certain dependencies, such as out-of-order access or long-term dependencies, most of the standard architectures discussed are not suitable. Let us consider a specific example from the bAbI dataset, where stories/facts are presented, a question is asked, and the answer needs to be inferred from the stories. As shown in Fig. 9.1, finding the right answer requires out-of-order access and long-term dependencies.
Uday Kamath, John Liu, James Whitaker
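As a pointer to the attention mechanisms this chapter covers, the following minimal numpy sketch (a toy illustration with assumed shapes, not the book's implementation) computes scaled dot-product attention weights of a question encoding over a small memory of encoded story sentences, in the spirit of the bAbI example above.

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4                              # toy embedding dimension
memory = rng.normal(size=(3, d))   # three stored facts (e.g., encoded story sentences)
query = rng.normal(size=d)         # encoding of the question

# Score each memory slot against the query, normalize to attention weights,
# then read out a weighted sum of the memory as the attended context vector.
scores = memory @ query / np.sqrt(d)
weights = softmax(scores)          # sums to 1; the largest weight marks the most relevant fact
context = weights @ memory         # representation passed to the answer module

print(weights, context)

Memory-augmented architectures repeat such read (and write) steps over multiple hops, which is what allows out-of-order access to the stored facts.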
Chapter 10. Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning
Abstract
Most supervised machine learning techniques, such as classification, rely on some underlying assumptions, such as: (a) the data distributions during training and prediction time are similar; (b) the label space during training and prediction time is similar; and (c) the feature space between training and prediction time remains the same. In many real-world scenarios, these assumptions do not hold due to the changing nature of the data.
Uday Kamath, John Liu, James Whitaker
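One common recipe that follows from these scenarios is to reuse a model trained on a source task and adapt only a small task-specific part to the target task. The sketch below is a hypothetical PyTorch illustration (the encoder, layer sizes, and dimensions are all invented for the example, not taken from the book).

import torch
import torch.nn as nn

# Stand-in for an encoder pretrained on a source task.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU())

# Freeze the source-task parameters so only the new head adapts to the target task.
for p in encoder.parameters():
    p.requires_grad = False

head = nn.Linear(64, 2)               # new classifier for the target task
model = nn.Sequential(encoder, head)

# Optimize only the parameters that remain trainable (the head).
optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)

Unfreezing some or all of the encoder, or training several heads jointly, gives fine-tuning and multitask-style variants of this recipe.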
Chapter 11. Transfer Learning: Domain Adaptation
Abstract
Domain adaptation is a form of transfer learning in which the task remains the same, but there is a domain shift or a distribution change between the source and the target. As an example, consider a model that has learned to classify reviews of electronic products as expressing positive or negative sentiment, and is then used to classify reviews of hotel rooms or movies. The task of sentiment analysis remains the same, but the domain (electronics vs. hotel rooms) has changed. Applying the model to a different domain poses many problems because of the change between the training data and the unseen test data, typically known as domain shift. For example, sentences containing phrases such as "loud and clear" will mostly be considered positive in electronics reviews but negative in hotel room reviews. Similarly, keywords such as "lengthy" or "boring", which may be prevalent in domains such as book reviews, might be completely absent in domains such as kitchen equipment reviews.
Uday Kamath, John Liu, James Whitaker
Chapter 12. End-to-End Speech Recognition
Abstract
In Chap. 8, we aimed to create an ASR system by dividing the fundamental equation
$$ W^* = \operatorname*{argmax}_{W \in V^*} P(W|X) $$
into an acoustic model, lexicon model, and language model by using Bayes’ theorem. This approach relies heavily on the use of the conditional independence assumption and separate optimization procedures for the different models.
Uday Kamath, John Liu, James Whitaker
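For reference, the decomposition referred to above follows from Bayes' theorem: since P(X) does not depend on W, it can be dropped from the maximization,

$$ W^* = \operatorname*{argmax}_{W \in V^*} P(W|X) = \operatorname*{argmax}_{W \in V^*} \frac{P(X|W)\,P(W)}{P(X)} = \operatorname*{argmax}_{W \in V^*} P(X|W)\,P(W), $$

where P(X|W) is modeled by the acoustic model (with the lexicon mapping words to sub-word units) and P(W) by the language model. End-to-end approaches, the subject of this chapter, instead train a single network to model P(W|X) directly.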
Chapter 13. Deep Reinforcement Learning for Text and Speech
Abstract
In this chapter, we investigate deep reinforcement learning for text and speech applications. Reinforcement learning is a branch of machine learning that deals with how agents learn a set of actions that can maximize expected cumulative reward. In past research, reinforcement learning has focused on game play. Recent advances in deep learning have opened up reinforcement learning to wider applications for real-world problems, spawning the field of deep reinforcement learning. In the first part of this chapter, we introduce the fundamental concepts of reinforcement learning and their extension through the use of deep neural networks. In the latter part of the chapter, we investigate several popular deep reinforcement learning algorithms and their application to text and speech NLP tasks.
Uday Kamath, John Liu, James Whitaker
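As a quick gloss on "expected cumulative reward": in the standard formulation (not notation specific to this book), the agent maximizes the expected discounted return

$$ G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1, $$

where r denotes the per-step reward and γ is the discount factor; the agent seeks a policy that maximizes the expected value of G_t.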
Backmatter
Metadata
Title
Deep Learning for NLP and Speech Recognition
Authors
Uday Kamath
John Liu
James Whitaker
Copyright Year
2019
Electronic ISBN
978-3-030-14596-5
Print ISBN
978-3-030-14595-8
DOI
https://doi.org/10.1007/978-3-030-14596-5
