
1998 | Book

The Informational Complexity of Learning

Perspectives on Neural Networks and Generative Grammar

Author: Partha Niyogi

Publisher: Springer US


About this book

The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar brings together two important but very different learning problems within the same analytical framework. The first is the problem of learning functional mappings using neural networks; the second is learning natural language grammars in the principles and parameters tradition of Chomsky.
These two learning problems are seemingly very different. Neural networks are real-valued, infinite-dimensional, continuous mappings; grammars, on the other hand, are Boolean-valued, finite-dimensional, discrete (symbolic) mappings. Furthermore, the research communities that work in the two areas almost never overlap.
The book's objective is to bridge this gap. It uses the formal techniques developed in statistical learning theory and theoretical computer science over the last decade to analyze both kinds of learning problems. By asking the same question - how much information does it take to learn? - of both problems, it highlights their similarities and differences. Specific results include model selection in neural networks, active learning, language learning and evolutionary models of language change.
The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar is a very interdisciplinary work. Anyone interested in the interaction of computer science and cognitive science should enjoy the book. Researchers in artificial intelligence, neural networks, linguistics, theoretical computer science, and statistics will find it particularly relevant.

Table of Contents

Frontmatter
1. Introduction
Abstract
We introduce the framework in which learning from examples is to be studied. We develop a precise notion of informational complexity and discuss the factors upon which this depends. Finally, we provide an outline of the four problems discussed in this book, our major contributions, and their implications.
Partha Niyogi
2. On the Relationship between Hypothesis Complexity, Sample Complexity and Generalization Error for Neural Networks
Abstract
Feedforward networks are a class of approximation techniques that can be used to learn to perform some tasks from a finite set of examples. The question of the capability of a network to generalize from a finite training set to unseen data is clearly of crucial importance. In this chapter, we bound the generalization error of a class of Radial Basis Functions, for certain well-defined function learning tasks, in terms of the number of parameters and the number of examples. We show that the total generalization error is partly due to the insufficient representational capacity of the network (because of the finite size of the network being used) and partly due to insufficient information about the target function (because of the finite number of samples). Prior research has looked at representational capacity or sample complexity in isolation. In the spirit of A. Barron, H. White, and S. Geman, we develop a framework that looks at both. While the bound that we derive is specific to Radial Basis Functions, a number of observations deriving from it apply to any approximation technique. Our result also sheds light on ways to choose an appropriate network architecture for a particular problem and on the kinds of problems that can be effectively solved with finite resources, i.e., with a finite number of parameters and a finite amount of data.
Partha Niyogi
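The chapter's bound is specific to Radial Basis Functions, but the general shape of the trade-off it describes can be illustrated with a minimal sketch: total error split into an approximation term that shrinks as the number of basis functions n grows, and an estimation term that grows with n and shrinks with the number of examples l. The functional form and the constants below are illustrative assumptions, not the bound derived in the book.

```python
# Illustrative sketch (not the book's exact bound): generalization error
# for a network with n basis functions trained on l examples in d
# dimensions, decomposed into an approximation term (finite network size)
# and an estimation term (finite data). c_approx and c_est are
# hypothetical placeholder constants.
import math

def total_error(n, l, d, c_approx=1.0, c_est=1.0):
    approximation = c_approx / n
    estimation = c_est * math.sqrt(n * d * math.log(n * l) / l)
    return approximation + estimation

# For a fixed data budget l, sweep the network size n: too few centers
# underfits, too many overfits.
l, d = 10_000, 5
best_n = min(range(1, 500), key=lambda n: total_error(n, l, d))
print(f"best n = {best_n}, error bound = {total_error(best_n, l, d):.3f}")
```

Sweeping n for a fixed data budget makes the chapter's point concrete: the appropriate network size depends jointly on the amount of available data and the target's complexity.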
3. Investigating the Sample Complexity of Active Learning Schemes
Abstract
In the classical learning framework of the previous chapter (akin to PAC), examples were randomly drawn and presented to the learner. In this chapter, we consider the possibility of a more active learner who is allowed to choose his/her own examples. Our investigations divide into two natural parts. The first is in a function approximation setting and develops an adaptive sampling strategy (equivalent to adaptive approximation) motivated from the standpoint of optimal recovery (Micchelli and Rivlin, 1976). We provide a general formulation of the problem, which can be regarded as sequential optimal recovery. We demonstrate the application of this general formulation to two special cases of functions on the real line: (1) monotonically increasing functions and (2) functions with bounded derivative. An extensive investigation of the sample complexity of approximating these functions is conducted, yielding both theoretical and empirical results on test functions. Our theoretical results (stated in PAC style), along with the simulations, demonstrate the superiority of our active scheme over both passive learning and classical optimal recovery. The second part of this chapter is in a concept learning framework and discusses the idea of ε-focusing: a scheme where the active learner iteratively draws examples from smaller and smaller regions of the input space, thereby gaining vast improvements in sample complexity.
Partha Niyogi
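To give a flavor of active sampling in the monotone case, here is a minimal sketch in the spirit of the chapter's adaptive strategy: query next where the remaining uncertainty is largest. The target function and the uncertainty measure (the area of the bounding rectangle on each subinterval) are illustrative assumptions, not the book's exact formulation.

```python
# Minimal sketch of adaptive sampling for a monotone increasing f on
# [0, 1]: repeatedly query the midpoint of the subinterval whose
# bounding rectangle (width times rise) is largest, since any monotone
# function consistent with the sampled values lies inside that rectangle.
import bisect

def f(x):               # hypothetical monotone target
    return x ** 3

xs, ys = [0.0, 1.0], [f(0.0), f(1.0)]
for _ in range(20):     # 20 active queries
    # pick the interval [x_k, x_{k+1}] with the largest uncertainty area
    k = max(range(len(xs) - 1),
            key=lambda k: (xs[k + 1] - xs[k]) * (ys[k + 1] - ys[k]))
    x_new = (xs[k] + xs[k + 1]) / 2
    j = bisect.bisect(xs, x_new)
    xs.insert(j, x_new)
    ys.insert(j, f(x_new))

# Worst-case L1 error of the midpoint estimate is half the sum of the
# remaining rectangle areas.
err = sum((xs[k + 1] - xs[k]) * (ys[k + 1] - ys[k])
          for k in range(len(xs) - 1)) / 2
print(f"queries: {len(xs)}, worst-case L1 error bound: {err:.4f}")
```

A passive learner spreads the same query budget uniformly; the active learner concentrates queries where the function rises steeply, which is the source of the sample complexity savings the chapter quantifies.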
4. Language Learning Problems in the Principles and Parameters Framework
Abstract
This chapter considers a learning problem in which the hypothesis class is a class of parameterized grammars. After a brief introduction to the “principles and parameters” framework of modern linguistic theory, we consider a specific learning problem previously analyzed in a seminal work by Gibson and Wexler (1994). Using the informational-complexity point of view developed in this book, we reanalyze their learning problem, placing particular emphasis on the sample complexity of learning, in contrast to previous research in the inductive inference, or Gold, framework (see Osherson and Weinstein, 1986). We show how to formally characterize this problem in particular, and a class of learning problems in finite parameter spaces in general, as a Markov structure. Important new language learning results follow directly: we explicitly compute sample complexity bounds under different distributional assumptions, learning regimes, and grammatical parameterizations. Briefly, we may view this as a precise way to model the “poverty of stimulus” children face in language acquisition. Our reanalysis alters several conclusions drawn by Gibson and Wexler. We therefore consider this chapter a useful application of learning-theoretic notions to natural languages and their acquisition. Finally, we describe several directions for further research.
Partha Niyogi
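The Markov formulation lends itself to direct computation. The sketch below shows the general idea with a made-up three-grammar transition matrix (not the Gibson and Wexler parameterization): treat the target grammar as an absorbing state and count how many example sentences are needed before the probability of non-convergence drops below a tolerance δ.

```python
# Minimal sketch of the Markov-chain view of parameter learning.
# States are candidate grammars; T[i][j] is the probability that one
# example sentence moves the learner from grammar i to grammar j.
# The numbers are hypothetical, chosen only for illustration.
import numpy as np

T = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],   # target grammar: absorbing, never left
])

# Sample complexity: smallest m with P(not converged after m examples)
# <= delta, starting from a uniform distribution over initial grammars.
delta = 0.01
p = np.full(3, 1 / 3)
m = 0
while 1 - p[2] > delta:
    p = p @ T
    m += 1
print(f"examples needed for convergence probability >= {1 - delta}: {m}")
```

The same computation, run under different input distributions or learning algorithms (which change T), is what yields the chapter's explicit sample complexity bounds.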
5. The Logical Problem of Language Change
Abstract
In this chapter, we consider the problem of language change. Linguists have to explain not only how languages are learned (a problem we investigated in the previous chapter), but also how and why they have evolved along certain trajectories. While the language learning problem concentrates on the behavior of the individual child, and how it acquires a particular grammar (from a class of grammars G), here we consider a population of such child learners and investigate the emergent, global characteristics of the linguistic community over several generations. We argue that language change is the logical consequence of specific assumptions about grammatical theories and learning paradigms. In particular, we are able to transform the parameterized theories and memoryless algorithms of the previous chapter into grammatical dynamical systems, whose evolution depicts the evolving linguistic composition of the population. We investigate the linguistic and computational consequences of this fact. From a more programmatic perspective, we lay out a possible logical framework for the scientific study of historical linguistics, and thereby introduce a formal diachronic criterion for the adequacy of linguistic theories.
Partha Niyogi
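To illustrate what a grammatical dynamical system looks like in the simplest case, the sketch below iterates a population map over two grammars. The acquisition function is a hypothetical stand-in for the maps the chapter derives from learning algorithms; it is meant only to show how individual-level learning induces generation-to-generation population dynamics.

```python
# Minimal two-grammar sketch: s is the fraction of the population using
# grammar G1. Each child hears sentences drawn from the mixed population
# and acquires G1 with probability acquire_G1(s). The linear form and
# the parameters a, b are made-up illustrations, not the book's derived maps.
def acquire_G1(s, a=0.05, b=1.2):
    # a: chance of acquiring G1 even with no G1 input (misconvergence)
    # b: amplification of G1 input in acquisition
    return min(1.0, a + b * s * (1 - a))

s = 0.10                           # initial fraction of G1 speakers
for generation in range(1, 31):    # iterate the population map
    s = acquire_G1(s)
    if generation % 3 == 0:
        print(f"generation {generation}: G1 fraction = {s:.3f}")
```

Even this toy map shows a population-level shift (here, toward G1) emerging from fixed individual learning behavior; the chapter's dynamical systems make such trajectories precise for actual grammatical parameterizations.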
6. Conclusions
Abstract
This chapter concludes our discussion by articulating the perspective that emerges from the investigations of the previous chapters. We discuss the implications of some of our specific results, their role in illuminating our point of view, and directions for future research.
Partha Niyogi
Backmatter
Metadata
Title
The Informational Complexity of Learning
Author
Partha Niyogi
Copyright Year
1998
Publisher
Springer US
Electronic ISBN
978-1-4615-5459-2
Print ISBN
978-1-4613-7493-0
DOI
https://doi.org/10.1007/978-1-4615-5459-2