
2013 | Book

Optimization


About this book

Finite-dimensional optimization problems occur throughout the mathematical sciences. The majority of these problems cannot be solved analytically. This introduction to optimization attempts to strike a balance between presentation of mathematical theory and development of numerical algorithms. Building on students’ skills in calculus and linear algebra, the text provides a rigorous exposition without undue abstraction. Its stress on statistical applications will be especially appealing to graduate students of statistics and biostatistics. The intended audience also includes students in applied mathematics, computational biology, computer science, economics, and physics who want to see rigorous mathematics combined with real applications.

In this second edition the emphasis remains on finite-dimensional optimization. New material has been added on the MM algorithm, block descent and ascent, and the calculus of variations. Convex calculus is now treated in much greater depth. Advanced topics such as the Fenchel conjugate, subdifferentials, duality, feasibility, alternating projections, projected gradient methods, exact penalty methods, and Bregman iteration will equip students with the essentials for understanding modern data mining techniques in high dimensions.

Table of Contents

Frontmatter
1. Elementary Optimization
Abstract
As one of the oldest branches of mathematics, optimization theory served as a catalyst for the development of geometry and differential calculus [258]. Today it finds applications in a myriad of scientific and engineering disciplines. The current chapter briefly surveys material that most students encounter in a good calculus course. This review is intended to showcase the variety of methods used to find the exact solutions of elementary problems. We will return to some of these methods later from a more rigorous perspective. One of the recurring themes in optimization theory is its close connection to inequalities. This chapter introduces a few classical inequalities; more will appear in succeeding chapters.
Kenneth Lange
2. The Seven C’s of Analysis
Abstract
The current chapter explains key concepts of mathematical analysis summarized by the six adjectives convergent, complete, closed, compact, continuous, and connected. Chapter 6 will add to these six c’s the seventh c, convex. At first blush these concepts seem remote from practical problems of optimization. However, painful experience and exotic counterexamples have taught mathematicians to pay attention to details. Fortunately, we can benefit from the struggles of earlier generations and bypass many of the intellectual traps.
Kenneth Lange
3. The Gauge Integral
Abstract
Much of calculus deals with the interplay between differentiation and integration. The antiquated term “antidifferentiation” emphasizes the fact that differentiation and integration are inverses of one another. We will take it for granted that readers are acquainted with the mechanics of integration. The current chapter develops just enough integration theory to make our development of differentiation in Chap. 4 and the calculus of variations in Chap. 17 respectable. It is only fair to warn readers that in other chapters a few applications to probability and statistics will assume familiarity with properties of the expectation operator not covered here.
Kenneth Lange
4. Differentiation
Abstract
Differentiation and integration are the two pillars on which all of calculus rests. For real-valued functions of a real variable, all of the major issues surrounding differentiation were settled long ago. For multivariate differentiation, there are still some subtleties and snares. We adopt a definition of differentiability that avoids most of the pitfalls and makes differentiation of vectors and matrices relatively painless. In later chapters, this definition also improves the clarity of exposition.
Kenneth Lange
5. Karush-Kuhn-Tucker Theory
Abstract
In the current chapter, we study the problem of minimizing a real-valued function \(f(\boldsymbol{x})\) subject to the constraints
$$\displaystyle\begin{array}{rcl} g_{i}(\boldsymbol{x})& =& 0,\quad \quad 1 \leq i \leq p \\ h_{j}(\boldsymbol{x})& \leq & 0,\quad \quad 1 \leq j \leq q.\end{array}$$
All of these functions share some open set \(U \subset \mathbb{R}^{n}\) as their domain. Maximizing \(f(\boldsymbol{x})\) is equivalent to minimizing \(-f(\boldsymbol{x})\), so there is no loss of generality in considering minimization. The function \(f(\boldsymbol{x})\) is called the objective function, the functions \(g_{i}(\boldsymbol{x})\) are called equality constraints, and the functions \(h_{j}(\boldsymbol{x})\) are called inequality constraints. Any point \(\boldsymbol{x} \in U\) satisfying all of the constraints is said to be feasible.
Kenneth Lange
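As a minimal numerical illustration of this setup (a toy problem of my own, not taken from the book), the KKT stationarity condition \(\nabla f + \lambda \nabla g = \boldsymbol{0}\) can be checked for minimizing \(x^2 + y^2\) subject to \(x + y = 1\), whose solution is \(x = y = 1/2\) with multiplier \(\lambda = -1\):

```python
# Toy problem: minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0.
# All names here are illustrative.

def grad_f(x, y):
    return (2.0 * x, 2.0 * y)

def grad_g(x, y):
    return (1.0, 1.0)

def kkt_residual(x, y, lam):
    """Norm of grad f + lam * grad g, plus the constraint violation."""
    gf, gg = grad_f(x, y), grad_g(x, y)
    stationarity = [gf[i] + lam * gg[i] for i in range(2)]
    feasibility = abs(x + y - 1.0)
    return sum(s * s for s in stationarity) ** 0.5 + feasibility

print(kkt_residual(0.5, 0.5, -1.0))  # → 0.0
```

A residual of zero certifies that the candidate point and multiplier jointly satisfy stationarity and feasibility; any other feasible point gives a positive residual.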
6. Convexity
Abstract
Convexity is one of the cornerstones of mathematical analysis and has interesting consequences for optimization theory, statistical estimation, inequalities, and applied probability. Despite this fact, students seldom see convexity presented in a coherent fashion. It always seems to take a backseat to more pressing topics. The current chapter is intended as a partial remedy to this pedagogical gap.
Kenneth Lange
7. Block Relaxation
Abstract
As a gentle introduction to optimization algorithms, we now consider block relaxation. The more descriptive terms block descent and block ascent suggest either minimization or maximization rather than generic optimization. Regardless of what one terms the strategy, in many problems it pays to update only a subset of the parameters at a time. Block relaxation divides the parameters into disjoint blocks and cycles through the blocks, updating only those parameters within the pertinent block at each stage of a cycle. When each block consists of a single parameter, block relaxation is called cyclic coordinate descent or cyclic coordinate ascent. Block relaxation is best suited to unconstrained problems where the domain of the objective function reduces to a Cartesian product of the subdomains associated with the different blocks. Obviously, exact block updates are a huge advantage. Equality constraints usually present insuperable barriers to coordinate descent and ascent because parameters get locked into position. In some problems it is advantageous to consider overlapping blocks.
Kenneth Lange
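A minimal sketch of cyclic coordinate descent (an illustration of the idea, not code from the book): minimizing the strictly convex quadratic \(f(x,y) = (x-1)^2 + (y-2)^2 + xy\), where each coordinate update solves that coordinate's stationarity equation exactly.

```python
def cyclic_coordinate_descent(x, y, cycles=50):
    # Minimize f(x, y) = (x - 1)^2 + (y - 2)^2 + x*y, a strictly convex
    # quadratic whose minimizer is (0, 2).  Each update is the exact argmin
    # of f in one coordinate with the other coordinate held fixed.
    for _ in range(cycles):
        x = 1.0 - y / 2.0   # solves df/dx = 2(x - 1) + y = 0
        y = 2.0 - x / 2.0   # solves df/dy = 2(y - 2) + x = 0
    return x, y

x, y = cyclic_coordinate_descent(0.0, 0.0)  # converges to (0, 2)
```

Here the error in \(x\) contracts by a factor of 4 per full cycle, a concrete instance of the linear convergence typical of block descent.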
8. The MM Algorithm
Abstract
Most practical optimization problems defy exact solution. In the current chapter we discuss an optimization method that relies heavily on convexity arguments and is particularly useful in high-dimensional problems such as image reconstruction [171]. This iterative method is called the MM algorithm. One of the virtues of this acronym is that it does double duty. In minimization problems, the first M of MM stands for majorize and the second M for minimize. In maximization problems, the first M stands for minorize and the second M for maximize. When it is successful, the MM algorithm substitutes a simple optimization problem for a difficult optimization problem. Simplicity can be attained by: (a) separating the variables of an optimization problem, (b) avoiding large matrix inversions, (c) linearizing an optimization problem, (d) restoring symmetry, (e) dealing with equality and inequality constraints gracefully, and (f) turning a nondifferentiable problem into a smooth problem. In simplifying the original problem, we must pay the price of iteration or iteration with a slower rate of convergence.
Kenneth Lange
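As an illustrative sketch (the example is classical, but the code is mine, not the book's): the sample median minimizes \(\sum_i |x - y_i|\), and majorizing each absolute value by a quadratic tangent at the current iterate turns each MM step into a simple weighted mean.

```python
def mm_median(data, x=0.0, iterations=200, eps=1e-8):
    # MM for least absolute deviations: majorize |x - y_i| by the quadratic
    # (x - y_i)^2 / (2|x_n - y_i|) + |x_n - y_i| / 2, which touches at x_n.
    # Minimizing the surrogate gives a weighted mean with weights 1/|x_n - y_i|.
    for _ in range(iterations):
        weights = [1.0 / max(abs(x - y), eps) for y in data]
        x = sum(w * y for w, y in zip(weights, data)) / sum(weights)
    return x

print(mm_median([1.0, 2.0, 7.0]))  # converges to the sample median 2.0
```

The `eps` floor guards against division by zero when an iterate lands exactly on a data point, one of the standard fixes for this majorization.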
9. The EM Algorithm
Abstract
Maximum likelihood is the dominant form of estimation in applied statistics. Because closed-form solutions to likelihood equations are the exception rather than the rule, numerical methods for finding maximum likelihood estimates are of paramount importance. In this chapter we study maximum likelihood estimation by the EM algorithm, a special case of the MM algorithm. At the heart of every EM algorithm is some notion of missing data. Data can be missing in the ordinary sense of a failure to record certain observations on certain cases. Data can also be missing in a theoretical sense. We can think of the E (expectation) step of the algorithm as filling in the missing data. This action replaces the loglikelihood of the observed data by a minorizing function. This surrogate function is then maximized in the M step. Because the surrogate function is usually much simpler than the likelihood, we can often solve the M step analytically. The price we pay for this simplification is that the EM algorithm is iterative. Reconstructing the missing data is bound to be slightly wrong if the parameters do not already equal their maximum likelihood estimates.
Kenneth Lange
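A hedged sketch of the two steps for a toy two-component Gaussian mixture with unit variances and equal mixing weights (only the means are estimated; the setup and code are illustrative, not the book's):

```python
import math

def em_two_gaussians(data, mu1, mu2, iterations=100):
    # Two-component mixture with unit variances and equal weights 1/2;
    # the missing data are the unobserved component labels.
    for _ in range(iterations):
        # E step: posterior responsibility of component 1 for each point.
        r = [
            math.exp(-0.5 * (y - mu1) ** 2)
            / (math.exp(-0.5 * (y - mu1) ** 2) + math.exp(-0.5 * (y - mu2) ** 2))
            for y in data
        ]
        # M step: weighted means maximize the minorizing surrogate exactly.
        mu1 = sum(ri * y for ri, y in zip(r, data)) / sum(r)
        mu2 = sum((1 - ri) * y for ri, y in zip(r, data)) / sum(1 - ri for ri in r)
    return mu1, mu2

mu1, mu2 = em_two_gaussians([-0.1, 0.0, 0.1, 4.9, 5.0, 5.1], 1.0, 4.0)
```

On this well-separated toy data the estimated means converge to roughly 0 and 5, the centers of the two clusters.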
10. Newton’s Method and Scoring
Abstract
Block relaxation and the MM algorithm are hardly the only methods of optimization. Newton’s method is better known and more widely applied. Despite its defects, Newton’s method is the gold standard for speed of convergence and forms the basis of most modern optimization algorithms in low dimensions. Its many variants seek to retain its fast convergence while taming its defects. The variants all revolve around the core idea of locally approximating the objective function by a strictly convex quadratic function. At each iteration the quadratic approximation is optimized. Safeguards are introduced to keep the iterates from veering toward irrelevant stationary points.
Kenneth Lange
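A minimal one-dimensional sketch (my own toy example, not the book's): minimizing \(f(x) = x - \log x\), whose unique minimum lies at \(x = 1\), by repeatedly optimizing the strictly convex local quadratic model.

```python
def newton_minimize(x, iterations=10):
    # Minimize f(x) = x - log x (unique minimum at x = 1).  Each Newton step
    # minimizes the local quadratic model built from f' and f''.
    # For this toy iteration, starting points should lie in (0, 2).
    for _ in range(iterations):
        fp = 1.0 - 1.0 / x     # f'(x)
        fpp = 1.0 / (x * x)    # f''(x) > 0 for x > 0, so the model is convex
        x = x - fp / fpp       # here this simplifies to 2x - x^2
    return x

x = newton_minimize(0.5)  # converges quadratically to 1.0
```

The error obeys \(1 - x_{n+1} = (1 - x_n)^2\), so the number of correct digits doubles per iteration, the quadratic convergence the chapter refers to as the gold standard.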
11. Conjugate Gradient and Quasi-Newton
Abstract
Our discussion of Newton’s method has highlighted both its strengths and its weaknesses. Related algorithms such as scoring and Gauss-Newton exploit special features of the objective function \(f(\boldsymbol{x})\) in overcoming the defects of Newton’s method. We now consider algorithms that apply to generic functions \(f(\boldsymbol{x})\). These algorithms also operate by locally approximating \(f(\boldsymbol{x})\) by a strictly convex quadratic function. Indeed, the guiding philosophy behind many modern optimization algorithms is to see what techniques work well with quadratic functions and then to modify the best techniques to accommodate generic functions.
Kenneth Lange
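A sketch of the conjugate gradient method on the quadratic \(\frac{1}{2}\boldsymbol{x}^{T}A\boldsymbol{x} - \boldsymbol{b}^{T}\boldsymbol{x}\) (illustrative code, not the book's). In exact arithmetic it terminates in at most \(n\) steps for an \(n \times n\) symmetric positive definite \(A\):

```python
def conjugate_gradient(A, b, iterations=None):
    # Minimize (1/2) x^T A x - b^T x, equivalently solve A x = b, for
    # symmetric positive definite A, using conjugate search directions.
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual b - A x at the start
    d = r[:]                      # first search direction
    for _ in range(iterations or n):
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = sum(ri * ri for ri in r) / sum(di * Adi for di, Adi in zip(d, Ad))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r_new = [ri - alpha * Adi for ri, Adi in zip(r, Ad)]
        beta = sum(ri * ri for ri in r_new) / sum(ri * ri for ri in r)
        d = [ri + beta * di for ri, di in zip(r_new, d)]
        r = r_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])  # → (1/11, 7/11)
```

Two iterations suffice here because the problem is two-dimensional; for large sparse systems one stops far earlier, which is the method's practical appeal.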
12. Analysis of Convergence
Abstract
Proving convergence of the various optimization algorithms is a delicate exercise. In general, it is helpful to consider local and global convergence patterns separately. The local convergence rate of an algorithm provides a useful benchmark for comparing it to other algorithms. On this basis, Newton’s method wins hands down. However, the tradeoffs are subtle. Besides the sheer number of iterations until convergence, the computational complexity and numerical stability of an algorithm are critically important. The MM algorithm is often the epitome of numerical stability and computational simplicity. Scoring lies somewhere between Newton’s method and the MM algorithm. It tends to converge more quickly than the MM algorithm and to behave more stably than Newton’s method. Quasi-Newton methods also occupy this intermediate zone. Because the issues are complex, all of these algorithms survive and prosper in certain computational niches.
Kenneth Lange
13. Penalty and Barrier Methods
Abstract
Penalties and barriers feature prominently in two areas of modern optimization theory. First, both devices are employed to solve constrained optimization problems [96, 183, 226]. The general idea is to replace hard constraints by penalties or barriers and then exploit the well-oiled machinery for solving unconstrained problems. Penalty methods operate on the exterior of the feasible region and barrier methods on the interior. The strength of a penalty or barrier is determined by a tuning constant. In classical penalty methods, a single global tuning constant is gradually sent to ∞; in barrier methods, it is gradually sent to 0. Nothing prevents one from assigning different tuning constants to different penalties or barriers in the same problem. Either strategy generates a sequence of solutions that converges in practice to the solution of the original constrained optimization problem.
Kenneth Lange
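For the toy problem of minimizing \(x^2\) subject to \(x \geq 1\) (my own illustration, not the book's), both devices admit closed-form unconstrained minimizers that exhibit the exterior/interior contrast:

```python
import math

def penalty_solution(mu):
    # Exterior quadratic penalty: the minimizer of x^2 + mu * max(0, 1 - x)^2
    # is mu / (1 + mu), which sits *outside* the feasible region x >= 1.
    return mu / (1.0 + mu)

def barrier_solution(t):
    # Interior log barrier: the minimizer of x^2 - t * log(x - 1) over x > 1
    # is (1 + sqrt(1 + 2t)) / 2, which sits strictly *inside* the region.
    return (1.0 + math.sqrt(1.0 + 2.0 * t)) / 2.0
```

As \(\mu \to \infty\) the penalty iterates approach the constrained solution 1 from below; as \(t \to 0\) the barrier iterates approach it from above.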
14. Convex Calculus
Abstract
Two generations of mathematicians have labored to extend the machinery of differential calculus to convex functions. For many purposes it is convenient to generalize the definition of a convex function \(f(\boldsymbol{x})\) to include the possibility that \(f(\boldsymbol{x}) = \infty \). This maneuver has the advantage of allowing one to enlarge the domain of a convex function \(f(\boldsymbol{x})\) defined on a convex set \(C \subset \mathbb{R}^{n}\) to all of \(\mathbb{R}^{n}\) by the simple device of setting \(f(\boldsymbol{x}) = \infty \) for \(\boldsymbol{x}\not\in C\).
Kenneth Lange
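The device is easy to state concretely (an illustrative snippet of mine, not the book's): extend \(f(x) = -\log x\), which is convex on \(C = (0, \infty)\), to all of \(\mathbb{R}\) by assigning \(+\infty\) off \(C\).

```python
import math

def extended_f(x):
    # f(x) = -log x on C = (0, infinity), extended by f(x) = +infinity off C.
    # The extension remains convex on all of R, and its minima are unchanged,
    # since no point outside C can ever beat a finite value inside C.
    return -math.log(x) if x > 0 else math.inf
```

Infeasible points are thus automatically ruled out of any minimization, which is precisely why the extended-value convention simplifies convex calculus.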
15. Feasibility and Duality
Abstract
This chapter provides a concrete introduction to several advanced topics in optimization theory. Specifying an interior feasible point is the first issue that must be faced in applying a barrier method. Given an exterior point, Dykstra’s algorithm [21, 70, 79] finds the closest point in the intersection \(\cap _{i=0}^{r-1}C_{i}\) of a finite number of closed convex sets. If \(C_{i}\) is defined by the convex constraint \(h_{i}(\boldsymbol{x}) \leq 0\), then one obvious tactic for finding an interior point is to replace \(C_{i}\) by the set \(C_{i}(\epsilon ) =\{ \boldsymbol{x} : h_{i}(\boldsymbol{x}) \leq -\epsilon \}\) for some small \(\epsilon > 0\). Projecting onto the intersection of the \(C_{i}(\epsilon )\) then produces an interior point.
Kenneth Lange
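A sketch of Dykstra's algorithm for two sets, the unit disk and a half-plane (the example and code are illustrative, not the book's). Unlike plain alternating projections, which find only some feasible point, the correction terms \(p\) and \(q\) steer the iterates toward the closest point of the intersection:

```python
import math

def proj_disk(v):
    # Euclidean projection onto the closed unit disk.
    n = math.hypot(v[0], v[1])
    return v if n <= 1.0 else (v[0] / n, v[1] / n)

def proj_halfplane(v):
    # Euclidean projection onto the half-plane {x : x1 >= 0.5}.
    return (max(v[0], 0.5), v[1])

def dykstra(x0, iterations=1000):
    # Dykstra's algorithm for two closed convex sets.  Before each projection
    # the previous correction is added back; the new correction is whatever
    # the projection removed.
    x, p, q = x0, (0.0, 0.0), (0.0, 0.0)
    for _ in range(iterations):
        y = proj_disk((x[0] + p[0], x[1] + p[1]))
        p = (x[0] + p[0] - y[0], x[1] + p[1] - y[1])
        x = proj_halfplane((y[0] + q[0], y[1] + q[1]))
        q = (y[0] + q[0] - x[0], y[1] + q[1] - x[1])
    return x

x = dykstra((0.0, 2.0))  # closest feasible point is (1/2, sqrt(3)/2)
```

Projecting the exterior point \((0, 2)\) onto the intersection yields \((1/2, \sqrt{3}/2)\), the corner where the disk boundary meets the half-plane boundary.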
16. Convex Minimization Algorithms
Abstract
This chapter delves into three advanced algorithms for convex minimization. The projected gradient algorithm is useful in minimizing a strictly convex quadratic over a closed convex set. Although the algorithm extends to more general convex functions, the best theoretical results are available in this limited setting. We rely on the MM principle to motivate and extend the algorithm. The connections to Dykstra’s algorithm and the contraction mapping principle add to the charm of the subject. On the minus side of the ledger, the projected gradient method can be very slow to converge. This defect is partially offset by ease of coding in many problems.
Kenneth Lange
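A minimal sketch of the projected gradient algorithm (illustrative, not the book's code): minimizing the strictly convex quadratic \(\|\boldsymbol{x} - \boldsymbol{c}\|^{2}\) over the closed unit ball by alternating a gradient step with projection back onto the ball.

```python
import math

def proj_ball(v):
    # Euclidean projection onto the closed unit ball.
    n = math.hypot(v[0], v[1])
    return v if n <= 1.0 else (v[0] / n, v[1] / n)

def projected_gradient(c, steps=100, t=0.25):
    # Minimize f(x) = ||x - c||^2 over the unit ball: take a gradient step
    # with fixed step size t, then project back onto the feasible set.
    x = (0.0, 0.0)
    for _ in range(steps):
        g = (2.0 * (x[0] - c[0]), 2.0 * (x[1] - c[1]))
        x = proj_ball((x[0] - t * g[0], x[1] - t * g[1]))
    return x

x = projected_gradient((3.0, 4.0))  # converges to (0.6, 0.8)
```

For \(\boldsymbol{c} = (3, 4)\) the answer is the radial projection \((0.6, 0.8)\); the composition of gradient step and projection is a contraction here, in line with the contraction mapping connection the chapter mentions.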
17. The Calculus of Variations
Abstract
The calculus of variations deals with infinite-dimensional optimization problems. Seventeenth-century mathematicians and physicists such as Newton, Galileo, Huygens, John and James Bernoulli, L’Hôpital, and Leibniz posed and solved many variational problems. In the eighteenth century Euler made more definitive strides that were clarified and extended by Lagrange.
Kenneth Lange
Backmatter
Metadata
Title
Optimization
Written by
Kenneth Lange
Copyright year
2013
Publisher
Springer New York
Electronic ISBN
978-1-4614-5838-8
Print ISBN
978-1-4614-5837-1
DOI
https://doi.org/10.1007/978-1-4614-5838-8
