
2006 | Book

Automatic Differentiation: Applications, Theory, and Implementations

Edited by: Martin Bücker, George Corliss, Uwe Naumann, Paul Hovland, Boyana Norris

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computational Science and Engineering


Table of Contents

Frontmatter
Perspectives on Automatic Differentiation: Past, Present, and Future?
Abstract
Automatic (or algorithmic) differentiation (AD) is discussed from the standpoint of transformation of algorithms for evaluation of functions into algorithms for evaluation of their derivatives. Such finite numerical algorithms are commonly formulated as computer programs or subroutines, hence the use of the term “automatic.” Transformations to evaluate derivatives are thus based on the well-known formulas for derivatives of arithmetic operations and various differentiable intrinsic functions which constitute the basic steps of the algorithm. The chain rule of elementary calculus then guarantees the validity of the process. The chain rule can be applied in various ways to obtain what are called the “forward” and “reverse” modes of automatic differentiation. These modes are described in the context of the early stages of the development of AD, and a brief comparison is given. Following this brief survey, a view of present tasks and future prospects focuses on the need for further education, communication of results, and expansion of areas of application of AD. In addition, some final remarks are made concerning extension of the method of algorithm transformation to problems other than derivative evaluation.
Louis B. Rall
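The forward mode described above can be illustrated with dual numbers, where each intermediate value carries its derivative and the chain rule is applied operation by operation. This is a generic sketch of the technique, not code from the chapter:

```python
# Minimal forward-mode AD via dual numbers: a value and its derivative
# travel together, and each elementary operation applies the chain rule.
import math

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule for an intrinsic: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# d/dx [x * sin(x)] at x = 2 is sin(2) + 2*cos(2)
x = Dual(2.0, 1.0)  # seed dx/dx = 1
y = x * sin(x)
```

Seeding `dot = 1.0` on the input makes `y.dot` the exact derivative, with no truncation error from finite differences.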
Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities
Abstract
Backwards calculation of derivatives – sometimes called the reverse mode, the full adjoint method, or backpropagation – has been developed and applied in many fields. This paper reviews several strands of history, advanced capabilities and types of application – particularly those which are crucial to the development of brain-like capabilities in intelligent control and artificial intelligence.
Paul J. Werbos
Solutions of ODEs with Removable Singularities
Abstract
We discuss explicit ODEs of the form ẋ = R(t, x), where R is a polynomial or rational function, and the solution x(t) has a removable singularity. We are particularly interested in functions built from elementary functions, such as x(t) = t/sin t. We also consider implicit ODEs of the forms P(t, x, ẋ) = 0 and P(t, x, ẋ, ẍ) = 0.
Harley Flanders
Automatic Propagation of Uncertainties
Abstract
Motivated by problems in metrology, we consider a numerical evaluation program y = f(x) as a model for a measurement process. We use a probability density function to represent the uncertainties in the inputs x and examine some of the consequences of using Automatic Differentiation to propagate these uncertainties to the outputs y. We show how to use a combination of Taylor series propagation and interval partitioning to obtain coverage (confidence) intervals and ellipsoids based on unbiased estimators for means and covariances of the outputs, even where f is sharply non-linear, and even when the level of probability required makes the use of Monte Carlo techniques computationally problematic.
Bruce Christianson, Maurice Cox
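A first-order (linearized) Taylor propagation is the simplest instance of the machinery this chapter generalizes: the output spread is approximated by the input spread scaled by the derivative. The function and numbers below are hypothetical illustrations, not data from the chapter:

```python
def propagate_first_order(f, dfdx, x0, sigma_x):
    """Linearized uncertainty propagation for scalar y = f(x):
    sigma_y ≈ |f'(x0)| * sigma_x (valid when f is near-linear over
    the input spread; the chapter's higher-order Taylor machinery
    handles the sharply non-linear case)."""
    return f(x0), abs(dfdx(x0)) * sigma_x

# Example: y = x**2 at x0 = 3 with sigma_x = 0.1
# f'(3) = 6, so sigma_y ≈ 0.6
y0, sigma_y = propagate_first_order(lambda x: x * x, lambda x: 2 * x, 3.0, 0.1)
```

In an AD setting the `dfdx` argument would be supplied automatically (e.g. by a forward-mode pass) rather than written by hand.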
High-Order Representation of Poincaré Maps
Abstract
A method to obtain polynomial approximations of Poincaré maps directly from a polynomial approximation of the flow or dynamical system for certain types of flows and Poincaré sections is presented. Examples for the method and its computational implementation are given.
Johannes Grote, Martin Berz, Kyoko Makino
Computation of Matrix Permanent with Automatic Differentiation
Abstract
With a well-known formulation of the matrix permanent by a multivariate polynomial, algorithms for the computation of the matrix permanent are considered in terms of automatic differentiation, where a succinct program with a C++ template for the higher order derivatives is described. A special set of commutative quadratic nilpotent elements is introduced, and it is shown that the permanent can be computed efficiently as a variation of implementation of higher order automatic differentiation. Given several ways for transforming the multivariate polynomial into univariate polynomials, six algorithms that compute the value of the permanent are described with their computational complexities. One of the complexities is O(n·2^n), the same as that of Ryser’s well-known algorithm.
Koichi Kubota
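For reference, the classical Ryser inclusion–exclusion formula that sets the O(n·2^n) benchmark mentioned above can be sketched directly; this is the standard algorithm, not the AD-based variant developed in the chapter:

```python
# Ryser's formula: perm(A) = (-1)^n * sum over nonempty column subsets S
# of (-1)^{|S|} * prod_i sum_{j in S} a[i][j].  Cost: O(n * 2^n) products.
from itertools import combinations

def permanent_ryser(a):
    n = len(a)
    total = 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            prod = 1
            for row in a:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** k * prod
    return (-1) ** n * total

# perm([[1,2],[3,4]]) = 1*4 + 2*3 = 10
p = permanent_ryser([[1, 2], [3, 4]])
```

Unlike the determinant, the permanent has no sign alternation over permutations, which is what makes its exact computation exponentially hard in general.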
Computing Sparse Jacobian Matrices Optimally
Abstract
A restoration procedure based on a priori knowledge of sparsity patterns of the compressed Jacobian matrix rows is proposed. We show that if the rows of the compressed Jacobian matrix contain certain sparsity patterns the unknown entries can essentially be restored with cost at most proportional to substitution while the number of matrix-vector products to be calculated still remains optimal. We also show that the conditioning of the reduced linear system can be improved by employing a combination of direct and indirect methods of computation. Numerical test results are presented to demonstrate the effectiveness of our proposal.
Shahadat Hossain, Trond Steihaug
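The compression idea underlying the chapter is to group structurally orthogonal columns (columns whose nonzeros share no row) so that one derivative evaluation serves a whole group. The greedy grouping below sketches only this coloring-style step, on a made-up tridiagonal pattern; it is not the chapter's restoration procedure:

```python
def group_columns(pattern):
    """Greedily group structurally orthogonal Jacobian columns.
    pattern[j] is the set of rows where column j has a nonzero.
    Columns in one group can be probed with a single matrix-vector
    product, since their nonzeros never overlap in any row."""
    groups = []  # list of (column set, union of covered rows)
    for col, rows in enumerate(pattern):
        for g_cols, g_rows in groups:
            if not (rows & g_rows):     # no shared row: safe to merge
                g_cols.add(col)
                g_rows |= rows
                break
        else:
            groups.append(({col}, set(rows)))
    return [sorted(c) for c, _ in groups]

# Hypothetical 4x4 tridiagonal Jacobian: column j is nonzero in
# rows {j-1, j, j+1}; columns 0 and 3 do not overlap.
pattern = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]
grouping = group_columns(pattern)  # 3 groups instead of 4 columns
```

With this grouping, 3 matrix-vector products suffice where a dense approach would need 4; for a tridiagonal matrix of any size, 3 groups always suffice.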
Application of AD-based Quasi-Newton Methods to Stiff ODEs
Abstract
Systems of stiff ordinary differential equations (ODEs) can be integrated properly only by implicit methods. For that purpose, one usually has to solve a system of nonlinear equations at each time step. This system of equations may be solved by variants of Newton’s method. The main computing effort lies in forming and factoring the Jacobian or a suitable approximation to it. We examine a new approach of constructing an appropriate quasi-Newton approximation for solving stiff ODEs. The method makes explicit use of tangent and adjoint information that can be obtained using the forward and the reverse modes of algorithmic differentiation (AD). We elaborate the conditions for invariance with respect to linear transformations of the state space and thus similarity transformations of the Jacobian. We present one new updating variant that yields such an invariant method. Numerical results for Runge-Kutta methods and linear multi-step methods are discussed.
Sebastian Schlenkrich, Andrea Walther, Andreas Griewank
Reduction of Storage Requirement by Checkpointing for Time-Dependent Optimal Control Problems in ODEs
Abstract
We consider a time-dependent optimal control problem, where the state evolution is described by an ODE. There is a variety of methods for the treatment of such problems. We prefer to view them as boundary value problems and apply to them the Riccati approach for non-linear BVPs with separated boundary conditions.
Julia Sternberg, Andreas Griewank
Improving the Performance of the Vertex Elimination Algorithm for Derivative Calculation
Abstract
In previous work [TOMS, 2004, 30(3), 266–299], we used Markowitz-like heuristics to find elimination sequences that minimise the number of floating-point operations (flops) for vertex elimination Jacobian code. We also used the depth-first traversal algorithm to reorder the statements of the Jacobian code with the aim of reducing the number of memory accesses. In this work, we study the effects of reducing flops or memory accesses within the vertex elimination algorithm for Jacobian calculation. On RISC processors, we observed that for data residing in registers, the number of flops gives a good estimate of the execution time, while for out-of-register data, the execution time is dominated by the time for memory access operations. We also present a statement reordering scheme based on a greedy list scheduling algorithm using ranking functions. This statement reordering enables us to trade off exploiting the instruction-level parallelism of such processors against reducing memory accesses.
M. Tadjouddine, F. Bodman, J. D. Pryce, S. A. Forth
Flattening Basic Blocks
Abstract
Preaccumulation of Jacobians of low-level code sections is beneficial in certain automatic differentiation application scenarios. Cross-country vertex, edge, or face elimination can produce highly efficient Jacobian preaccumulation code but requires the code section’s computational graph to be known. Such graphs can easily be derived for straight-line codes with scalar variables. Practical codes in languages such as Fortran, C, and C++ with array indexing, pointers, and references introduce ambiguities. Constructing unambiguous computational graphs is necessary to realize the full potential of cross-country elimination. We give a practical solution and investigate theoretical options for graph construction.
Jean Utke
The Adjoint Data-Flow Analyses: Formalization, Properties, and Applications
Abstract
Automatic Differentiation (AD) is a program transformation that yields derivatives. Building efficient derivative programs requires complex and specific static analysis algorithms to reduce run time and memory usage. Focusing on the reverse mode of AD, which computes adjoint programs, we specify jointly the central static analyses that are required to generate an efficient adjoint code. We use a set-based formalization from classical data-flow analysis to specify Adjoint Liveness, Adjoint Write, and To Be Recorded analyses, and their mutual influences, taking into account the specific structure of adjoint programs. We give illustrations on examples taken from real numerical programs, differentiated with our AD tool tapenade, which implements these analyses.
Laurent Hascoët, Mauricio Araya-Polo
Semiautomatic Differentiation for Efficient Gradient Computations
Abstract
Many large-scale computations involve a mesh and first (or sometimes higher) partial derivatives of functions of mesh elements. In principle, automatic differentiation (AD) can provide the requisite partials more efficiently and accurately than conventional finite-difference approximations. AD requires source-code modifications, which may be little more than changes to declarations. Such simple changes can easily give improved results, e.g., when Jacobian-vector products are used iteratively to solve nonlinear equations. When gradients are required (say, for optimization) and the problem involves many variables, “backward AD” in theory is very efficient, but when carried out automatically and straightforwardly, may use a prohibitive amount of memory. In this case, applying AD separately to each element function and manually assembling the gradient pieces — semiautomatic differentiation — can deliver gradients efficiently and accurately. This paper concerns on-going work; it compares several implementations of backward AD, describes a simple operator-overloading implementation specialized for gradient computations, and compares the implementations on some mesh-optimization examples. Ideas from the specialized implementation could be used in fully general source-to-source translators for C and C++.
David M. Gay
Computing Adjoints with the NAGWare Fortran 95 Compiler
Abstract
We present a new experimental version of the differentiation-enabled NAGWare Fortran 95 compiler (from now on referred to as “the AD compiler”) that provides support for the computation of adjoints in the reverse mode of automatic differentiation (AD) [42, 136, 227]. Our implementation uses split program reversal [225, Chapter 10] in conjunction with a stack of gradients of all assignments executed inside the active section. Two papers describe the modifications of the compiler infrastructure that were required to provide forward-mode AD capabilities [126, 401]. The reverse mode presented in this paper makes extensive use of these features.
Uwe Naumann, Jan Riehme
Extension of TAPENADE toward Fortran 95
Abstract
We present extensions to the automatic differentiation tool tapenade to increase coverage of the Fortran 95 language. We show how the existing architecture of the tool, with a language independent kernel and separate front-ends and back-ends, made it easier to deal with new syntactic forms and new control structures. However, several new features of Fortran 95 required us to make important choices and improvements in tapenade. We present these features, sorted into four categories: about the top-level structure of nested modules, subprograms, and interfaces; about structured data types; about overloading capabilities; and about array features. For each category, we discuss the choices made, and we illustrate their impact on small Fortran 95 examples. Dealing with pointers and dynamic memory allocation is delayed until extension to C begins. We consider this extension to Fortran 95 as a first step towards object-oriented languages.
Valérie Pascual, Laurent Hascoët
A Macro Language for Derivative Definition in ADiMat
Abstract
Any automatic differentiation tool for MATLAB needs to cope with the large number of functions provided by the toolboxes. For many of these functions, derivatives have to be defined. A powerful macro language for the derivative definition, embedded in the source transformation tool ADiMat, is introduced. The macro language consists of a part where the signature of a function is matched and another part specifying the derivative of that function. Several examples illustrate the expressiveness and use of the macro language. A subset of the macro language is available to the user of ADiMat to improve the performance of the generated derivative code by exploiting high-level structure.
Christian H. Bischof, H. Martin Bücker, Andre Vehreschild
Transforming Equation-Based Models in Process Engineering
Abstract
For the solution of realistic dynamic optimization problems, the computation of derivative information is typically among the crucial ingredients in terms of both numerical accuracy and execution time. This work aims to incorporate automatic differentiation into the DyOS framework for dynamic optimization. In this framework, the optimization algorithms and the mathematical models of the process systems under consideration are implemented in separate modules. In real-life settings, a process system is formed by integrating different submodels which are possibly formulated by means of different equation-oriented modeling languages such as gPROMS or Modelica. DyOS is currently redesigned to be capable of handling such component-based models by relying on a common intermediate format called CapeML, which defines a layer of abstraction so that various models can be expressed in a manner independent from a specific modeling language. Hence, CapeML is the adequate format to which automatic differentiation is applied in this dynamic optimization framework. A novel system called ADiCape is proposed, implementing the forward mode for models written in the XML-based language CapeML. This AD transformation is expressed in the form of an XSLT stylesheet.
Christian H. Bischof, H. Martin Bücker, Wolfgang Marquardt, Monika Petera, Jutta Wyes
Simulation and Optimization of the Tevatron Accelerator
Abstract
The Tevatron accelerator, currently the particle accelerator with the highest energy in the world, consists of a ring with a circumference of four miles in which protons are brought into collision with antiprotons at speeds very close to the speed of light. The accelerator currently under development at Fermilab represents a significant upgrade, but experienced significant limitations during initial operation. The correction of some of the problems that appeared, using techniques of automatic differentiation, is described. The skew quadrupole correction problems are addressed in more detail, and different schemes of correction are proposed.
Pavel Snopok, Carol Johnstone, Martin Berz
Periodic Orbits of Hybrid Systems and Parameter Estimation via AD
Abstract
Periodic processes are ubiquitous in biological systems, yet modeling these processes with high fidelity as periodic orbits of dynamical systems is challenging. Moreover, mathematical models of biological processes frequently contain many poorly-known parameters. This paper describes techniques for computing periodic orbits in systems of hybrid differential-algebraic equations and parameter estimation methods for fitting these orbits to data. These techniques make extensive use of automatic differentiation to evaluate derivatives accurately and efficiently for time integration, parameter sensitivities, root finding and optimization. The resulting algorithms allow periodic orbits to be computed to high accuracy using coarse discretizations. Derivative computations are carried out using a new automatic differentiation package called ADMC++ that provides derivatives and Taylor series coefficients of matrix-valued functions written in the MATLAB programming language. The algorithms are applied to a periodic orbit problem in rigid-body dynamics and a parameter estimation problem in neural oscillations.
Eric Phipps, Richard Casey, John Guckenheimer
Implementation of Automatic Differentiation Tools for Multicriteria IMRT Optimization
Abstract
Automatic differentiation tools (ADOL-C) have been implemented for large-scale NLP optimization problems encountered in an advanced radiotherapy technique called Intensity Modulated Radiation Therapy (IMRT). Since IMRT treatments involve many tissue structures and their associated clinical objectives, the corresponding optimization problems are typically multi-objective. In this study, they are solved by a multi-criteria approach called Lexicographic Ordering. This approach allows clinical objectives to be categorized into several priorities or levels, and optimization is performed sequentially in order of priority while keeping the previously optimized results constrained. As a result, the feasible solution region is gradually reduced as the method progresses. For each level of optimization, the objective function and constraints are constructed interactively by a treatment planner and the corresponding Jacobian is provided by AD tools at a machine-precision level. Results indicate that a high degree of accuracy for the Jacobian is essential to produce both feasible and optimal results for clinical IMRT optimization problems.
Kyung-Wook Jee, Daniel L. McShan, Benedick A. Fraass
Application of Targeted Automatic Differentiation to Large-Scale Dynamic Optimization
Abstract
A targeted AD approach is presented to calculate directional second order derivatives of ODE/DAE embedded functionals accurately and efficiently. This advance enables us to tackle the solution of large-scale dynamic optimization problems using a truncated-Newton method where the Newton equation is solved approximately to update the direction for the next optimization step. The proposed directional second order adjoint method (dSOA) provides accurate Hessian-vector products for this algorithm. The implementation of the “dSOA powered” truncated-Newton method for the solution of large-scale dynamic optimization problems is showcased with an example.
Derya B. Özyurt, Paul I. Barton
Automatic Differentiation: A Tool for Variational Data Assimilation and Adjoint Sensitivity Analysis for Flood Modeling
Abstract
This paper illustrates the potential of automatic differentiation (AD) for very challenging problems related to the modeling of complex environmental systems prone to floods. Numerical models are driven by inputs (initial conditions, boundary conditions and parameters) which cannot be directly inferred from measurements. For that reason, robust and efficient methods are required to assess the effects of input variations on computed results and estimate the key inputs to fit available observations. We thus consider variational data assimilation to solve the parameter estimation problem for a river hydraulics model, and adjoint sensitivity analysis for a rainfall-runoff model, two essential components involved in the generation and propagation of floods. Both applications require the computation of the gradient of a functional, which can be simply derived from the solution of an adjoint model. The adjoint method, which was successfully applied in meteorology and oceanography, is described from its mathematical formulation to its practical implementation using the automatic differentiation tool Tapenade.
W. Castaings, D. Dartus, M. Honnorat, F.-X. Le Dimet, Y. Loukili, J. Monnier
Development of an Adjoint for a Complex Atmospheric Model, the ARPS, using TAF
Abstract
Large-scale scientific computer models, such as operational weather predictions models, pose challenges for the applicability of AD tools. We report the application of TAF to the development of the adjoint model and tangent linear model of a complex atmospheric model, ARPS. Strategies to overcome the problems encountered during the development process are discussed. A rigorous verification procedure of the adjoint model is presented. Simple experiments are carried out for sensitivity study, and the results confirm the correctness of the generated models.
Ying Xiao, Ming Xue, William Martin, Jidong Gao
Tangent Linear and Adjoint Versions of NASA/GMAO’s Fortran 90 Global Weather Forecast Model
Abstract
The NASA finite-volume General Circulation Model (fvGCM) is a three-dimensional Navier-Stokes solver being used for quasi-operational weather forecasting at NASA/GMAO. We use the automatic differentiation tool TAF to generate efficient tangent linear and adjoint versions from the Fortran 90 source code of fvGCM’s dynamical core. fvGCM’s parallelisation capabilities based on OpenMP and MPI have been transferred to the tangent linear and adjoint codes. For OpenMP, TAF automatically inserts corresponding OpenMP directives in the derivative code. For MPI, TAF generates interfaces to hand-written tangent linear and adjoint wrapper routines. TAF also generates a scheme that allows the tangent linear and adjoint models to linearise around an external trajectory of the model state. The generation procedure is set up in an automated way, allowing quick updates of the derivative codes after modifications of fvGCM.
Ralf Giering, Thomas Kaminski, Ricardo Todling, Ronald Errico, Ronald Gelaro, Nathan Winslow
Efficient Sensitivities for the Spin-Up Phase
Abstract
In geosciences, it is common to spin up models by integrating with annually repeated boundary conditions. AD-generated code for evaluating sensitivities of the final cyclo-stationary state with respect to model parameters or boundary conditions usually includes a similar iteration for the derivative statements, possibly with a reduced number of iterations. We evaluate an alternative strategy that first carries out the spin-up, then evaluates the full Jacobian for the final iteration and then applies the implicit function theorem to solve for the sensitivities of the cyclo-stationary state. We demonstrate the benefit of the strategy for the spin-up of a simple box-model of the atmospheric transport. We derive a heuristic inequality for this benefit, which increases with the number of iterations and decreases with the size of the state space.
Thomas Kaminski, Ralf Giering, Michael Voßbeck
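The implicit-function-theorem strategy can be illustrated on a scalar toy fixed point x = g(x, p): spin up to the stationary state, then solve (1 − ∂g/∂x) dx*/dp = ∂g/∂p once, instead of differentiating through every iteration. The model g below is a hypothetical linear "one-year" step, not the chapter's box model:

```python
def spin_up(g, p, x0=0.0, iters=200):
    """Iterate the model step to (near) its cyclo-stationary state."""
    x = x0
    for _ in range(iters):
        x = g(x, p)
    return x

# Hypothetical linear model step: one annual cycle of a box model.
def g(x, p):
    return 0.5 * x + p

p = 3.0
x_star = spin_up(g, p)  # converges to x* = 2p = 6

# Implicit function theorem at the fixed point x* = g(x*, p):
#   dx*/dp = g_p / (1 - g_x)
# one linear solve replaces differentiating through all iterations.
g_x, g_p = 0.5, 1.0
sens = g_p / (1.0 - g_x)  # = 2.0
```

In the vector-valued case the division becomes a linear solve with (I − ∂g/∂x), which is why the chapter's benefit shrinks as the state space grows.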
Streamlined Circuit Device Model Development with fREEDA® and ADOL-C
Abstract
Time-marching simulation of electronic circuits using the U.C. Berkeley program Spice and variants has been a standard practice for electronics engineers since the mid-1970s. Unfortunately, the development cycle of Spice models may be lengthy because device model equations and their derivatives must be coded manually. Also, many files in the source tree must be modified to define a new model. fREEDA®, www.freeda.org, an object-oriented circuit simulator under development at several universities, overcomes many limitations of the conventional electronic model development paradigm. A key to this implementation is the ADOL-C package, which is used to automatically evaluate the derivatives of the device model equations. Resulting models are more compact, and the development time is shorter. The development history of selected Spice models and their fREEDA® counterparts is presented to illustrate the advantages of this approach.
Frank P. Hart, Nikhil Kriplani, Sonali R. Luniya, Carlos E. Christoffersen, Michael B. Steer
Adjoint Differentiation of a Structural Dynamics Solver
Abstract
The design of a satellite boom using passive vibration control by Keane [J. of Sound and Vibration, 1995, 185(3), 441–453] has previously been carried out using an energy function of the design geometry aimed at minimising mechanical noise and vibrations. To minimise this cost function, a Genetic Algorithm (GA) was used, enabling modification of the initial geometry for a better design. To improve efficiency, it is proposed to couple the GA with a local search method involving the gradient of the cost function. In this paper, we detail the generation of an adjoint solver by automatic differentiation via Adifor 3.0. This has resulted in a gradient code that runs in 7.4 times the time of the function evaluation. This should reduce the rather time-consuming process (over 10 CPU days by using parallel processing) of the GA optimiser for this problem.
Mohamed Tadjouddine, Shaun A. Forth, Andy J. Keane
A Bibliography of Automatic Differentiation
Abstract
This is a bibliography of scientific literature cited by all chapters in the volume, Automatic Differentiation: Applications, Theory, and Implementations, Martin Bücker, George Corliss, Paul Hovland, Uwe Naumann, and Boyana Norris (eds.), Springer, New York, 2005 [78]. The collection contains more than 570 references, mostly related to automatic differentiation.
H. Martin Bücker, George F. Corliss
Backmatter
Metadata
Title
Automatic Differentiation: Applications, Theory, and Implementations
Edited by
Martin Bücker
George Corliss
Uwe Naumann
Paul Hovland
Boyana Norris
Copyright year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-28438-3
Print ISBN
978-3-540-28403-1
DOI
https://doi.org/10.1007/3-540-28438-9