Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders

https://doi.org/10.1016/j.jcp.2019.108973

Highlights

  • Two model-reduction methods that project dynamical systems on nonlinear manifolds.

  • Analysis including conditions under which the two methods are equivalent.

  • A novel convolutional autoencoder architecture to construct the nonlinear manifold.

  • Numerical experiments demonstrating that the method outperforms linear-subspace ROMs.

Abstract

Nearly all model-reduction techniques project the governing equations onto a linear subspace of the original state space. Such subspaces are typically computed using methods such as balanced truncation, rational interpolation, the reduced-basis method, and (balanced) proper orthogonal decomposition (POD). Unfortunately, restricting the state to evolve in a linear subspace imposes a fundamental limitation on the accuracy of the resulting reduced-order model (ROM). In particular, linear-subspace ROMs can be expected to produce low-dimensional models with high accuracy only if the problem admits a rapidly decaying Kolmogorov n-width (e.g., diffusion-dominated problems). Unfortunately, many problems of interest exhibit a slowly decaying Kolmogorov n-width (e.g., advection-dominated problems). To address this, we propose a novel framework for projecting dynamical systems onto nonlinear manifolds using minimum-residual formulations at the time-continuous and time-discrete levels; the former leads to manifold Galerkin projection, while the latter leads to manifold least-squares Petrov–Galerkin (LSPG) projection. We perform analyses that provide insight into the relationship between these proposed approaches and classical linear-subspace reduced-order models; we also derive a posteriori discrete-time error bounds for the proposed approaches. In addition, we propose a computationally practical approach for computing the nonlinear manifold, which is based on convolutional autoencoders from deep learning. Finally, we demonstrate the ability of the method to significantly outperform even the optimal linear-subspace ROM on benchmark advection-dominated problems, thereby demonstrating the method's ability to overcome the intrinsic n-width limitations of linear subspaces.

Introduction

Physics-based modeling and simulation has become indispensable across many applications in engineering and science, ranging from aircraft design to monitoring national critical infrastructure. However, as simulation plays an increasingly important role in scientific discovery, decision making, and design, greater demands are being placed on model fidelity. Achieving high fidelity often necessitates fine spatiotemporal resolution in computational models of the system of interest; this can lead to very large-scale models whose simulations consume months on thousands of computing cores. This computational burden precludes the integration of such high-fidelity models in important scenarios that are real-time or many-query in nature, as these scenarios require the (parameterized) computational model to be simulated very rapidly (e.g., model predictive control) or thousands of times (e.g., uncertainty propagation).

Projection-based reduced-order models (ROMs) provide one approach for overcoming this computational burden. These techniques comprise two stages: an offline stage and an online stage. During the offline stage, these methods perform computationally expensive training tasks (e.g., simulating the high-fidelity model at several points in the parameter space) to compute a representative low-dimensional ‘trial’ subspace for the system state. Next, during the inexpensive online stage, these methods rapidly compute approximate solutions for different points in the parameter space via projection: they compute solutions in the low-dimensional trial subspace by requiring the high-fidelity-model residual to be orthogonal to a low-dimensional test subspace of the same dimension.

As suggested above, nearly all projection-based model-reduction approaches employ linear trial subspaces. This includes the reduced-basis technique [64], [69] and proper orthogonal decomposition (POD) [39], [16] for parameterized stationary problems; balanced truncation [56], rational interpolation [5], [34], and Craig–Bampton model reduction [21] for linear time invariant (LTI) systems; and Galerkin projection [39], least-squares Petrov–Galerkin projection [14], and other Petrov–Galerkin projections [80] with (balanced) POD [39], [46], [80], [68] for nonlinear dynamical systems.

The Kolmogorov n-width [63] provides one way to quantify the optimal linear trial subspace; it is defined as
$$
d_n(\mathcal{M}) := \inf_{\mathcal{S}_n} \sup_{f \in \mathcal{M}} \inf_{g \in \mathcal{S}_n} \| f - g \|,
$$
where the first infimum is taken over all n-dimensional subspaces $\mathcal{S}_n$ of the state space, and $\mathcal{M}$ denotes the manifold of solutions over all time and parameters. Assuming the dynamical system has a unique trajectory for each parameter instance, the intrinsic solution-manifold dimensionality is (at most) equal to the number of parameters plus one (for time). For problems that exhibit a rapidly decaying Kolmogorov n-width (e.g., diffusion-dominated problems), employing a linear trial subspace is theoretically justifiable [4], [7] and has enjoyed many successful demonstrations. Unfortunately, many computational problems exhibit a slowly decaying Kolmogorov n-width (e.g., advection-dominated problems). In such cases, the use of low-dimensional linear trial subspaces often produces inaccurate results; the ROM dimensionality must be increased significantly to yield acceptable accuracy [60]. Indeed, the Kolmogorov n-width with n equal to the intrinsic solution-manifold dimensionality is often quite large for such problems.

Several approaches have been pursued to address this n-width limitation of linear trial subspaces. One set of approaches transforms the trial basis to improve its approximation properties for advection-dominated problems. Such methods include separating transport dynamics via ‘freezing’ [59], applying a coordinate transformation to the trial basis [42], [58], [11], shifting the POD basis [65], transforming the physical domain of the snapshots [79], [78], constructing the trial basis on a Lagrangian formulation of the governing equations [55], and using Lax pairs of the Schrödinger operator to construct a time-evolving trial basis [29]. Other approaches employ multiple local linear subspaces instead of a single global linear trial subspace; these local subspaces can be tailored to different regions of the time domain [26], [24], physical domain [73], or state space [3], [62]; Ref. [2] employs local trial subspaces in time and parameter space and applies 1-norm minimization of the residual. Ref. [12] aims to overcome the limitations of a linear trial subspace by providing an online-adaptive h-refinement mechanism that constructs a hierarchical sequence of linear subspaces converging to the original state space. However, all of these methods attempt to construct, manually transform, or refine a linear basis to be locally accurate; they do not consider nonlinear trial manifolds of more general structure. Further, many of these approaches rely on substantial additional knowledge about the problem, such as the particular advection phenomena governing basis shifting.

This work aims to address the fundamental n-width deficiency of linear trial subspaces. However, in contrast to the above methods, we pursue an approach that is both more general (i.e., it should not be limited to piecewise linear manifolds or mode transformations) and only requires the same snapshot data as typical POD-based approaches (e.g., it should require no special knowledge about any particular advection phenomena). To accomplish this objective, we propose an approach that (1) performs optimal projection of dynamical systems onto arbitrary nonlinear trial manifolds (during the online stage), and (2) computes this nonlinear trial manifold from snapshot data alone (during the offline stage).

For the first component, we perform optimal projection onto arbitrary (continuously differentiable) nonlinear trial manifolds by applying minimum-residual formulations at the time-continuous (ODE) and time-discrete (OΔE) levels. The time-continuous formulation leads to manifold Galerkin projection, which can be interpreted as performing orthogonal projection of the velocity onto the tangent space of the trial manifold. The time-discrete formulation leads to manifold least-squares Petrov–Galerkin (LSPG) projection, which can also be straightforwardly extended to stationary (i.e., steady-state) problems. We also perform analyses that illustrate the relationship between these manifold ROMs and classical linear-subspace ROMs. Manifold Galerkin and manifold LSPG projection require the trial manifold to be characterized as a (generally nonlinear) mapping from the low-dimensional reduced state to the high-dimensional state; the mapping from the high-dimensional state to the low-dimensional state is not required.
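To make the tangent-space interpretation concrete, the following display sketches the time-continuous formulation under the setup described above: the state is approximated through a (possibly offset) decoder evaluated at the reduced state, and the reduced velocity is chosen to minimize the FOM ODE residual over the tangent space of the trial manifold. The symbols $\hat{x}$ (reduced state), $g$ (decoder), $x_{\mathrm{ref}}$ (reference state), and $J$ (decoder Jacobian) follow common conventions and may differ from the paper's exact notation:
$$
\tilde{x}(t;\mu) = x_{\mathrm{ref}}(\mu) + g\big(\hat{x}(t;\mu)\big), \qquad
\dot{\hat{x}} = \arg\min_{v \in \mathbb{R}^{p}} \big\| J(\hat{x})\, v - f(\tilde{x}, t; \mu) \big\|_{2} = J(\hat{x})^{+}\, f(\tilde{x}, t; \mu),
$$
where $J(\hat{x}) := \mathrm{d}g/\mathrm{d}\hat{x} \in \mathbb{R}^{N \times p}$ and $(\cdot)^{+}$ denotes the Moore–Penrose pseudoinverse. Manifold LSPG instead minimizes the time-discrete (OΔE) residual in the same reduced coordinates.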

The second component aims to compute a nonlinear trial manifold from snapshot data alone. Many machine-learning methods exist to perform nonlinear dimensionality reduction. However, many of these methods do not provide the required mapping from the low-dimensional embedding to the high-dimensional input; examples include Isomap [75], locally linear embedding (LLE) [67], Hessian eigenmaps [25], spectral embedding [6], and t-SNE [51]. Methods that do provide this required mapping include self-organizing maps [45], generative topographic mapping [8], kernel principal component analysis (PCA) [72], Gaussian process latent variable model [47], diffeomorphic dimensionality reduction [77], and autoencoders [37]. In principle, manifold Galerkin and manifold LSPG projection could be applied with manifolds constructed by any of the methods in the latter category. However, this study restricts focus to autoencoders—more specifically deep convolutional autoencoders—due to their expressiveness and scalability, as well as the availability of high-performance software tools for their construction.

Autoencoders (also known as auto-associators [23]) comprise a specific type of feedforward neural network that aims to learn the identity mapping: they attempt to copy the input to an accurate approximation of itself. Learning the identity mapping is not, however, a particularly useful task unless it is coupled with a dimensionality-reduction procedure comprising data compression and subsequent recovery. This is precisely what autoencoders accomplish by employing a neural-network architecture consisting of two parts: an encoder that provides a nonlinear mapping from the high-dimensional input to a low-dimensional embedding, and a decoder that provides a nonlinear mapping from the low-dimensional embedding to an approximation of the high-dimensional input. Convolutional autoencoders are a specific type of autoencoder that employ convolutional layers, which have been shown to be effective for extracting representative features in images [53]. Inspired by the analogy between images and spatially distributed dynamical-system states (e.g., when the dynamical system corresponds to the spatial discretization of a partial-differential-equation model), we propose a specific deep convolutional autoencoder architecture tailored to dynamical systems with states that are spatially distributed. Critically, training this autoencoder requires only the same snapshot data as POD; no additional problem-specific information is needed. A minimal illustrative sketch of such an architecture is shown below.
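To fix ideas, the listing below gives a minimal sketch of a one-dimensional convolutional autoencoder in the spirit described here, written with tensorflow.keras (the paper's experiments use TensorFlow). The layer counts, kernel sizes, strides, activation choice, and the dimensions N and p are illustrative placeholders rather than the architecture proposed in Section 5.2.

```python
# Minimal illustrative sketch of a 1D convolutional autoencoder; all sizes are assumptions.
from tensorflow.keras import layers, Model

N = 256   # number of spatial degrees of freedom (placeholder)
p = 10    # latent (reduced-state) dimension (placeholder)

# Encoder h: R^N -> R^p (compresses the spatially distributed state).
encoder_input = layers.Input(shape=(N, 1))
x = layers.Conv1D(8, 25, strides=2, padding="same", activation="elu")(encoder_input)
x = layers.Conv1D(16, 25, strides=4, padding="same", activation="elu")(x)
x = layers.Flatten()(x)
latent = layers.Dense(p)(x)
encoder = Model(encoder_input, latent, name="encoder")

# Decoder g: R^p -> R^N (defines the nonlinear trial manifold used by the ROM).
decoder_input = layers.Input(shape=(p,))
y = layers.Dense((N // 8) * 16, activation="elu")(decoder_input)
y = layers.Reshape((N // 8, 16))(y)
y = layers.Conv1DTranspose(8, 25, strides=4, padding="same", activation="elu")(y)
decoder_output = layers.Conv1DTranspose(1, 25, strides=2, padding="same")(y)
decoder = Model(decoder_input, decoder_output, name="decoder")

# Autoencoder: trained to reproduce its input (state snapshots), i.e., the identity mapping.
autoencoder = Model(encoder_input, decoder(encoder(encoder_input)), name="autoencoder")
autoencoder.compile(optimizer="adam", loss="mse")
```

Only the decoder enters the online ROM; the encoder is used offline during training and can be used to help satisfy the initial condition (cf. Section 5.3).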

In summary, new contributions of this work include:

  • 1. Manifold Galerkin (Section 3.2) and manifold LSPG (Section 3.3) projection techniques, which project the dynamical-system model onto arbitrary continuously differentiable manifolds. We equip these methods with
    • (a) the ability to exactly satisfy the initial condition (Remark 3.1), and
    • (b) quasi-Newton solvers (Section 3.4) to solve the system of algebraic equations arising from implicit time integration.

  • 2. Analysis (Section 4), which includes
    • (a) demonstrating that employing an affine trial manifold recovers classical linear-subspace Galerkin and LSPG projection (Proposition 4.1),
    • (b) sufficient conditions for commutativity of time discretization and manifold Galerkin projection (Theorem 4.1),
    • (c) conditions under which manifold Galerkin and manifold LSPG projection are equivalent (Theorem 4.2), and
    • (d) a posteriori discrete-time error bounds for both the manifold Galerkin and manifold LSPG projection methods (Theorem 4.3).

  • 3. A novel convolutional autoencoder architecture tailored to spatially distributed dynamical-system states (Section 5.2) with an accompanying offline training algorithm that requires only the same snapshot data as POD (Section 6).

  • 4. Numerical experiments on advection-dominated benchmark problems (Section 7). These experiments illustrate the ability of the method to outperform even the projection of the solution onto the optimal linear subspace; further, the proposed method is close to achieving the optimal performance of any nonlinear-manifold method. This demonstrates the method's ability to overcome the intrinsic n-width limitations of linear trial subspaces.

We note that the methodology is applicable to both linear and nonlinear dynamical systems.

To the best of our knowledge, Refs. [35], [43] comprise the only attempts to incorporate an autoencoder within a projection-based ROM. These methods seek solutions in the nonlinear trial manifold provided by an autoencoder; however, these methods reduce the number of equations by applying the encoder to the velocity. Unfortunately, as we discuss in Remark 3.5, this approach is kinematically inconsistent, as the velocity resides in the tangent space to the manifold, not the manifold itself. Thus, encoding the velocity can produce significant approximation errors. Instead, the proposed manifold Galerkin and manifold LSPG projection methods produce approximations that associate with minimum-residual formulations and adhere to the kinematics imposed by the trial manifold.
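A brief way to see this kinematic inconsistency, using the same illustrative notation as the earlier sketch: by the chain rule, the velocity of any trajectory constrained to the trial manifold lies in the tangent space,
$$
\frac{\mathrm{d}}{\mathrm{d}t}\, g\big(\hat{x}(t)\big) = J\big(\hat{x}(t)\big)\, \dot{\hat{x}}(t) \in \operatorname{range}\big(J(\hat{x}(t))\big),
$$
whereas the encoder is trained to map states on (or near) the manifold to reduced coordinates; applying it to the velocity $f$ therefore treats a tangent vector as if it were a state on the manifold, which need not be meaningful.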

Relatedly, Ref. [33] proposes a general framework for projection of dynamical systems onto nonlinear manifolds. However, the proposed method constructs a piecewise linear trial manifold by generating local linear subspaces and concatenating those subspaces. Then, the method projects the residual of the governing equations onto a nonlinear test manifold that is also piecewise linear; this is referred to as a ‘piecewise linear projection function’. Thus, the approach is limited to piecewise-linear manifolds, and the resulting approximation does not associate with any optimality property.

We also note that autoencoders have been applied to various non-intrusive model-reduction methods that are purely data driven in nature and are not based on a projection process. Examples include Ref. [54], which constructs a nonlinear model of wall turbulence using an autoencoder; Ref. [31], which applies an autoencoder to compress the state, followed by a recurrent neural network (RNN) [70] to learn the dynamics; Refs. [74], [61], [50], [57], which apply autoencoders to learn approximate invariant subspaces of the Koopman operator; and Ref. [18], which applies hierarchical dimensionality reduction comprising autoencoders and PCA followed by dynamics learning to recover missing CFD data.

The paper is organized as follows. Section 2 describes the full-order model, which corresponds to a parameterized system of (linear or nonlinear) ordinary differential equations. Section 3 describes model reduction on nonlinear manifolds, including the mathematical characterization of the nonlinear trial manifold (Section 3.1), manifold Galerkin projection (Section 3.2), manifold LSPG projection (Section 3.3), and associated quasi-Newton methods to solve the system of algebraic equations arising at each time instance in the case of implicit time integration (Section 3.4). Section 4 provides the aforementioned analysis results. Section 5 describes a practical approach for constructing the nonlinear trial manifold using deep convolutional autoencoders, including a brief description of autoencoders (Section 5.1), the proposed autoencoder architecture applicable to spatially distributed states (Section 5.2), and the way in which the proposed autoencoder can be used to satisfy the initial condition (Section 5.3). When the manifold Galerkin and manifold LSPG ROMs employ this choice of decoder, we refer to them as Deep Galerkin and Deep LSPG ROMs, respectively. Section 6 describes offline training, which entails snapshot-based data collection (Section 6.1), data standardization (Section 6.2), and autoencoder training (Section 6.3); Algorithm 1 summarizes the offline training stage. Section 7 assesses the performance of the proposed Deep Galerkin and Deep LSPG ROMs compared to (linear-subspace) POD–Galerkin and POD–LSPG ROMs on two advection-dominated benchmark problems. Finally, Section 8 concludes the paper.

Section snippets

Full-order model

This work considers the full-order model (FOM) to correspond to a dynamical system expressed as a parameterized system of ordinary differential equations (ODEs)
$$
\dot{x} = f(x, t; \mu), \qquad x(0; \mu) = x_0(\mu), \tag{2.1}
$$
where $t \in [0, T]$ denotes time with final time $T \in \mathbb{R}_{+}$, and $x : [0, T] \times \mathcal{D} \to \mathbb{R}^{N}$ denotes the time-dependent, parameterized state implicitly defined as the solution to problem (2.1) given the parameters $\mu \in \mathcal{D}$. Here, $\mathcal{D} \subseteq \mathbb{R}^{n_{\mu}}$ denotes the parameter space, $x_0 : \mathcal{D} \to \mathbb{R}^{N}$ denotes the parameterized initial condition, and $f : \mathbb{R}^{N} \times [0, T] \times \mathcal{D} \to \mathbb{R}^{N}$ denotes
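As a concrete (though purely illustrative) example of how such a FOM might be advanced in time, the sketch below applies backward Euler, whose algebraic residual at each time step is the kind of OΔE residual referenced by the LSPG formulation; the helper names simulate_fom and step_backward_euler, the time-step size, and the use of scipy.optimize.fsolve are assumptions, not the paper's implementation.

```python
# Illustrative backward-Euler time marching of the FOM x' = f(x, t; mu); all helpers assumed.
import numpy as np
from scipy.optimize import fsolve

def step_backward_euler(f, x_prev, t_next, dt, mu):
    """Solve the O(Delta)E residual r(x) = x - x_prev - dt * f(x, t_next; mu) = 0."""
    residual = lambda x: x - x_prev - dt * f(x, t_next, mu)
    return fsolve(residual, x_prev)

def simulate_fom(f, x0_fn, mu, T=1.0, dt=1.0e-2):
    """March the FOM from the parameterized initial condition x0(mu); return a snapshot matrix."""
    x = x0_fn(mu)
    snapshots = [x]
    n_steps = int(round(T / dt))
    for k in range(1, n_steps + 1):
        x = step_backward_euler(f, x, k * dt, dt, mu)
        snapshots.append(x)
    return np.column_stack(snapshots)  # columns are state snapshots, reusable for training
```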

Model reduction on nonlinear manifolds

This section proposes two classes of residual-minimizing ROMs on nonlinear manifolds. The first minimizes the (time-continuous) FOM ODE residual and is analogous to classical Galerkin projection, while the second minimizes the (time-discrete) FOM OΔE residual and is analogous to least-squares Petrov–Galerkin (LSPG) projection [14], [17], [13]. Section 3.1 introduces the notion of a nonlinear trial manifold, Section 3.2 describes the manifold Galerkin ROM resulting from time-continuous residual

Analysis

We now perform analysis of the proposed manifold Galerkin and manifold LSPG projection methods. For notational simplicity, this section omits the dependence of operators on parameters μ.

Proposition 4.1 Affine trial manifold recovers classical Galerkin and LSPG

If the trial manifold is affine, then manifold LSPG projection is equivalent to classical linear-subspace LSPG projection. If additionally the decoder mapping associates with an orthogonal matrix, then manifold Galerkin projection is equivalent to classical linear-subspace Galerkin projection.

Proof

If the trial

Nonlinear trial manifold based on deep convolutional autoencoders

This section describes the approach we propose for constructing the decoder $g : \mathbb{R}^{p} \to \mathbb{R}^{N}$ that defines the nonlinear trial manifold. As described in the introduction, any nonlinear-manifold learning method equipped with a continuously differentiable mapping from the generalized coordinates (i.e., the latent state) to an approximation of the state (i.e., the data) is compatible with the manifold Galerkin and manifold LSPG methods proposed in Section 3. Here, we pursue deep convolutional autoencoders

Offline training

This section describes the offline training process used to train the deep convolutional autoencoder proposed in Section 5.2. The approach employs precisely the same snapshot data used by POD. Section 6.1 describes the (snapshot-based) data collection procedure, which is identical to that employed by POD. Section 6.2 describes how the data are scaled to improve numerical stability of autoencoder training. Section 6.3 summarizes the gradient-based-optimization approach employed for training the
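For orientation, a hedged sketch of this offline pipeline is given below; collect_snapshots, standardize, the [0, 1] scaling, and the training hyperparameters are placeholder choices, and autoencoder refers to the illustrative model sketched earlier rather than the architecture of Section 5.2.

```python
# Illustrative offline-training pipeline: snapshot collection, scaling, autoencoder training.
import numpy as np

def collect_snapshots(training_params, run_fom):
    """Stack FOM state snapshots over all training parameter instances (cf. Section 6.1)."""
    return np.hstack([run_fom(mu) for mu in training_params])  # shape (N, n_snapshots)

def standardize(X):
    """Scale each state component to [0, 1]; one common normalization (cf. Section 6.2)."""
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / scale, (lo, scale)

# Gradient-based training on the scaled snapshots (cf. Section 6.3); `autoencoder` and
# `run_fom` are assumed to be defined as in the earlier sketches.
# X, _ = standardize(collect_snapshots(training_params, run_fom))
# autoencoder.fit(X.T[..., None], X.T[..., None], epochs=500, batch_size=20, shuffle=True)
```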

Numerical experiments

This section assesses the performance of the proposed Deep Galerkin and Deep LSPG ROMs, which employ nonlinear trial manifolds, compared to POD–Galerkin and POD–LSPG ROMs, which employ affine trial subspaces. We consider two advection-dominated benchmark problems: 1D Burgers' equation and a chemically reacting flow. We employ the numerical PDE tools and ROM functionality provided by pyMORTestbed [81], and we construct the autoencoder using TensorFlow [1].

For both benchmark problems, the Deep

Conclusion

This work has proposed novel manifold Galerkin and manifold LSPG projection techniques, which project dynamical-system models onto arbitrary continuously-differentiable nonlinear manifolds. We demonstrated how these methods can exactly satisfy the initial condition, and provided quasi-Newton solvers for implicit time integration.

We performed analyses that demonstrated that employing an affine trial manifold recovers classical linear-subspace Galerkin and LSPG projection. We also derived

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors gratefully acknowledge Matthew Zahr for graciously providing the pyMORTestbed code that was modified to obtain the numerical results, as well as Jeremy Morton for useful discussions on the application of convolutional neural networks to simulation data. The authors also thank Patrick Blonigan and Eric Parish for providing useful feedback. This work was sponsored by Sandia's Advanced Simulation and Computing (ASC) Verification and Validation (V&V) Project/Task #103723/05.30.02. This

References (81)

  • R. Abgrall et al., Model reduction using L1-norm minimization as an application to nonlinear hyperbolic problems, Int. J. Numer. Methods Fluids (2018)
  • D. Amsallem et al., Nonlinear model order reduction based on local reduced-order bases, Int. J. Numer. Methods Eng. (2012)
  • M. Bachmayr et al., Kolmogorov widths and low-rank approximations of parametric elliptic PDEs, Math. Comput. (2017)
  • U. Baur et al., Interpolatory projection methods for parameterized model reduction, SIAM J. Sci. Comput. (2011)
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)
  • P. Binev et al., Convergence rates for greedy algorithms in reduced basis methods, SIAM J. Math. Anal. (2011)
  • C.M. Bishop et al., GTM: a principled alternative to the self-organizing map
  • L. Bottou et al., Optimization methods for large-scale machine learning, SIAM Rev. (2018)
  • M. Buffoni et al., Projection-based model reduction for reacting flows
  • N. Cagniart et al., Model order reduction for problems with large convection effects
  • K. Carlberg, Adaptive h-refinement for reduced-order models, Int. J. Numer. Methods Eng. (2015)
  • K. Carlberg et al., Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations, Int. J. Numer. Methods Eng. (2011)
  • K. Carlberg et al., A low-cost, goal-oriented ‘compact proper orthogonal decomposition’ basis for model reduction of static systems, Int. J. Numer. Methods Eng. (2011)
  • K. Carlberg et al., Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning
  • D. Clevert et al., Fast and accurate deep network learning by exponential linear units (ELUs)
  • R. Craig et al., Coupling of substructures for dynamic analyses, AIAA J. (1968)
  • D. DeMers et al., Non-linear dimensionality reduction
  • M. Dihlmann et al., Model reduction of parametrized evolution problems using the reduced basis method with adaptive time-partitioning
  • D.L. Donoho et al., Hessian eigenmaps: locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci. (2003)
  • M. Drohmann et al., Adaptive reduced basis methods for nonlinear convection–diffusion equations
  • J. Duchi et al., Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res. (2011)
  • V. Dumoulin et al., A guide to convolution arithmetic for deep learning
  • X. Glorot et al., Understanding the difficulty of training deep feedforward neural networks
  • F.J. Gonzalez et al., Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems
  • I. Goodfellow et al., Deep Learning, vol. 1 (2016)
  • C. Gu, Model Order Reduction of Nonlinear Dynamical Systems (2011)
  • S. Gugercin et al., H2 model reduction for large-scale linear dynamical systems, SIAM J. Matrix Anal. Appl. (2008)
  • D. Hartman et al., A deep learning framework for model reduction of dynamical systems
  • K. He et al., Delving deep into rectifiers: surpassing human-level performance on imagenet classification
  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)