Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders
Introduction
Physics-based modeling and simulation has become indispensable across many applications in engineering and science, ranging from aircraft design to monitoring national critical infrastructure. However, as simulation plays an increasingly important role in scientific discovery, decision making, and design, greater demands are being placed on model fidelity. Achieving high fidelity often necessitates fine spatiotemporal resolution in computational models of the system of interest; this can lead to very large-scale models whose simulations consume months on thousands of computing cores. This computational burden precludes the integration of such high-fidelity models in important scenarios that are real-time or many-query in nature, as these scenarios require the (parameterized) computational model to be simulated very rapidly (e.g., model predictive control) or thousands of times (e.g., uncertainty propagation).
Projection-based reduced-order models (ROMs) provide one approach for overcoming this computational burden. These techniques comprise two stages: an offline stage and an online stage. During the offline stage, these methods perform computationally expensive training tasks (e.g., simulating the high-fidelity model at several points in the parameter space) to compute a representative low-dimensional ‘trial’ subspace for the system state. Next, during the inexpensive online stage, these methods rapidly compute approximate solutions for different points in the parameter space via projection: they compute solutions in the low-dimensional trial subspace by enforcing the high-fidelity-model residual to be orthogonal to a low-dimensional test subspace of the same dimension.
As suggested above, nearly all projection-based model-reduction approaches employ linear trial subspaces. This includes the reduced-basis technique [64], [69] and proper orthogonal decomposition (POD) [39], [16] for parameterized stationary problems; balanced truncation [56], rational interpolation [5], [34], and Craig–Bampton model reduction [21] for linear time-invariant (LTI) systems; and Galerkin projection [39], least-squares Petrov–Galerkin projection [14], and other Petrov–Galerkin projections [80] with (balanced) POD [39], [46], [80], [68] for nonlinear dynamical systems.
The Kolmogorov n-width [63] provides one way to quantify the optimal linear trial subspace; it is defined as
$$ d_n(\mathcal{M}) := \inf_{\mathcal{S}_n} \sup_{u \in \mathcal{M}} \inf_{v \in \mathcal{S}_n} \lVert u - v \rVert, $$
where the first infimum is taken over all n-dimensional subspaces $\mathcal{S}_n$ of the state space, and $\mathcal{M}$ denotes the manifold of solutions over all time and parameters. Assuming the dynamical system has a unique trajectory for each parameter instance, the intrinsic solution-manifold dimensionality is (at most) equal to the number of parameters plus one (time). For problems that exhibit a rapidly decaying Kolmogorov n-width (e.g., diffusion-dominated problems), employing a linear trial subspace is theoretically justifiable [4], [7] and has enjoyed many successful demonstrations. Unfortunately, many computational problems exhibit a slowly decaying Kolmogorov n-width (e.g., advection-dominated problems). In such cases, the use of low-dimensional linear trial subspaces often produces inaccurate results; the ROM dimensionality must be significantly increased to yield acceptable accuracy [60]. Indeed, the Kolmogorov n-width with n equal to the intrinsic solution-manifold dimensionality is often quite large for such problems.
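The slow n-width decay of advection-dominated problems can be observed numerically. The following sketch (illustrative only, not from the paper) builds a snapshot matrix for a narrow Gaussian pulse advecting across a 1D domain — a solution manifold whose intrinsic dimensionality is one (time) — and counts the POD modes needed to capture 99% of the snapshot energy; the count is far larger than one:

```python
import numpy as np

# Illustrative sketch (not from the paper): POD applied to an advecting pulse,
# a prototypical problem with slowly decaying Kolmogorov n-width.
x = np.linspace(0.0, 1.0, 512)
times = np.linspace(0.0, 1.0, 200)
# Snapshots of a narrow Gaussian pulse advecting from x = 0.2 to x = 0.8.
snapshots = np.stack(
    [np.exp(-(((x - 0.2 - 0.6 * t) / 0.01) ** 2)) for t in times], axis=1
)
# The singular values bound the best achievable linear-subspace approximation
# error, so slow decay means many POD modes are needed despite the
# one-dimensional intrinsic (time-only) dimensionality of this manifold.
s = np.linalg.svd(snapshots, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
n99 = int(np.searchsorted(energy, 0.99)) + 1  # modes for 99% snapshot energy
print(n99)
```

A diffusion-dominated analogue of the same experiment yields a much smaller mode count, which is the regime where linear trial subspaces excel.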
Several approaches have been pursued to address this n-width limitation of linear trial subspaces. One set of approaches transforms the trial basis to improve its approximation properties for advection-dominated problems. Such methods include separating transport dynamics via ‘freezing’ [59], applying a coordinate transformation to the trial basis [42], [58], [11], shifting the POD basis [65], transforming the physical domain of the snapshots [79], [78], constructing the trial basis on a Lagrangian formulation of the governing equations [55], and using Lax pairs of the Schrödinger operator to construct a time-evolving trial basis [29]. Other approaches pursue the use of multiple linear subspaces instead of employing a single global linear trial subspace; these local subspaces can be tailored to different regions of the time domain [26], [24], physical domain [73], or state space [3], [62]; Ref. [2] employs local trial subspaces in time and parameter spaces and applies $L^1$-norm minimization of the residual. Ref. [12] aims to overcome the limitations of using a linear trial subspace by providing an online-adaptive h-refinement mechanism that constructs a hierarchical sequence of linear subspaces converging to the original state space. However, all of these methods attempt to construct, manually transform, or refine a linear basis to be locally accurate; they do not consider nonlinear trial manifolds of more general structure. Further, many of these approaches rely on substantial additional knowledge about the problem, such as the particular advection phenomena governing basis shifting.
This work aims to address the fundamental n-width deficiency of linear trial subspaces. However, in contrast to the above methods, we pursue an approach that is both more general (i.e., it should not be limited to piecewise linear manifolds or mode transformations) and only requires the same snapshot data as typical POD-based approaches (e.g., it should require no special knowledge about any particular advection phenomena). To accomplish this objective, we propose an approach that (1) performs optimal projection of dynamical systems onto arbitrary nonlinear trial manifolds (during the online stage), and (2) computes this nonlinear trial manifold from snapshot data alone (during the offline stage).
For the first component, we perform optimal projection onto arbitrary (continuously differentiable) nonlinear trial manifolds by applying minimum-residual formulations at the time-continuous (ODE) and time-discrete (OΔE) levels. The time-continuous formulation leads to manifold Galerkin projection, which can be interpreted as performing orthogonal projection of the velocity onto the tangent space of the trial manifold. The time-discrete formulation leads to manifold least-squares Petrov–Galerkin (LSPG) projection, which can also be straightforwardly extended to stationary (i.e., steady-state) problems. We also perform analyses that illustrate the relationship between these manifold ROMs and classical linear-subspace ROMs. Manifold Galerkin and manifold LSPG projection require the trial manifold to be characterized as a (generally nonlinear) mapping from the low-dimensional reduced state to the high-dimensional state; the mapping from the high-dimensional state to the low-dimensional state is not required.
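The time-continuous formulation above can be made concrete with a small sketch. Under the assumption that the decoder $g$ is continuously differentiable, manifold Galerkin projection computes the reduced velocity by least-squares projection of the FOM velocity onto the tangent space of the trial manifold, i.e., the column space of the decoder Jacobian. The helper names below (`decoder_jacobian`, `manifold_galerkin_velocity`) are hypothetical, and the finite-difference Jacobian stands in for the exact Jacobian a real implementation would obtain from the autoencoder:

```python
import numpy as np

def decoder_jacobian(g, zhat, eps=1e-6):
    """Finite-difference Jacobian of the decoder g at zhat (illustrative only;
    in practice the Jacobian would come from automatic differentiation)."""
    base = g(zhat)
    J = np.zeros((base.size, zhat.size))
    for i in range(zhat.size):
        zp = zhat.copy()
        zp[i] += eps
        J[:, i] = (g(zp) - base) / eps
    return J

def manifold_galerkin_velocity(f, g, zhat):
    """Reduced velocity solving min_v || J(zhat) v - f(g(zhat)) ||_2,
    i.e. orthogonal projection of the FOM velocity onto the tangent space."""
    J = decoder_jacobian(g, zhat)
    return np.linalg.lstsq(J, f(g(zhat)), rcond=None)[0]
```

For an affine decoder $g(\hat{z}) = \Phi \hat{z}$ with orthonormal $\Phi$, this reduces to $\hat{v} = \Phi^T f$, i.e., classical linear-subspace Galerkin projection, consistent with Proposition 4.1.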
The second component aims to compute a nonlinear trial manifold from snapshot data alone. Many machine-learning methods exist to perform nonlinear dimensionality reduction. However, many of these methods do not provide the required mapping from the low-dimensional embedding to the high-dimensional input; examples include Isomap [75], locally linear embedding (LLE) [67], Hessian eigenmaps [25], spectral embedding [6], and t-SNE [51]. Methods that do provide this required mapping include self-organizing maps [45], generative topographic mapping [8], kernel principal component analysis (PCA) [72], Gaussian process latent variable model [47], diffeomorphic dimensionality reduction [77], and autoencoders [37]. In principle, manifold Galerkin and manifold LSPG projection could be applied with manifolds constructed by any of the methods in the latter category. However, this study restricts focus to autoencoders—more specifically deep convolutional autoencoders—due to their expressiveness and scalability, as well as the availability of high-performance software tools for their construction.
Autoencoders (also known as auto-associators [23]) comprise a specific type of feedforward neural network that aims to learn the identity mapping: they attempt to copy the input to an accurate approximation of itself. Learning the identity mapping is not, however, a particularly useful task unless it is associated with a dimensionality-reduction procedure comprising data compression and subsequent recovery. This is precisely what autoencoders accomplish by employing a neural-network architecture consisting of two parts: an encoder that provides a nonlinear mapping from the high-dimensional input to a low-dimensional embedding, and a decoder that provides a nonlinear mapping from the low-dimensional embedding to an approximation of the high-dimensional input. Convolutional autoencoders are a specific type of autoencoder that employ convolutional layers, which have been shown to be effective for extracting representative features in images [53]. Inspired by the analogy between images and spatially distributed dynamical-system states (e.g., when the dynamical system corresponds to the spatial discretization of a partial-differential-equations model), we propose a specific deep convolutional autoencoder architecture tailored to dynamical systems with states that are spatially distributed. Critically, training this autoencoder requires only the same snapshot data as POD; no additional problem-specific information is needed.
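The encoder/decoder structure described above can be illustrated with a schematic forward pass. The sketch below is not the paper's architecture — it is a toy with random, untrained weights — but it shows the characteristic shape pattern: strided convolutions progressively coarsen a spatially distributed state before a dense layer produces the low-dimensional embedding, and the decoder maps that embedding back to the full state dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

def elu(x):
    """A smooth activation commonly used in autoencoders."""
    return np.where(x > 0, x, np.expm1(x))

def conv1d_stride2(u, w):
    """Valid 1D convolution with stride 2: roughly halves spatial resolution."""
    L = (u.size - w.size) // 2 + 1
    return np.array([np.dot(u[2 * i:2 * i + w.size], w) for i in range(L)])

N, k = 256, 4                                  # full and latent state sizes
u = np.sin(2 * np.pi * np.linspace(0, 1, N))   # a 'spatially distributed' state

# Toy encoder: two strided conv layers, then a dense map to the latent state.
w1, w2 = rng.standard_normal(5), rng.standard_normal(5)
h = elu(conv1d_stride2(u, w1))
h = elu(conv1d_stride2(h, w2))
W_enc = rng.standard_normal((k, h.size))
zhat = W_enc @ h                               # low-dimensional embedding

# Toy decoder: a dense map back to the full state. (A real decoder would
# mirror the encoder with transposed convolutions.)
W_dec = rng.standard_normal((N, k))
u_approx = W_dec @ zhat
```

Training adjusts the weights so that `u_approx` reproduces `u` over the snapshot set; the decoder alone then defines the nonlinear trial manifold.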
In summary, new contributions of this work include:
- 1.
Manifold Galerkin (Section 3.2) and manifold LSPG (Section 3.3) projection techniques, which project the dynamical-system model onto arbitrary continuously-differentiable manifolds. We equip these methods with
- (a)
the ability to exactly satisfy the initial condition (Remark 3.1), and
- (b)
quasi-Newton solvers (Section 3.4) to solve the system of algebraic equations arising from implicit time integration.
- 2.
Analysis (Section 4), which includes
- (a)
demonstrating that employing an affine trial manifold recovers classical linear-subspace Galerkin and LSPG projection (Proposition 4.1),
- (b)
sufficient conditions for commutativity of time discretization and manifold Galerkin projection (Theorem 4.1),
- (c)
conditions under which manifold Galerkin and manifold LSPG projection are equivalent (Theorem 4.2), and
- (d)
a posteriori discrete-time error bounds for both the manifold Galerkin and manifold LSPG projection methods (Theorem 4.3).
- 3.
A novel convolutional autoencoder architecture tailored to spatially distributed dynamical-system states (Section 5.2), with an accompanying offline training algorithm that requires only the same snapshot data as POD (Section 6).
- 4.
Numerical experiments on advection-dominated benchmark problems (Section 7). These experiments illustrate the ability of the method to outperform even the projection of the solution onto the optimal linear subspace; further, the proposed method is close to achieving the optimal performance of any nonlinear-manifold method. This demonstrates the method's ability to overcome the intrinsic n-width limitations of linear trial subspaces.
To the best of our knowledge, Refs. [35], [43] comprise the only attempts to incorporate an autoencoder within a projection-based ROM. These methods seek solutions in the nonlinear trial manifold provided by an autoencoder; however, these methods reduce the number of equations by applying the encoder to the velocity. Unfortunately, as we discuss in Remark 3.5, this approach is kinematically inconsistent, as the velocity resides in the tangent space to the manifold, not the manifold itself. Thus, encoding the velocity can produce significant approximation errors. Instead, the proposed manifold Galerkin and manifold LSPG projection methods produce approximations that associate with minimum-residual formulations and adhere to the kinematics imposed by the trial manifold.
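The kinematic inconsistency discussed above can be seen on a toy problem. The example below (illustrative only, not from the referenced works) uses a parabola manifold $x = g(\hat{z}) = [\hat{z}, \hat{z}^2]$ with a consistent nonlinear encoder $h(x) = x_2/x_1$, so that $h(g(\hat{z})) = \hat{z}$. Projecting the velocity onto the tangent space recovers the true reduced velocity exactly, while applying the encoder to the velocity returns an unrelated quantity:

```python
import numpy as np

# Toy parabola manifold with a consistent nonlinear encoder (h(g(z)) = z).
g = lambda z: np.array([z, z**2])
h = lambda x: x[1] / x[0]

z, v = 0.5, 2.0                      # reduced state and true reduced velocity
J = np.array([1.0, 2 * z])           # tangent to the manifold at g(z)
f = J * v                            # FOM velocity lies in the tangent space

# Kinematically consistent: project f onto the tangent space (Galerkin).
v_galerkin = float(np.dot(J, f) / np.dot(J, J))
# Kinematically inconsistent: apply the encoder to the velocity itself.
v_encoded = h(f)

print(v_galerkin, v_encoded)  # → 2.0 1.0
```

Here encoding the velocity yields $2\hat{z} = 1$ regardless of the true reduced velocity $\hat{v} = 2$, because the encoder is defined on the manifold, not on its tangent space.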
Relatedly, Ref. [33] proposes a general framework for projection of dynamical systems onto nonlinear manifolds. However, that method constructs a piecewise linear trial manifold by generating local linear subspaces and concatenating them. It then projects the residual of the governing equations onto a nonlinear test manifold that is also piecewise linear; this is referred to as a ‘piecewise linear projection function’. Thus, the approach is limited to piecewise-linear manifolds, and the resulting approximation does not associate with any optimality property.
We also note that autoencoders have been applied to various non-intrusive model-reduction methods that are purely data driven in nature and are not based on a projection process. Examples include Ref. [54], which constructs a nonlinear model of wall turbulence using an autoencoder; Ref. [31], which applies an autoencoder to compress the state, followed by a recurrent neural network (RNN) [70] to learn the dynamics; Refs. [74], [61], [50], [57], which apply autoencoders to learn approximate invariant subspaces of the Koopman operator; and Ref. [18], which applies hierarchical dimensionality reduction comprising autoencoders and PCA followed by dynamics learning to recover missing CFD data.
The paper is organized as follows. Section 2 describes the full-order model, which corresponds to a parameterized system of (linear or nonlinear) ordinary differential equations. Section 3 describes model reduction on nonlinear manifolds, including the mathematical characterization of the nonlinear trial manifold (Section 3.1), manifold Galerkin projection (Section 3.2), manifold LSPG projection (Section 3.3), and associated quasi-Newton methods to solve the system of algebraic equations arising at each time instance in the case of implicit time integration (Section 3.4). Section 4 provides the aforementioned analysis results. Section 5 describes a practical approach for constructing the nonlinear trial manifold using deep convolutional autoencoders, including a brief description of autoencoders (Section 5.1), the proposed autoencoder architecture applicable to spatially distributed states (Section 5.2), and the way in which the proposed autoencoder can be used to satisfy the initial condition (Section 5.3). When the manifold Galerkin and manifold LSPG ROMs employ this choice of decoder, we refer to them as Deep Galerkin and Deep LSPG ROMs, respectively. Section 6 describes offline training, which entails snapshot-based data collection (Section 6.1), data standardization (Section 6.2), and autoencoder training (Section 6.3); Algorithm 1 summarizes the offline training stage. Section 7 assesses the performance of the proposed Deep Galerkin and Deep LSPG ROMs compared to (linear-subspace) POD–Galerkin and POD–LSPG ROMs on two advection-dominated benchmark problems. Finally, Section 8 concludes the paper.
Full-order model
This work considers the full-order model (FOM) to correspond to a dynamical system expressed as a parameterized system of ordinary differential equations (ODEs)
$$ \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, t; \boldsymbol{\mu}), \qquad \mathbf{x}(0; \boldsymbol{\mu}) = \mathbf{x}^0(\boldsymbol{\mu}), \tag{2.1} $$
where $t \in [0, T]$ denotes time with final time $T$, and $\mathbf{x}$ denotes the time-dependent, parameterized state implicitly defined as the solution to problem (2.1) given the parameters $\boldsymbol{\mu} \in \mathcal{D}$. Here, $\mathcal{D}$ denotes the parameter space, $\mathbf{x}^0$ denotes the parameterized initial condition, and $\mathbf{f}$ denotes the velocity.
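A parameterized FOM of this form can be sketched concretely. The snippet below is a minimal illustration in the spirit of the 1D Burgers benchmark used later in the paper (the discretization details here are assumptions, not the paper's actual setup): a first-order upwind semi-discretization whose inflow boundary value serves as the parameter $\mu$, advanced with explicit Euler time stepping:

```python
import numpy as np

def fom_velocity(u, mu, dx):
    """Illustrative FOM velocity f(x, t; mu): upwind-discretized 1D Burgers
    flux with parameterized inflow boundary value u(0, t) = mu."""
    flux = 0.5 * u**2
    dudx = np.empty_like(u)
    dudx[0] = (flux[0] - 0.5 * mu**2) / dx   # inflow contributes the boundary flux
    dudx[1:] = (flux[1:] - flux[:-1]) / dx
    return -dudx

N, dx, dt, mu = 200, 1.0 / 200, 1e-3, 1.5
u = np.ones(N)                               # initial condition x0: u = 1
for _ in range(500):                         # explicit Euler time stepping
    u = u + dt * fom_velocity(u, mu, dx)
```

Snapshots of `u` collected over such simulations at several parameter instances form the training data for both POD and the proposed autoencoder.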
Model reduction on nonlinear manifolds
This section proposes two classes of residual-minimizing ROMs on nonlinear manifolds. The first minimizes the (time-continuous) FOM ODE residual and is analogous to classical Galerkin projection, while the second minimizes the (time-discrete) FOM OΔE residual and is analogous to least-squares Petrov–Galerkin (LSPG) projection [14], [17], [13]. Section 3.1 introduces the notion of a nonlinear trial manifold, and Section 3.2 describes the manifold Galerkin ROM resulting from time-continuous residual minimization.
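To make the time-discrete variant concrete, the sketch below illustrates one manifold LSPG step under assumed ingredients: a backward-Euler discrete residual $r(\hat{z}) = g(\hat{z}) - g(\hat{z}_n) - \Delta t\, f(g(\hat{z}))$, minimized over the reduced state via Gauss–Newton with finite-difference Jacobians. The function names are hypothetical and the decoder `g` and velocity `f` are placeholders:

```python
import numpy as np

def jac_fd(F, z, eps=1e-6):
    """Finite-difference Jacobian of a vector-valued map F (illustrative)."""
    base = F(z)
    return np.stack([(F(z + eps * e) - base) / eps
                     for e in np.eye(z.size)], axis=1)

def lspg_step(g, f, z_n, dt, iters=20):
    """One backward-Euler manifold LSPG step: Gauss-Newton minimization of
    the 2-norm of the discrete residual r(z) = g(z) - g(z_n) - dt*f(g(z))."""
    z = z_n.copy()
    for _ in range(iters):
        r = g(z) - g(z_n) - dt * f(g(z))
        Jr = jac_fd(lambda w: g(w) - dt * f(g(w)), z)
        dz = np.linalg.lstsq(Jr, -r, rcond=None)[0]   # Gauss-Newton update
        z = z + dz
    return z
```

For an affine decoder and linear velocity this reproduces the classical linear-subspace LSPG step; the quasi-Newton solvers of Section 3.4 replace the plain Gauss–Newton iteration used here.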
Analysis
We now perform analysis of the proposed manifold Galerkin and manifold LSPG projection methods. For notational simplicity, this section omits the dependence of operators on the parameters μ.

Proposition 4.1 (Affine trial manifold recovers classical Galerkin and LSPG). If the trial manifold is affine, then manifold LSPG projection is equivalent to classical linear-subspace LSPG projection. If additionally the decoder mapping associates with an orthogonal matrix, then manifold Galerkin projection is equivalent to classical linear-subspace Galerkin projection.
Nonlinear trial manifold based on deep convolutional autoencoders
This section describes the approach we propose for constructing the decoder that defines the nonlinear trial manifold. As described in the introduction, any nonlinear-manifold learning method equipped with a continuously differentiable mapping from the generalized coordinates (i.e., the latent state) to an approximation of the state (i.e., the data) is compatible with the manifold Galerkin and manifold LSPG methods proposed in Section 3. Here, we pursue deep convolutional autoencoders.
Offline training
This section describes the offline training process used to train the deep convolutional autoencoder proposed in Section 5.2. The approach employs precisely the same snapshot data used by POD. Section 6.1 describes the (snapshot-based) data collection procedure, which is identical to that employed by POD. Section 6.2 describes how the data are scaled to improve numerical stability of autoencoder training. Section 6.3 summarizes the gradient-based-optimization approach employed for training the autoencoder.
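The data-standardization step can be sketched as follows. The snippet assumes one common choice — an affine min–max map of the snapshot entries to [0, 1] — as an illustration; the paper's exact scaling may differ. Storing the scaling constants lets the ROM map decoded states back to physical units:

```python
import numpy as np

def fit_minmax(snapshots):
    """Extrema over the training snapshots (snapshots: N x n_snapshots)."""
    return snapshots.min(), snapshots.max()

def scale(snapshots, lo, hi):
    """Affinely map snapshot entries to [0, 1] for stable training."""
    return (snapshots - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert the scaling to recover physical units after decoding."""
    return lo + (hi - lo) * scaled

# Example: standardize a toy snapshot matrix before autoencoder training.
X = np.random.default_rng(0).uniform(-3.0, 5.0, size=(64, 10))
lo, hi = fit_minmax(X)
Xs = scale(X, lo, hi)
```

Because the same constants are reused online, the standardization introduces no approximation error of its own.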
Numerical experiments
This section assesses the performance of the proposed Deep Galerkin and Deep LSPG ROMs, which employ nonlinear trial manifolds, compared to POD–Galerkin and POD–LSPG ROMs, which employ affine trial subspaces. We consider two advection-dominated benchmark problems: 1D Burgers' equation and a chemically reacting flow. We employ the numerical PDE tools and ROM functionality provided by pyMORTestbed [81], and we construct the autoencoder using TensorFlow [1].
For both benchmark problems, the Deep Galerkin and Deep LSPG ROMs yield substantially lower errors than the POD-based ROMs at the same reduced dimension.
Conclusion
This work has proposed novel manifold Galerkin and manifold LSPG projection techniques, which project dynamical-system models onto arbitrary continuously-differentiable nonlinear manifolds. We demonstrated how these methods can exactly satisfy the initial condition, and provided quasi-Newton solvers for implicit time integration.
We performed analyses that demonstrated that employing an affine trial manifold recovers classical linear-subspace Galerkin and LSPG projection. We also derived sufficient conditions for commutativity of time discretization and manifold Galerkin projection, conditions under which manifold Galerkin and manifold LSPG projection are equivalent, and a posteriori discrete-time error bounds for both methods.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors gratefully acknowledge Matthew Zahr for graciously providing the pyMORTestbed code that was modified to obtain the numerical results, as well as Jeremy Morton for useful discussions on the application of convolutional neural networks to simulation data. The authors also thank Patrick Blonigan and Eric Parish for providing useful feedback. This work was sponsored by Sandia's Advanced Simulation and Computing (ASC) Verification and Validation (V&V) Project/Task #103723/05.30.02.
References (81)
- Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction, J. Comput. Phys. (2017)
- Conservative model reduction for finite-volume models, J. Comput. Phys. (2018)
- The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows, J. Comput. Phys. (2013)
- Decreasing the temporal complexity for nonlinear, implicit reduced-order models by forecasting, Comput. Methods Appl. Mech. Eng. (2015)
- Model order reduction for parametrized nonlinear hyperbolic problems as an application to uncertainty quantification, J. Comput. Appl. Math. (2019)
- Approximated Lax pairs for the reduced order integration of nonlinear evolution equations, J. Comput. Phys. (2014)
- Practical quasi-Newton methods for solving nonlinear systems, J. Comput. Appl. Math. (2000)
- Neural network modeling for near wall turbulent flow, J. Comput. Phys. (2002)
- Nonlinear reduced basis approximation of parameterized evolution equations via the method of freezing, C. R. Math. (2013)
- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, ...
- Model reduction using $L^1$-norm minimization as an application to nonlinear hyperbolic problems, Int. J. Numer. Methods Fluids
- Nonlinear model order reduction based on local reduced-order bases, Int. J. Numer. Methods Eng.
- Kolmogorov widths and low-rank approximations of parametric elliptic PDEs, Math. Comput.
- Interpolatory projection methods for parameterized model reduction, SIAM J. Sci. Comput.
- Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput.
- Convergence rates for greedy algorithms in reduced basis methods, SIAM J. Math. Anal.
- GTM: a principled alternative to the self-organizing map
- Optimization methods for large-scale machine learning, SIAM Rev.
- Projection-based model reduction for reacting flows
- Model order reduction for problems with large convection effects
- Adaptive h-refinement for reduced-order models, Int. J. Numer. Methods Eng.
- Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations, Int. J. Numer. Methods Eng.
- A low-cost, goal-oriented ‘compact proper orthogonal decomposition’ basis for model reduction of static systems, Int. J. Numer. Methods Eng.
- Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning
- Fast and accurate deep network learning by exponential linear units (ELUs)
- Coupling of substructures for dynamic analyses, AIAA J.
- Non-linear dimensionality reduction
- Model reduction of parametrized evolution problems using the reduced basis method with adaptive time-partitioning
- Hessian eigenmaps: locally linear embedding techniques for high-dimensional data, Proc. Natl. Acad. Sci.
- Adaptive reduced basis methods for nonlinear convection–diffusion equations
- Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res.
- A guide to convolution arithmetic for deep learning
- Understanding the difficulty of training deep feedforward neural networks
- Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems
- Deep Learning, vol. 1
- Model Order Reduction of Nonlinear Dynamical Systems
- $\mathcal{H}_2$ model reduction for large-scale linear dynamical systems, SIAM J. Matrix Anal. Appl.
- A deep learning framework for model reduction of dynamical systems
- Delving deep into rectifiers: surpassing human-level performance on ImageNet classification
- Reducing the dimensionality of data with neural networks, Science