1 Introduction

Simulation of multi-phase flow in subsurface porous media is an essential task for a number of engineering applications including groundwater management, contaminant transport, and effective extraction of hydrocarbon resources (Petvipusit et al. 2014; Elsheikh et al. 2013). The physics governing subsurface flow is mainly modeled by a system of coupled nonlinear partial differential equations (PDEs) parametrized by subsurface properties (e.g., porosity and permeability) (Aarnes et al. 2007). In realistic settings, subsurface models are computationally expensive (i.e., a large number of grid blocks is needed) as the subsurface properties are heterogeneous and the solution exhibits multi-scale features (Elsheikh et al. 2012; Petvipusit et al. 2014).

Moreover, these subsurface properties are only known at a sparse set of points (i.e., well locations), and the grid properties are populated stochastically over the entire domain (Ibrahima 2016; Elsheikh et al. 2012, 2013). Monte Carlo methods are usually employed to propagate the uncertainties in the subsurface properties to the flow response. Monte Carlo methods are computationally very expensive since a large number of forward simulations are necessary to estimate the statistics of the engineering quantities of interest (Petvipusit et al. 2014; Elsheikh et al. 2013; Ibrahima 2016). Likewise, Bayesian inference tasks require a very large number of forward simulations to sharpen our knowledge about the unknown model parameters by utilizing field observation data (Elsheikh et al. 2012, 2013). For example, the Markov chain Monte Carlo (MCMC) method (and its variants) requires a large number (on the order of millions) of reservoir simulations to reach convergence and to avoid biased posterior estimates of the model parameters.
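The cost argument above can be illustrated with a minimal sketch. Here the reservoir simulator is replaced by a hypothetical one-line `forward_model` (the pressure drop 1/k across a one-dimensional core with unit length, viscosity, and flux), and the permeability k is drawn from a log-normal distribution; the function names, the value of `sigma`, and the sample sizes are assumptions made for illustration only. The standard error of the Monte Carlo mean decays as \(1/\sqrt{N}\), which is why many forward runs are needed.

```python
import numpy as np

def forward_model(k):
    """Toy stand-in for a reservoir simulator (hypothetical): pressure drop
    across a 1D core of unit length, viscosity, and flux, i.e. dp = 1 / k."""
    return 1.0 / k

def monte_carlo(n_samples, sigma=0.5, seed=0):
    """Propagate a log-normal permeability through the forward model."""
    rng = np.random.default_rng(seed)
    k = rng.lognormal(mean=0.0, sigma=sigma, size=n_samples)
    q = forward_model(k)                              # one forward run per sample
    mean = q.mean()
    std_err = q.std(ddof=1) / np.sqrt(n_samples)      # decays like 1 / sqrt(N)
    return mean, std_err

for n in (100, 10_000):
    mean, std_err = monte_carlo(n)
    print(f"N={n:6d}  mean={mean:.4f}  std. error={std_err:.4f}")
```

For this toy model the exact mean is \(\exp (\sigma ^2/2)\), so the estimate can be checked directly; for a real simulator each sample would cost one full-order solve.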

Surrogate models can be used to overcome the computational burden of multi-query tasks (e.g., uncertainty quantification, model-based optimization) governed by large-scale PDEs (Frangos et al. 2010; Koziel and Leifsson 2013; He 2013; Elsheikh et al. 2014; Josset et al. 2015; Bazargan et al. 2015). Surrogate models are computationally efficient mathematical models that can effectively approximate the main characteristics of the full-order model (full model) (Frangos et al. 2010). A number of surrogate modeling techniques have been developed and could be broadly classified into three classes: simplified physics-based models (Durlofsky and Chen 2012; Josset et al. 2015), data-fit black-box models (Frangos et al. 2010; Li et al. 2017; Yeten et al. 2005), and projection-based reduced-order models commonly referred to as reduced model (Berkooz et al. 1993; Lassila et al. 2014; Antoulas et al. 2001; Fang et al. 2013). Physics-based surrogate models are derived from high-fidelity models using approaches such as simplifying physics assumptions, using coarse grids, and/or upscaling of the model parameters (Durlofsky and Chen 2012; Frangos et al. 2010; He 2013; Babaei et al. 2013). Data-fit models are generated using the detailed simulation data to regress the relation between the input and the corresponding output of interest (Frangos et al. 2010; Yeten et al. 2005; Abdi-Khanghah et al. 2018; Wood 2018). For a complete review of various surrogate modeling techniques, we refer the readers to the following papers by Asher et al. (2015), Frangos et al. (2010), Koziel and Leifsson (2013) and Razavi et al. (2012).

In projection-based reduced-order models (ROMs), which are utilized in this paper, the governing equations of the full model are projected onto a low-dimensional subspace spanned by a small set of basis functions via Galerkin projection (Lassila et al. 2014; Antoulas et al. 2001). Projection-based ROMs rely on the assumption that most of the information and characteristics of the full model state variables can be efficiently represented by linear combinations of only a small number of basis functions. This assumption enables the reduced model to accurately capture the input–output relationship of the full model with a significantly lower number of unknowns (Frangos et al. 2010; Lassila et al. 2014; Antoulas et al. 2001). Projection-based reduced-order models are broadly categorized into system-based methods and snapshot-based methods. System-based methods like balanced truncation realization methods (Gugercin and Antoulas 2004) and Krylov subspace methods (Freund 2003) use the characteristics of the full model and have been developed mainly for linear time-invariant problems, although much progress has been made on extensions of these methods to nonlinear problems (Lall et al. 2002). Snapshot-based methods such as reduced basis methods (Rozza et al. 2007) and proper orthogonal decomposition (POD) (Sirovich 1987; Berkooz et al. 1993) derive the projection bases from a set of full model solutions (the snapshots).

In this work, we employ POD-based reduced model to accelerate Monte Carlo simulation of subsurface flow models. The basis functions obtained from the POD are optimal in the sense that, for the same number of basis functions, no other bases can represent the given snapshot set with lower least-squares error than the POD bases (Lassila et al. 2014; Sirovich 1987) (see Sect. 3 for further details). Lumley (1967) was the first to apply POD techniques in fluid flow simulations. Since then, POD procedures have successfully been applied in a number of application areas (e.g., Sirovich 1987; Zheng et al. 2002; Cao et al. 2006; Bui-Thanh et al. 2004; Meyer and Matthies 2003; Astrid 2004; Jin and Durlofsky 2018).

In the context of fluid flow in porous media, Vermeulen et al. (2004) introduced POD for confined groundwater flow problems (a linear subsurface flow model). Vermeulen et al. (2006) applied POD to a gradient-based optimization problem involving a groundwater flow model. McPhee and Yeh (2008) employed POD to accelerate a groundwater management optimization problem. Siade et al. (2010) introduced a new methodology for the optimal selection of snapshots in such a way that the resulting POD basis functions account for the maximal variance of the full model solution. Within the context of oil reservoir simulation, Heijn et al. (2003) and Van Doren et al. (2006) applied POD to accelerate the optimization of a waterflood process. Cardoso et al. (2009) incorporated a new snapshot clustering procedure to enhance the standard POD for oil–water subsurface flow problems.

In the context of Monte Carlo simulations applied to stochastic subsurface flow problems, POD-based ROMs were mainly employed only when the governing equation was linear (or nearly linear) (Cardoso and Durlofsky 2010; Pasetto et al. 2011, 2013; Boyce and Yeh 2014). Pasetto et al. (2011) employed POD-based reduced model to construct MC realizations of two-dimensional steady-state confined groundwater flow subject to a spatially distributed random recharge. Pasetto et al. (2013) applied POD to accelerate the MC simulations of transient confined groundwater flow models with stochastic hydraulic conductivity. Baú (2012) derived a set of POD ROMs for each MC realization of hydraulic conductivity to solve a stochastic, multi-objective, confined groundwater management problem. Boyce and Yeh (2014) applied a single parameter-independent POD reduced model to the deterministic inverse problem and the Bayesian inverse problem involving linear groundwater flow model. In addition to the limitation of using only linear flow models, the UQ tasks in the aforementioned literature involve only low-dimensional uncertain parameters.

Within the context of nonlinear subsurface flow problems, the target application of POD was mainly hydrocarbon production optimization, where POD ROMs were used mainly to optimize well control parameters (e.g., bottomhole pressure) (Cardoso and Durlofsky 2010; He et al. 2011; Trehan and Durlofsky 2016; Rousset et al. 2014; Jansen and Durlofsky 2017). Recently, Jansen and Durlofsky (2017) presented an extensive review on the use of reduced-order models in well control optimization. For the well control applications, POD achieved reasonable levels of accuracy only when the well controls in test runs were relatively close to those used in training runs. In the case where the test controls substantially differ from those used in the initial training runs, additional computational steps were needed. For example, refitting the POD basis functions was performed in Trehan and Durlofsky (2016), which imposes some additional computational overhead. Although POD combined with Galerkin projection has been applied more frequently to nonlinear flow problems (Bui-Thanh et al. 2004; Berkooz et al. 1993; Rousset et al. 2014), the effectiveness of POD–Galerkin-based models in handling nonlinear systems is limited mainly by two factors. The first factor is related to the treatment of the nonlinear terms in the POD–Galerkin reduced model (Chaturantabut and Sorensen 2010; Rewienski and White 2003; Cardoso and Durlofsky 2010), and the second factor is related to maintaining the overall stability of the resulting reduced model (Cardoso and Durlofsky 2010; He 2010, 2013; Bui-Thanh et al. 2007; Wang et al. 2012).

When the nonlinear terms are non-polynomial, evaluating them in a POD-based ROM still depends on the full model state variables, and hence the computational cost of evaluating the reduced model remains a function of the full model dimension. Several techniques have been developed to reduce the computational cost of evaluating the nonlinear term in POD ROMs including trajectory piecewise linearization (TPWL) (Rewienski and White 2003), gappy POD technique (Willcox 2006), missing point estimation (MPE) (Barrault et al. 2004), best point interpolation method (Nguyen et al. 2008), and discrete empirical interpolation method (DEIM) (Barrault et al. 2004; Chaturantabut and Sorensen 2010). Among these techniques, TPWL and DEIM are widely used for efficient treatment of nonlinearities in multi-phase flow reservoir simulations (Ghasemi 2015; He 2010, 2013).

In TPWL method (Rewienski and White 2003), the nonlinear function is first approximated by a piecewise linear function obtained by linearizing the full-order model at a predetermined set of points in the time and the parameter space. Then, the nonlinear full model is replaced by an adequately weighted sum of the selected linearized systems (Rewienski and White 2003). Finally, the reduced model can be obtained by projecting the resultant linearized full-order system using standard techniques like POD (Rewienski and White 2003). The TPWL method was first introduced in Rewienski and White (2003) for modeling nonlinear circuits and micromachined devices. In the context of subsurface flow problems, TPWL procedures were applied in Cardoso and Durlofsky (2010), He et al. (2011), Trehan and Durlofsky (2016) and Rousset et al. (2014) to accelerate the solution of production optimization problems.

In DEIM, the nonlinear term in the full model is approximated by a linear combination of a set of basis vectors (Chaturantabut and Sorensen 2010). The coefficients of expansion are determined by evaluating the nonlinear term only at a small number of selected interpolation points (Chaturantabut and Sorensen 2010). DEIM was developed in Chaturantabut and Sorensen (2010) for model reduction of general nonlinear systems of ordinary differential equations (ODEs) and has been used in several areas (Chaturantabut and Sorensen 2012; Xiao et al. 2014; Buffoni and Willcox 2010). Within the context of subsurface flow problems, Chaturantabut and Sorensen (2011) applied DEIM for model reduction of viscous fingering problems of an incompressible fluid through a two-dimensional homogeneous porous medium. Alghareeb and Williams (2013) combined DEIM with POD procedures, and the resultant reduced model was applied to a waterflood optimization problem. Recently, Ghasemi (2015) applied POD with DEIM to an optimal control problem governed by two-phase flow in porous media. In that work, a machine learning technique was used to construct a number of POD–DEIM local reduced-order models, and a specific local reduced-order model was then selected with respect to the current state of the dynamical system during the gradient-based optimization task. Similarly, Yoon et al. (2014) used multiple local DEIM approximations in a POD reduced model framework to reduce the computational costs of high-fidelity reservoir simulations.
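The interpolation idea behind DEIM can be sketched as follows. This is a minimal illustration of the greedy point selection of Chaturantabut and Sorensen (2010), not the implementation used in this paper; the function names `deim_indices` and `deim_approx`, the toy nonlinear term \(\exp (-\mu x)\), and the basis size `m` are assumptions introduced for the example.

```python
import numpy as np

def deim_indices(V):
    """Greedy DEIM point selection for a basis V (n x m, independent columns)."""
    n, m = V.shape
    idx = [int(np.argmax(np.abs(V[:, 0])))]
    for l in range(1, m):
        # Interpolate the next basis vector at the points chosen so far
        c = np.linalg.solve(V[idx, :l], V[idx, l])
        residual = V[:, l] - V[:, :l] @ c
        # The new point is where the interpolation error is largest
        idx.append(int(np.argmax(np.abs(residual))))
    return np.array(idx)

def deim_approx(V, idx, f_at_idx):
    """Reconstruct the full vector from its values at the DEIM points only."""
    coeffs = np.linalg.solve(V[idx, :], f_at_idx)
    return V @ coeffs

# Snapshots of a toy nonlinear term f(x; mu) = exp(-mu * x) (illustrative only)
x = np.linspace(0.0, 1.0, 200)
snapshots = np.column_stack([np.exp(-mu * x) for mu in np.linspace(1.0, 5.0, 40)])
m = 6
V = np.linalg.svd(snapshots, full_matrices=False)[0][:, :m]
idx = deim_indices(V)

f_new = np.exp(-2.3 * x)                   # unseen parameter value
f_hat = deim_approx(V, idx, f_new[idx])    # uses only m entries of f_new
print(np.linalg.norm(f_hat - f_new) / np.linalg.norm(f_new))
```

Only `m` entries of the nonlinear term are evaluated online, so the cost of the approximation no longer scales with the full dimension n.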

Overall convergence and stability are another issue that limits the applicability of POD–Galerkin-based ROMs. POD–Galerkin projection methods manage to decrease the computational complexity by orders of magnitude by reducing the dimension of the state variables. However, this reduction goes hand in hand with a loss in accuracy. Moreover, slow convergence and in some cases model instabilities (Wang et al. 2012; He 2010; Bui-Thanh et al. 2007) are observed as the errors in the reduced state variables are propagated in time. More specifically, the performance of POD–Galerkin ROMs is directly influenced by the number of POD basis vectors used in the POD–Galerkin projection. However, in many applications involving nonlinear conservation laws (e.g., high Reynolds number fluid flow), POD–Galerkin reduced-order models have shown poor performance even after retaining a sufficient number of POD basis vectors (Wang et al. 2012; Sirovich 1987; Berkooz et al. 1993).

Several stabilization techniques have been proposed in the recent literature to build stabilized POD-based reduced models. A notable stabilization technique relies on closing the POD reduced model using a set of closure models similar to those adopted in turbulence modeling (Berkooz et al. 1993; Wang et al. 2012). The objective of applying closure models within a POD-based reduced model is to include the effects of the discarded POD basis functions in the extracted reduced model (Berkooz et al. 1993; Wang et al. 2012). Wang et al. (2012) showed that the POD–Galerkin reduced model yielded inaccurate and physically implausible results when applied to the numerical simulation of a 3D turbulent flow past a cylinder at a Reynolds number of 1000. Wang et al. (2012) addressed the aforementioned accuracy and stability issues of the POD reduced model with various closure models, where artificial viscosity was added to the real viscosity parameter to stabilize the POD-based reduced model.

Another major approach to enhance the stability of the POD–Galerkin reduced model is to compute a new set of optimal basis vectors or to improve the POD basis vectors by solving a constrained optimization problem. Bui-Thanh et al. (2007) determined a new set of optimal basis vectors by formulating an optimization problem constrained by the equations of the resultant reduced model and demonstrated the stability of the proposed approach on linear dynamical systems. We note that the POD–Galerkin reduced model orthogonally projects the nonlinear residual onto the subspace spanned by the POD basis vectors. Unlike the POD–Galerkin reduced model, the Petrov–Galerkin projection scheme designs a different set of orthonormal basis vectors, called the left reduced-order basis, onto which the nonlinear residual is projected. Carlberg et al. (2011) formulated a stable Petrov–Galerkin reduced model in which the left reduced-order basis vectors were computed from an optimization problem at every iteration of the Gauss–Newton method. He (2010) observed that poor spectral properties of the reduced Jacobian matrix could cause numerical instabilities in the POD–Galerkin TPWL reduced model. Hence, He (2010) improved the stability of the POD-based reduced model by determining the optimal dimension of the reduced model through an extensive search over a range of integer numbers. We note that all the above-mentioned optimization procedures involve computationally expensive procedures to maintain stability and, in many cases, the stability of the extracted reduced model is still not guaranteed (He 2010, 2013).

Recently, data-fit black-box models have been combined with POD (Xiao et al. 2017) to develop non-intrusive POD-based ROMs, where the data-fit models are used to regress the relationship between the input parameter and the reduced representation of the full model state vector. Hence, non-intrusive ROMs do not require any knowledge of the full-order model and are mainly developed to circumvent the shortcomings in accessing the governing equations of the full model (Xiao et al. 2017). However, they can also be used to address the stability and nonlinearity issues of POD-based ROMs. Wang et al. (2017) developed a non-intrusive POD reduced model using a recurrent neural network (RNN) as a data-fit model and presented two fluid dynamics test cases, namely flow past a cylinder and a simplified wind-driven ocean gyre. An RNN is a class of artificial neural network (Pascanu et al. 2013a; Mikolov et al. 2014) which has at least one feedback connection in addition to the feedforward connections (Pascanu et al. 2013a, b; Irsoy and Cardie 2014). In the context of data-fit models, RNNs have been successfully applied to various sequence modeling tasks such as automatic speech recognition and system identification of time series data (Hermans and Schrauwen 2013; He et al. 2015; Hinton et al. 2012; Graves 2013). Additionally, RNNs have been applied to emulate the evolution of nonlinear dynamical systems in a number of applications (Zimmermann et al. 2012; Bailer-Jones et al. 1998) and hence have large potential in building reduced-order models. However, the applicability of non-intrusive ROMs is severely undermined in many real-world problems, where increasing the dimensionality of the input parameter space increases the complexity and training time of the data-fit model.

In summary, among many surrogate modeling techniques, POD–Galerkin reduced model is a viable option for accelerating multi-query tasks like UQ. Generally, POD–Galerkin reduced model is well established for linear systems, and for nonlinear systems with parametric dependence, POD could be either combined with TPWL or with DEIM for modeling subsurface flow systems (Cardoso and Durlofsky 2010; He et al. 2011; Trehan and Durlofsky 2016; Ghasemi 2015). However, POD reduced model does not preserve the stability properties of the corresponding full-order model, and current state-of-the-art POD stabilization techniques (Wang et al. 2012; He 2010, 2013) are not cost-effective and ultimately do not guarantee stability of the extracted reduced-order models.

In this paper, we use DR-RNN (Nagoor Kani and Elsheikh 2017) to alleviate the potential limitations of POD–Galerkin reduced models. More specifically, we combine DR-RNN with POD–Galerkin and DEIM methods to derive an accurate and computationally effective reduced model for uncertainty quantification (UQ) tasks. The architecture of DR-RNN is inspired by the iterative line search methods where the parameters of the DR-RNN are optimized such that the residual of the numerically discretized PDEs is minimized (Bertsekas 1999; Tieleman and Hinton 2012; Nagoor Kani and Elsheikh 2017). Unlike the standard RNN which is very generic, DR-RNN (Nagoor Kani and Elsheikh 2017) uses the residual of the discretized differential equation. In addition, the parameters of the DR-RNN are fitted such that the computed DR-RNN output optimally minimizes the residual of the targeted equation. In this context, DR-RNN is a physics-aware RNN as it is tailored to leverage the physics embedded in the targeted dynamical system (i.e., residual of the equation or reduced residual in the current manuscript).

The resultant reduced model obtained from DR-RNN combined with the POD–Galerkin and DEIM algorithms has a number of salient features. First, the dynamics of DR-RNN are explicit in time with superior convergence and stability properties for large time steps that violate the numerical stability conditions (Nagoor Kani and Elsheikh 2017; Pletcher et al. 2012). Second, as the dynamics modeled in DR-RNN are explicit in time, there is a reduction in the computational complexity of the extracted reduced model from \(\mathcal {O}(r^3)\), corresponding to implicit POD–DEIM reduced-order models, to \(\mathcal {O}(r^2)\), where r is the size of the reduced model. Third, DR-RNN requires only very few training samples (obtained by solving the full model) to optimize the parameters of the DR-RNN as it accounts for the physics of the full model within the RNN architecture (via the reduced residual). This is a major advantage when compared to pure data-driven algorithms (e.g., standard RNN architectures). Moreover, DR-RNN can effectively emulate the parameterized nonlinear dynamical system with a significantly lower number of parameters in comparison with standard RNN architectures (Nagoor Kani and Elsheikh 2017).

In this work, we demonstrate the superior properties of DR-RNN in accelerating UQ tasks for subsurface reservoir models using the Monte Carlo method. As far as we are aware, the use of a single parameter-independent POD–Galerkin reduced model in the Monte Carlo method involving nonlinear subsurface flow with a high-dimensional stochastic permeability field has not been previously explored. The reason is that the resultant reduced model might require significantly more basis functions to reconstruct stable solutions (Cardoso and Durlofsky 2010; He et al. 2011; Boyce and Yeh 2014; Ghasemi 2015). However, a single small set of POD basis functions would be sufficient to reconstruct the solution with reasonable accuracy using least-squares (see Sect. 3.2 for more details). Hence, the aim of this paper is to illustrate how DR-RNN could be used to reconstruct stable solutions emulating the full model dynamics using only a small set of POD basis functions. The proposed DR-RNN technique is validated on two forward uncertainty quantification problems involving two-phase flow in subsurface porous media. The two flow problems are commonly known within the reservoir simulation community as the quarter five spot problem and the uniform flow problem (Aarnes et al. 2007). In these two numerical examples, the permeability field is modeled with a log-normal distribution. The obtained results demonstrate that DR-RNN combined with POD–DEIM provides an accurate and stable reduced-order model with a drastic reduction in the computational cost. The reason for selecting simplified flow problems is to illustrate the potential benefit of DR-RNN to formulate an accurate and computationally effective POD–DEIM reduced model for flow problems where the standard POD–Galerkin reduced models are inaccurate and possibly unstable. We also note that the DR-RNN architecture is generic and could be used to emulate any well-posed nonlinear dynamical system (Nagoor Kani and Elsheikh 2017) including subsurface flow problems while accounting for capillary pressure effects, gravity effects, and compressibility.

The outline of the rest of this manuscript is as follows: In Sect. 2, we present the formulation of the multi-phase flow problem in porous media. In Sect. 3, we introduce the POD–Galerkin method for model reduction followed by a discussion of DEIM for handling nonlinear systems. In Sect. 4, we describe the architecture of DR-RNN, and in Sect. 5, we evaluate the reduced model derived by combining DR-RNN with POD–DEIM on two uncertainty quantification test cases. Finally, in Sect. 6, we present the conclusions of this manuscript.

2 Problem Formulation

The equations governing two-phase flow of a wetting phase (water) and a non-wetting phase (e.g., oil) in porous media are the conservation of mass (continuity) equation and Darcy’s law for each phase (Aarnes et al. 2007; He 2013; Chen et al. 2006; Bastian 1999). The continuity equation for each phase \(\alpha \) takes the form

$$\begin{aligned} \dfrac{\partial (\phi \rho _{\alpha } s_{\alpha }) }{ \partial t} - \nabla \cdot (\rho _{\alpha } \lambda _{\alpha } \mathbf {K}~(\nabla p_{\alpha } - \rho _{\alpha } g \nabla h) ) + q_{\alpha } = 0 \end{aligned}$$
(1)

where the subscript \(\alpha =w\) denotes the water phase, the subscript \(\alpha =o\) denotes the oil phase, \(\mathbf {K}\) is the absolute permeability tensor, \(\lambda _{\alpha } = k_{r\alpha }/\mu _{\alpha }\) is the phase mobility, with \(k_{r\alpha }\) the relative permeability to phase \(\alpha \) and \(\mu _{\alpha }\) the viscosity of phase \(\alpha \), \(p_{\alpha }\) is the phase pressure, \(\rho _{\alpha }\) is the density of phase \(\alpha \), g is the gravitational acceleration, h is the depth, \(\phi \) is the porosity, \(s_{\alpha }\) is the saturation of the phase \(\alpha \) and \(q_{\alpha }\) is the phase source and sink terms (Aarnes et al. 2007; Chen et al. 2006). Further, the phase saturations are constrained by \(s_w + s_o = 1\), since the oil and the water jointly fill the void space (Aarnes et al. 2007; He 2013).

The phase velocities are related to the phase pressures by the multi-phase Darcy’s law, which takes the form

$$\begin{aligned} \mathbf {v}_{\alpha } = -\mathbf {K}\lambda _{\alpha } \nabla ~(p_{\alpha } - \rho _{\alpha } g h) \end{aligned}$$
(2)

where \(\mathbf {v}_{\alpha }\) is the phase velocity. The phase relative permeabilities \(k_{r\alpha }\) and the capillary pressure (\( p_{cow} = p_o - p_w\)) are usually modeled as functions of the phase saturations (Aarnes et al. 2007). Neglecting the capillary pressure, the compressibility effects, the gravitational effects, and assuming the density ratio to be equal to one, the continuity equations [Eq. (1)] can be combined with the Darcy’s law [Eq. (2)] to derive a global pressure equation and the saturation equation for water phase (Aarnes et al. 2007; He 2013; Bastian 1999). The simplified global pressure equation takes the form

$$\begin{aligned} \nabla \cdot \mathbf {K}\lambda ~\nabla p = q \end{aligned}$$
(3)

where \(p=p_o=p_w\) is the global pressure, \(\lambda = \lambda _w + \lambda _o\) is the total mobility, \(q=q_w+q_o\) is the source and sink term. The saturation equation for the water phase takes the following form

$$\begin{aligned} \phi ~\dfrac{\partial s}{\partial t} + \mathbf {v}\cdot \nabla f_w = \frac{q_w}{\rho _w} \end{aligned}$$
(4)

where \(f_w = {\lambda _w}/(\lambda _w + \lambda _o)\) is a function of saturation termed the fractional flow function for the water phase, and \(\mathbf {v}=-\mathbf {K}\lambda ~\nabla p\) is the total velocity vector (Aarnes et al. 2007; Chen et al. 2006). In the rest of the paper, we write the water phase saturation as \(s=s_w\) for simplicity. The coupled Eqs. (3) and (4) can then be solved for the evolution of the saturation given appropriate initial and boundary conditions. Equations (3) and (4) are the continuous (in space and time) form of the full model.
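To make the nonlinearity in Eq. (4) concrete, the fractional flow function can be evaluated as in the following sketch. The relative permeability model is not specified in this section, so the quadratic (Corey-type) curves \(k_{rw}=s^2\) and \(k_{ro}=(1-s)^2\), the viscosity values, and the function name `fractional_flow` are assumptions made for illustration only.

```python
import numpy as np

def fractional_flow(s, mu_w=1.0, mu_o=1.0):
    """Water fractional flow f_w = lam_w / (lam_w + lam_o).

    Assumes quadratic relative permeabilities k_rw = s^2 and
    k_ro = (1 - s)^2, which are an illustrative choice."""
    lam_w = s**2 / mu_w            # water mobility k_rw / mu_w
    lam_o = (1.0 - s)**2 / mu_o    # oil mobility k_ro / mu_o
    return lam_w / (lam_w + lam_o)

s = np.linspace(0.0, 1.0, 11)
print(np.round(fractional_flow(s, mu_w=1.0, mu_o=5.0), 3))
```

Under these assumptions \(f_w\) is an S-shaped, non-polynomial function of s, which is the kind of nonlinear term that motivates the use of DEIM in the reduced model.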

The discrete form of the full model is obtained by dividing the problem domain into n grid blocks and then applying the finite-volume method to discretize the spatial derivatives of Eqs. (3) and (4). The discretized pressure equation takes the form

$$\begin{aligned} \mathbf {A}~\mathbf {y}_p = \mathbf {b}\end{aligned}$$
(5)

where \(\mathbf {A}\in \mathbb {R}^{n \times n}\), \(\mathbf {b}\in \mathbb {R}^n\), and \(\mathbf {y}_p \in \mathbb {R}^n\) is the pressure vector in which each component \( y_{p_i} \) of \(\mathbf {y}_p\) represents the pressure value at the \(i\hbox {th}\) grid block. Similarly, the spatially discretized saturation equation takes the form

$$\begin{aligned} \dfrac{{\hbox {d}} \mathbf {y}_s}{{\hbox {d}}t} + \mathbf {B}(\mathbf {v})~\mathbf {f}_w(\mathbf {y}_s) = \mathbf {d}\end{aligned}$$
(6)

where \(\mathbf {B}\in \mathbb {R}^{n \times n}\), \(\mathbf {d}\in \mathbb {R}^n\), \(\mathbf {v}\) is the total velocity vector, and \(\mathbf {y}_s \in \mathbb {R}^n\) is the saturation vector in which each component \( y_{s_i} \) of \(\mathbf {y}_s\) is the saturation value at the \(i\hbox {th}\) grid block.

Equations (5) and (6) are the discrete form of the full model for the multi-phase flow problem under consideration. These two equations exhibit two-way coupling through the dependence of the matrix \(\mathbf {A}\) on the mobilities \(\lambda (\mathbf {y}_s(t))\) in the pressure full model [Eq. (5)] and through the dependence of the matrix \(\mathbf {B}\) on the velocity vector \(\mathbf {v}(\mathbf {y}_p)\) in the saturation full model [Eq. (6)]. In this paper, we adopt an implicit sequential splitting method to solve the full model [Eqs. (5) and (6)]. In this method, the saturation vector \(\mathbf {y}_s(t)\) from the present time step is used to assemble the matrix \(\mathbf {A}\) in Eq. (5) and then the pressure full model [Eq. (5)] is solved for the pressure vector \(\mathbf {y}_p\). Following that, the velocity vector \(\mathbf {v}\) (computed from the pressure vector \(\mathbf {y}_p\)) is used to assemble the matrix \(\mathbf {B}\) in Eq. (6) and then the saturation full model [Eq. (6)] is solved implicitly in time for the saturation at the next time step. In the following section, we formulate a Galerkin projection-based reduced model to reduce the computational effort for multi-query tasks (e.g., uncertainty quantification) involving repeated solutions of Eqs. (5) and (6), when n (the number of grid blocks) is large (Chaturantabut and Sorensen 2010; Ghasemi 2015).
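The sequential splitting loop described above can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration and not the discretization used in this paper: the 1D grid with an injector in the first cell and a producer in the last, the sign convention (positive q means injection), quadratic relative permeabilities with unit viscosities, first-order upwinding, backward Euler with a plain Newton iteration, and the helper names `pressure_step` and `saturation_step`.

```python
import numpy as np

def mobilities(s):
    return s**2, (1.0 - s)**2                 # assumed k_rw, k_ro with unit viscosities

def frac_flow(s):
    lw, lo = mobilities(s)
    return lw / (lw + lo)

def dfrac_flow(s):
    return 2.0 * s * (1.0 - s) / (s**2 + (1.0 - s)**2) ** 2

def pressure_step(s, K, q, dx):
    """Assemble A(s), solve A y_p = b, return face fluxes (positive to the right)."""
    lw, lo = mobilities(s)
    kl = K * (lw + lo)                                    # K times total mobility
    T = 2.0 / (1.0 / kl[:-1] + 1.0 / kl[1:]) / dx         # harmonic face transmissibility
    n = s.size
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i] += T
    A[i + 1, i + 1] += T
    A[i, i + 1] -= T
    A[i + 1, i] -= T
    A[0, 0] += T.sum()                                    # fix the pressure level (no-flow walls)
    p = np.linalg.solve(A, q)
    flux = np.zeros(n + 1)
    flux[1:-1] = T * (p[:-1] - p[1:])
    return flux

def saturation_step(s_old, flux, q, phi, dx, dt, tol=1e-12, max_iter=50):
    """Assemble B(v) and solve (s - s_old)/dt + B f_w(s) = d with Newton's method."""
    n = s_old.size
    B = np.diag(np.maximum(flux[1:], 0.0) + np.maximum(-flux[:-1], 0.0)
                + np.maximum(-q, 0.0))                    # outflow and production
    i = np.arange(1, n)
    B[i, i - 1] -= np.maximum(flux[1:-1], 0.0)            # upwind inflow from the left
    B[i - 1, i] -= np.maximum(-flux[1:-1], 0.0)           # upwind inflow from the right
    B /= phi * dx
    d = np.maximum(q, 0.0) / (phi * dx)                   # injected water
    s = s_old.copy()
    for _ in range(max_iter):
        res = s - s_old + dt * (B @ frac_flow(s) - d)
        if np.max(np.abs(res)) < tol:
            break
        J = np.eye(n) + dt * B * dfrac_flow(s)            # column j scaled by f_w'(s_j)
        s = np.clip(s - np.linalg.solve(J, res), 0.0, 1.0)
    return s

n, dx, dt, phi = 20, 1.0 / 20, 0.01, 1.0
K = np.exp(np.random.default_rng(0).normal(0.0, 0.5, n))  # log-normal permeability
q = np.zeros(n)
q[0], q[-1] = 1.0, -1.0                                   # injector and producer
s = np.zeros(n)
for step in range(30):
    flux = pressure_step(s, K, q, dx)                     # uses saturation at the present step
    s = saturation_step(s, flux, q, phi, dx, dt)          # implicit update to the next step
print(np.round(s, 3))
```

Each time step costs one linear solve of size n for the pressure and several for the Newton iterations on the saturation, which is the expense the reduced model is meant to avoid when n is large.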

3 Reduced-Order Model Formulation

In this section, we formulate the POD–Galerkin reduced model (POD reduced model) and the POD–DEIM reduced model, where POD–Galerkin is combined with DEIM for handling the nonlinear terms. Both methods are introduced to reduce the computational effort associated with solving the full model [Eqs. (5) and (6)].

3.1 POD Basis

As stated in Sect. 1, POD-based reduced model is a projection-based reduced-order model in which the governing equations are projected onto an optimal low-dimensional subspace \(\mathcal {U}\) spanned by a small set of r basis vectors. Galerkin projection reduced model is based on the assumption that most of the system information and characteristics can be efficiently represented by linear combinations of only a small number of basis vectors (Rewienski and White 2003).

The optimal basis vectors \(\lbrace \mathbf {u}_i \rbrace _{i=1}^{r}\) in POD are computed by singular value decomposition (SVD) of the solution snapshot matrix \(\mathbf {X}\). The columns of the snapshot matrix \(\mathbf {X}\) are a set of \(n_s\) solution vectors obtained by solving the full model at selected points in the input parameter space. The SVD of \(\mathbf {X}\) is expressed as

$$\begin{aligned} \mathbf {X}= \mathbf {U}~\varSigma ~\mathbf {W}\end{aligned}$$
(7)

where \(\mathbf {X}\in \mathbb {R}^{n \times n_s}\), \(\mathbf {U}=[\mathbf {u}_1~\mathbf {u}_2~\mathbf {u}_3~\cdots ~\mathbf {u}_n] \in \mathbb {R}^{n \times n}\) is the left singular matrix, \(\mathbf {W}\in \mathbb {R}^{n_s \times n_s}\) is the orthogonal matrix whose rows are the right singular vectors, and \(\varSigma =\text {diag}(\sigma _1 \ge \sigma _2 \ge \sigma _3 \ge \cdots \ge \sigma _{n_s} \ge 0) \in \mathbb {R}^{n \times n_s}\) is the rectangular diagonal matrix containing the singular values \(\sigma _i\) of the snapshot matrix \(\mathbf {X}\) in descending order. The dominant left singular vectors \(\lbrace \mathbf {u}_i \rbrace _{i=1}^{r}\) corresponding to the first r largest singular values represent the basis vectors to span the optimal subspace \(\mathcal {U}\) of the POD-based reduced model. Thus, the first step in deriving the POD-based reduced model is to express the state vector \(\mathbf {y}\) of the full-order model by a linear combination of r basis vectors as follows:

$$\begin{aligned} \mathbf {y}\approx \mathbf {U}^r~\tilde{\mathbf {y}} \end{aligned}$$
(8)

where \(\tilde{\mathbf {y}} \in \mathbb {R}^r\) is the reduced state vector representation of full-dimensional state vector \(\mathbf {y}\), and \(\mathbf {U}^r=[\mathbf {u}_1~\cdots ~\mathbf {u}_r] \in \mathbb {R}^{n \times r}\) is the matrix that contains r orthonormal basis vectors in its columns.
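The construction of the POD basis can be sketched with NumPy as follows. The snapshot matrix here is synthetic (random modes with geometrically decaying weights) and the sizes `n`, `n_s`, and `r` are arbitrary; both are assumptions for illustration. The sketch uses the thin SVD, so `U` has \(n_s\) columns rather than n, which does not affect the leading r basis vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_s, r = 200, 30, 5

# Synthetic snapshot matrix with decaying singular values (illustrative data only)
modes = rng.standard_normal((n, n_s))
weights = 2.0 ** -np.arange(n_s)
X = (modes * weights) @ rng.standard_normal((n_s, n_s))

# Thin SVD: X = U @ diag(sigma) @ Wt, singular values in descending order
U, sigma, Wt = np.linalg.svd(X, full_matrices=False)
U_r = U[:, :r]                                  # POD basis, orthonormal columns

# Error of representing all snapshots in the POD subspace
err_pod = np.linalg.norm(X - U_r @ (U_r.T @ X))

# Any other orthonormal basis of the same size cannot do better
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
err_rand = np.linalg.norm(X - Q @ (Q.T @ X))

print(err_pod, np.sqrt(np.sum(sigma[r:] ** 2)), err_rand)
```

The first two printed numbers coincide: the Frobenius-norm error of the rank-r POD representation equals the square root of the sum of the squared discarded singular values, which is the optimality property referred to in Sect. 1.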

By following this step, for example, the optimal basis vectors for the saturation state vector \(\mathbf {y}_s\) are obtained from the SVD of the saturation snapshot matrix \(\mathbf {X}_{s}=\left( (\mathbf {y}_{s_1}~\ldots ~\mathbf {y}_{s_T})^1~\ldots ~(\mathbf {y}_{s_1}~\ldots ~\mathbf {y}_{s_T})^L\right) \), where T is the number of time steps and L is the number of samples of input parameter used to build the snapshot matrix. The SVD of \(\mathbf {X}_{s}\) is expressed as

$$\begin{aligned} \mathbf {X}_{s} = \mathbf {U}_{s}~\varSigma _{s}~\mathbf {W}_{s} \end{aligned}$$
(9)

where \(\mathbf {U}_{s} \in \mathbb {R}^{n \times n}\) is the left singular matrix, and \(\varSigma _{s}\) is the diagonal matrix containing the singular values of the snapshot matrix \(\mathbf {X}_{s}\) in descending order. The saturation state vector \(\mathbf {y}_s\) is optimally expressed as

$$\begin{aligned} \mathbf {y}_s \approx \mathbf {U}_{s}^r~\tilde{\mathbf {y}}_s \end{aligned}$$
(10)

where \(\tilde{\mathbf {y}}_s \in \mathbb {R}^r\) is the reduced state vector representation of \(\mathbf {y}_s\), and \(\mathbf {U}_{s}^r \in \mathbb {R}^{n \times r}\) is the matrix that contains r orthonormal basis vectors in its columns. Similarly, we can represent the pressure state vector \(\mathbf {y}_p\) from its reduced state vector representation \(\tilde{\mathbf {y}}_p\) using the optimal basis matrix \(\mathbf {U}_p^r\) obtained from the SVD of the pressure snapshot matrix \(\mathbf {X}_p\).
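As an illustration of Eqs. (7) and (8), the following minimal sketch computes a truncated POD basis with NumPy. The snapshot matrix is synthetic (random, with exact rank 5), and the helper name `pod_basis` is hypothetical; neither is taken from the implementation used in this paper.

```python
import numpy as np

def pod_basis(X, r):
    """Return the first r left singular vectors of X and all its singular values."""
    # Thin SVD: X = U @ diag(sigma) @ Wt, with sigma sorted in descending order.
    U, sigma, Wt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r], sigma

# Synthetic snapshots (assumption): n = 200 unknowns, n_s = 30 snapshots, rank 5.
rng = np.random.default_rng(0)
n, n_s, rank = 200, 30, 5
X = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n_s))

U_r, sigma = pod_basis(X, r=5)
y = X[:, 0]              # a state vector taken from the snapshot set
y_tilde = U_r.T @ y      # reduced state of dimension r
y_rec = U_r @ y_tilde    # Eq. (8): y is approximated by U^r y_tilde
```

Because the synthetic snapshots have rank 5, the reconstruction with r = 5 is exact up to round-off. For real saturation or pressure snapshots the reconstruction would only be approximate.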

3.2 Least-Squares Approximation

The capacity of a set of basis functions to represent a new solution vector can be tested using least-squares fitting (Eldén 2007; Trefethen and Bau III 1997). For example, the least-squares approximation \(\mathbf {y}_s^* \in \mathbb {R}^n\) of a saturation state vector \(\mathbf {y}_s\) is defined as

$$\begin{aligned} \mathbf {y}_s^{*} = \mathbf {U}_s^r~\tilde{\mathbf {y}}_s^{*} = \mathbf {U}_s^r~({\mathbf {U}_s^r}^{\top }~\mathbf {y}_s) \end{aligned}$$
(11)

The associated error, termed the least-squares error, in approximating \(\mathbf {y}_s\) by \(\mathbf {y}_s^*\) using only r basis vectors is given by

$$\begin{aligned} \varepsilon _s = \Vert \mathbf {y}_s - \mathbf {y}_s^{*} \Vert _2 \end{aligned}$$
(12)

The least-squares error \(\varepsilon _s\) [Eq. (12)] is equivalent to the omitted energy \(\varOmega _s = \sum _{i=r+1}^{n} \sigma _{s_i}\) (Lucia et al. 2004; Berkooz et al. 1993). In practice, r is commonly chosen as the smallest integer such that the relative omitted energy \(\nu \) is less than a preset value (e.g., 0.01), where the relative omitted energy is defined by the following equation

$$\begin{aligned} \nu = 1 - \frac{\sum _{i=1}^{r}\sigma _{s_i}}{\sum _{i=1}^{n}\sigma _{s_i}} \end{aligned}$$
(13)

Expressions similar to Eqs. (11), (12), and (13) can be obtained for the pressure state vector as well. We note that least-squares errors are not necessarily equivalent to the omitted energy for state vectors not included in the snapshot matrix or for the state vector solved at a new point in the input parameter space, as these new vectors might not fall within the span of the snapshot matrix (Frangos et al. 2010; Lucia et al. 2004). The least-squares solution is the best approximation of the state variables in the sense that, for the chosen low-dimensional subspace \(\mathcal {U}\), no other low-dimensional approximation can represent the given snapshot set with a lower least-squares error (Lassila et al. 2014; Sirovich 1987; Berkooz et al. 1993). In this paper, we use the best approximation of the state variables to assess the quality of the approximation obtained from different reduced-order models in the numerical examples presented in Sect. 5.
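The rank selection rule and the least-squares error above can be sketched as follows. The function names `choose_rank` and `projection_error` are hypothetical, and the energy is measured with plain sums of singular values as in this section (many references use squared singular values instead).

```python
import numpy as np

def choose_rank(sigma, tol=0.01):
    """Smallest r whose relative omitted energy (discarded sigma over total) is below tol."""
    retained = np.cumsum(sigma) / np.sum(sigma)  # retained[r-1]: fraction kept by r vectors
    omitted = 1.0 - retained
    return int(np.argmax(omitted < tol)) + 1     # first index where the criterion holds

def projection_error(U_r, y):
    """2-norm of y minus its orthogonal projection onto the columns of U_r."""
    y_star = U_r @ (U_r.T @ y)                   # least-squares approximation of y
    return np.linalg.norm(y - y_star)

# Toy data (assumption): five singular values and a trivial orthonormal basis.
sigma = np.array([10.0, 5.0, 1.0, 0.1, 0.01])
r = choose_rank(sigma, tol=0.01)

U_r = np.eye(4)[:, :2]                           # spans the first two coordinates
err = projection_error(U_r, np.array([1.0, 2.0, 3.0, 4.0]))
```

In the toy data, only the last two coordinates of the vector lie outside the subspace, so the error equals the norm of those two entries.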

3.3 POD–Galerkin

Once the POD basis vectors are obtained, the reduced representation of the pressure vector \(\mathbf {y}_p\) is substituted into the pressure full model [Eq. (5)], followed by Galerkin projection of the pressure equation onto the subspace spanned by \(\mathbf {U}^r_{p}\). The resulting POD-based reduced model for the pressure equation then takes the following form

$$\begin{aligned} \tilde{\mathbf {A}}~\tilde{\mathbf {y}}_p = \tilde{\mathbf {b}} \end{aligned}$$
(14)

where \(\tilde{\mathbf {A}} = {\mathbf {U}_{p}^r}^{\top }~\mathbf {A}~\mathbf {U}_{p}^r \in \mathbb {R}^{r \times r}\) and \(\tilde{\mathbf {b}}={\mathbf {U}_{p}^r}^{\top }~\mathbf {b}\in \mathbb {R}^r\). Similarly, the POD-based reduced model for the saturation equation [Eq. (6)] takes the form

$$\begin{aligned} \dfrac{{\hbox {d}} \tilde{\mathbf {y}}_s}{{\hbox {d}}t} + {\mathbf {U}_{s}^r}^{\top }~\mathbf {B}(\mathbf {v})~\mathbf {f}_w(\mathbf {U}_{s}^r~\tilde{\mathbf {y}}_s) = \tilde{\mathbf {d}}, \end{aligned}$$
(15)

where \(\tilde{\mathbf {d}}={\mathbf {U}_{s}^r}^{\top }~\mathbf {d}\) and \(\tilde{\mathbf {d}} \in \mathbb {R}^r\).

The POD-based reduced model formulated by Eqs. (14) and (15) is of reduced dimension r. However, evaluating the nonlinear function \(\mathbf {f}_w\) in Eq. (15) still costs on the order of the full dimension n. Moreover, assembling the reduced Jacobian matrix \(\tilde{\mathbf {J}} = \tilde{\mathbf {I}} - {\mathbf {U}_s^r}^{\top }\mathbf {B}~\mathbf {J}_f(\mathbf {f}_w (\mathbf {U}_s^r~\tilde{\mathbf {y}}_s))\mathbf {U}_s^r~\in \mathbb {R}^{r \times r}\) needed for Newton-like iterations to solve this nonlinear equation also costs on the order of n (Chaturantabut and Sorensen 2010), as it relies on evaluating the full-order nonlinear function \(\mathbf {f}_w\). Therefore, for problems with general nonlinear functions involved in the POD-based reduced model, the computational cost of solving the reduced system is still a function of the full system dimension n.
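The Galerkin projection of the linear pressure system in Eq. (14) can be sketched in a few lines. The matrix, basis, and right-hand side below are synthetic stand-ins (a random symmetric positive definite matrix and a random orthonormal basis), not the finite-volume pressure system or POD basis of this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 50, 5

# Stand-in for the full pressure system A y_p = b (assumption): SPD matrix.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

# Stand-in orthonormal basis from a QR factorization (a real basis would come from the SVD).
U_p, _ = np.linalg.qr(rng.standard_normal((n, r)))

# Choose b so that the exact solution lies in span(U_p); the reduced solve is then exact.
y_true = U_p @ rng.standard_normal(r)
b = A @ y_true

A_red = U_p.T @ A @ U_p                  # reduced r x r operator
b_red = U_p.T @ b                        # reduced right-hand side
y_red = np.linalg.solve(A_red, b_red)    # solve the reduced system of Eq. (14)
y_rom = U_p @ y_red                      # lift back to the full dimension
```

The reduced solve costs \(\mathcal {O}(r^3)\) once \(\tilde{\mathbf {A}}\) is formed. When the true solution does not lie in the span of the basis, `y_rom` is only an approximation.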

3.4 DEIM

Discrete empirical interpolation method (DEIM) was introduced in Chaturantabut and Sorensen (2010) to approximate the nonlinear terms in POD-based reduced model using a limited number of points that are independent of the full system dimension n. Similar to POD, the first step of DEIM is to approximate the nonlinear function \(\mathbf {f}_w\) in Eq. (15) using a separate set of basis vectors \(\mathbf {V}^m=[\mathbf {v}_1~\mathbf {v}_2~\mathbf {v}_3~\ldots ~\mathbf {v}_m]\) as

$$\begin{aligned} \mathbf {f}_w = \mathbf {V}^m~\tilde{\mathbf {f}} \end{aligned}$$
(16)

where \(\tilde{\mathbf {f}}\) is the coefficient of expansion of the nonlinear function \(\mathbf {f}_w\) in the reduced subspace spanned by \(\lbrace \mathbf {v}_i \rbrace _{i=1}^m\), \(\mathbf {V}^m \in \mathbb {R}^{n \times m}\) is the matrix containing the first m columns of the left singular matrix \(\mathbf {V}\in \mathbb {R}^{n \times n}\) obtained from the SVD of the snapshot matrix \(\mathbf {X}_{f}\) of the nonlinear function \(\mathbf {f}_w\). We note that no additional computational costs are associated with collecting the snapshot matrix of the nonlinear terms \(\mathbf {X}_f\) as it is already evaluated during the computation of the state snapshot vectors. The nonlinear term in Eq. (15) can then be expressed as

$$\begin{aligned} {\mathbf {U}_{s}^r}^{\top }~\mathbf {B}~\mathbf {f}_w = ({\mathbf {U}_{s}^r}^{\top }~\mathbf {B}~\mathbf {V}^m)~\tilde{\mathbf {f}} = ({\mathbf {U}_{s}^r}^{\top }~\mathbf {B}~\mathbf {V}^m)~\cdot ~({\mathbf {V}^m}^\top ~\mathbf {f}_w) \end{aligned}$$
(17)

The matrix factor \(({\mathbf {U}_{s}^r}^{\top }~\mathbf {B}~\mathbf {V}^m) \in \mathbb {R}^{r \times m} \) in Eq. (17) is precomputed before solving Eq. (15). The overdetermined system \(\tilde{\mathbf {f}} = {\mathbf {V}^m}^\top ~\mathbf {f}_w\) is approximated using the DEIM algorithm introduced in Chaturantabut and Sorensen (2010) by first computing a matrix \(\mathbf {P}\in \mathbb {R}^{n \times m}\) that selects m rows of the matrix \(\mathbf {V}^m\) to obtain \(\tilde{\mathbf {f}}\) as follows:

$$\begin{aligned} \mathbf {P}^{\top }~\mathbf {f}_w = \mathbf {P}^{\top }~\mathbf {V}^m~\tilde{\mathbf {f}} \rightarrow \tilde{\mathbf {f}} = (\mathbf {P}^{\top }~\mathbf {V}^m)^{-1}~\mathbf {P}^{\top }~\mathbf {f}_w \end{aligned}$$
(18)

Using this expression of \(\tilde{\mathbf {f}}\) to approximate the nonlinear function in Eq. (17), we obtain a nonlinear term that is independent of n that takes the form

$$\begin{aligned} {\mathbf {U}_{s}^r}^{\top }~\mathbf {B}~\mathbf {f}_w \approx \mathbf {D}~\mathbf {f}_w(\mathbf {P}^{\top }~\mathbf {U}_{s}^r~\tilde{\mathbf {y}}_s) \end{aligned}$$
(19)

where the matrix \(\mathbf {D}= {\mathbf {U}_{s}^r}^{\top }~\mathbf {B}~\mathbf {V}^m~(\mathbf {P}^{\top }~\mathbf {V}^m)^{-1} \in \mathbb {R}^{r \times m}\) is termed the DEIM matrix. Similarly, the Jacobian of the nonlinear term in Eq. (15) is approximated using DEIM as follows:

$$\begin{aligned} \tilde{\mathbf {J}} = \tilde{\mathbf {I}} - ({\mathbf {U}_{s}^r}^{\top }\mathbf {B}\mathbf {V}^m (\mathbf {P}^{\top }~\mathbf {V}^m)^{-1})~\hat{\mathbf {J}}_f(\mathbf {f}_w(\mathbf {P}^{\top } ~\mathbf {U}_{s}^r~\tilde{\mathbf {y}}_s))~(\mathbf {P}^{\top } \mathbf {U}_{s}^r) \end{aligned}$$
(20)

where \(\hat{\mathbf {J}}_f(\mathbf {f}_w(\mathbf {P}^{\top }~\mathbf {U}_{s}^r~\tilde{\mathbf {y}}_s)) \in \mathbb {R}^{m \times m}\) is the Jacobian matrix computed using the m components of \(\mathbf {f}_w\) evaluated by the DEIM algorithm (Chaturantabut and Sorensen 2010; Rewienski and White 2003; Nagoor Kani and Elsheikh 2017). Finally, the POD–DEIM-based reduced model takes the form

$$\begin{aligned} \dfrac{{\hbox {d}} \tilde{\mathbf {y}}_s}{{\hbox {d}}t} + \mathbf {D}~\mathbf {f}_w(\mathbf {P}^{\top }~\mathbf {U}_{s}^r~\tilde{\mathbf {y}}_s) = \tilde{\mathbf {d}} \end{aligned}$$
(21)

We note that the POD–DEIM formulation is independent of the full model dimension n and that the DEIM procedure exploits the structure of the nonlinear function \(\mathbf {f}_w\) as a component-wise operation at \(\mathbf {U}_s^r~\tilde{\mathbf {y}}_s\) (Chaturantabut and Sorensen 2010).
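The row selection encoded by \(\mathbf {P}\) can be sketched with the greedy point selection described by Chaturantabut and Sorensen (2010). The test function `f`, its parameter range, and the sizes n and m below are illustrative assumptions, not the fractional flow function or the settings of this paper.

```python
import numpy as np

def deim_indices(V):
    """Greedy DEIM point selection for an n x m basis V (columns ordered by importance)."""
    n, m = V.shape
    idx = [int(np.argmax(np.abs(V[:, 0])))]
    for l in range(1, m):
        # Interpolate column l at the points chosen so far ...
        c = np.linalg.solve(V[idx, :l], V[idx, l])
        # ... and pick the entry where the interpolation residual is largest.
        res = V[:, l] - V[:, :l] @ c
        idx.append(int(np.argmax(np.abs(res))))
    return np.array(idx)

n, m = 100, 8
x = np.linspace(0.0, 1.0, n)

def f(mu):
    """Hypothetical component-wise nonlinear function on the grid x."""
    return (1.0 - x) * np.cos(3.0 * np.pi * mu * (x + 1.0)) * np.exp(-(1.0 + x) * mu)

X_f = np.column_stack([f(mu) for mu in np.linspace(1.0, np.pi, 40)])
V = np.linalg.svd(X_f, full_matrices=False)[0][:, :m]   # first m left singular vectors
idx = deim_indices(V)

f_new = f(2.0)
# Eq. (18): coefficients from m sampled entries only, then expansion in the basis V.
f_deim = V @ np.linalg.solve(V[idx, :], f_new[idx])
```

Only the m entries `f_new[idx]` are needed to build the approximation, and the approximation reproduces the function exactly at those entries. The same factor \((\mathbf {P}^{\top }~\mathbf {V}^m)^{-1}\) enters the DEIM matrix \(\mathbf {D}\) of Eq. (19).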

4 Deep Residual RNN

POD–DEIM reduced-order models, as introduced in the previous section, could be used to perform parametric UQ tasks. However, the POD–DEIM formulation is nonlinear and relies on Newton's method at each time step to solve the resulting system of nonlinear equations. The computational efficiency of the Newton iteration depends on the method employed to assemble the Jacobian matrix and, more importantly, on the conditioning of the reduced Jacobian matrix. It also depends on the method used to solve the resulting linear system at each Newton iteration, and generally, it takes \(\mathcal {O}(r^3)\) operations for each saturation update (Nagoor Kani and Elsheikh 2017; Bertsekas 1999). Moreover, previous studies (He 2010, 2013) pointed to the loss of stability of the POD–Galerkin reduced model in several cases, which was attributed to ill-conditioning and poor spectral properties of the reduced Jacobian matrix.

In this paper, we build on the recently introduced DR-RNN (Nagoor Kani and Elsheikh 2017) and formulate accurate POD–DEIM reduced-order models. DR-RNN is a deep RNN architecture (Nagoor Kani and Elsheikh 2017), constructed by stacking K physics-aware network layers. DR-RNN could be applied to any nonlinear dynamical system of the form

$$\begin{aligned} \dfrac{{\hbox {d}}\mathbf {y}}{{\hbox {d}}t} = \mathbf {A}~\mathbf {y}+ \mathbf {F}(\mathbf {y}) \end{aligned}$$
(22)

where \({\mathbf {y}}(\mathbf {a}, t)\in \mathbb {R}^n\) is the state variable at time t, \(\mathbf {a}\in \mathbb {R}^d\) is a system parameter vector, the matrix \(\mathbf {A}\in \mathbb {R}^{n\times n}\) is the linear part of the dynamical system, and the vector \(\mathbf {F}({\mathbf {y}}) \in \mathbb {R}^n\) is the nonlinear term (Nagoor Kani and Elsheikh 2017). The state variable \(\mathbf {y}(t)\) at different time steps is obtained by solving the nonlinear residual equation defined as

$$\begin{aligned} \mathbf {r}_{t+1} = \mathbf {y}_{t+1} - \mathbf {y}_t - \Delta t~\mathbf {A}~\mathbf {y}_{t+1} - \Delta t~\mathbf {F}(\mathbf {y}_{t+1}) \end{aligned}$$
(23)

where \(\mathbf {r}_{t+1}\) is termed the residual vector at time step \(t+1\) and \(\mathbf {y}_{t+1}\) is the approximate solution of Eq. (22) at time step \(t+1\) obtained by using the implicit Euler time integration method. DR-RNN (Nagoor Kani and Elsheikh 2017) approximates the solution of Eq. (22) using the following iterative update equations

$$\begin{aligned} \begin{array}{ll} \mathbf {y}^{(k)}_{t+1} = \mathbf {y}^{(k-1)}_{t+1} - \mathbf {w}~\circ ~\phi _h(\mathbf {U}~\mathbf {r}^{(k)}_{t+1}) &{} \quad \text {for}~k = 1, \\ \mathbf {y}^{(k)}_{t+1} = \mathbf {y}^{(k-1)}_{t+1} - \frac{\eta _k}{\sqrt{G_k+\epsilon }} ~\mathbf {r}^{(k)}_{t+1} &{} \quad \text {for}~k > 1, \end{array} \end{aligned}$$
(24)

where \(\mathbf {U}, \mathbf {w}, \eta _k\) are the training parameters of DR-RNN, \(\phi _h\) is the \(\tanh \) activation function, \(\circ \) is an element-wise multiplication operator, \(\mathbf {r}^{(k)}_{t+1}\) is the residual in layer k obtained by substituting \(\mathbf {y}_{t+1} = \mathbf {y}^{(k-1)}_{t+1}\) into Eq. (23), and \(G_k \) is an exponentially decaying squared norm of the residual defined by

$$\begin{aligned} G_{k} = \gamma ~\Vert \mathbf {r}^{(k)}_{t+1}\Vert ^2 + \zeta ~G_{k-1} \end{aligned}$$
(25)

where \(\gamma , \zeta \) are fraction factors and \(\epsilon \) is a smoothing term to avoid divisions by zero (Nagoor Kani and Elsheikh 2017). In this formulation, we set \(\mathbf {y}^{(k=0)}_{t+1} = \mathbf {y}_t\). The architecture of DR-RNN is inspired by the rmsprop algorithm (Tieleman and Hinton 2012) which is a variant of the steepest descent method. The DR-RNN output at each time step is defined as

$$\begin{aligned} \mathbf {y}_{t+1}^{\tiny {\text {(RNN)}}} = \mathbf {y}_{t+1}^{(K)} \end{aligned}$$
(26)

The formulation of DR-RNN is explicit in time and has a fixed number of iterations K per time step. However, the dimension of the DR-RNN system depends on the dimension of the residual. For example, DR-RNN [Eq. (24)] can be derived from the POD–DEIM reduced model residual (\(\tilde{\mathbf {r}}_{t+1} = -\tilde{\mathbf {y}}_{s_{t+1}} + \tilde{\mathbf {y}}_{s_{t}} + \mathbf {D}~\mathbf {f}_w(\mathbf {P}^{\top }~\mathbf {U}_{s}^r~\tilde{\mathbf {y}}_{s_{t+1}}) + \tilde{\mathbf {d}}\)). In such a setting, the DR-RNN dynamics has a fixed computational budget of \(\mathcal {O}(r^2)\) for each time step. In addition, DR-RNN can potentially employ large time steps that violate the numerical stability constraint (Nagoor Kani and Elsheikh 2017). Furthermore, DR-RNN does not rely on the reduced Jacobian matrix to approximate the solution of the POD–DEIM reduced model.
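The stacked update of Eqs. (23) to (26) can be sketched as a plain forward pass. This is a minimal NumPy illustration for the generic system of Eq. (22), not the trained PyTorch model of this paper: the parameter values are hand-set and untrained, the test system is a toy problem, and the starting value of the running average (zero) and the value of \(\epsilon \) are assumptions.

```python
import numpy as np

def residual(y_next, y_prev, dt, A, F):
    """Implicit Euler residual of Eq. (23)."""
    return y_next - y_prev - dt * (A @ y_next) - dt * F(y_next)

def dr_rnn_step(y_prev, dt, A, F, U, w, eta, K, gamma=0.1, zeta=0.9, eps=1e-8):
    """One time step of Eq. (24); eta[k - 2] is the step size of layer k > 1."""
    y = y_prev.copy()                        # y^(0)_{t+1} = y_t
    G = 0.0                                  # assumed starting value of the running average
    for k in range(1, K + 1):
        r = residual(y, y_prev, dt, A, F)    # residual evaluated at y^(k-1)_{t+1}
        if k == 1:
            y = y - w * np.tanh(U @ r)       # first layer: learned nonlinear update
        else:
            G = gamma * np.dot(r, r) + zeta * G        # Eq. (25)
            y = y - eta[k - 2] * r / np.sqrt(G + eps)  # rmsprop-like scaled step
    return y                                 # Eq. (26): output of the last layer

# Toy system (assumption): dy/dt = -y - y^3, with untrained hand-set parameters.
def F(y):
    return -y**3

A = -np.eye(2)
K = 6
U = np.full((2, 2), 0.015)
w = np.full(2, 0.3)
eta = np.full(K - 1, 0.2)

y = np.array([1.0, 0.5])
for _ in range(10):
    y = dr_rnn_step(y, 0.1, A, F, U, w, eta, K)

# At an equilibrium the residual vanishes, so every layer leaves the state unchanged.
y_eq = dr_rnn_step(np.zeros(2), 0.1, A, F, U, w, eta, K)
```

Each layer costs one residual evaluation and no linear solve, which is where the fixed per-step budget comes from. With untrained parameters the trajectory `y` is not expected to be accurate; only the structure of the update is illustrated.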

The DR-RNN parameters \({\varvec{\theta }}= \lbrace \mathbf {U},~\mathbf {w},~\eta _k \rbrace \) are fitted by minimizing the mean square error (mse) defined by

$$\begin{aligned} \mathbf {J}_{\tiny {\text {MSE}}}({\varvec{\theta }}) = \frac{1}{L}\sum _{\ell =1}^L \sum _{t=1}^{T} \left( \mathbf {y}_{t} - \mathbf {y}_{t}^{\tiny {\text {(RNN)}}}\right) ^2 , \end{aligned}$$
(27)

where \(\mathbf {J}_{\tiny {\text {MSE}}}\) (mse) is the average distance between the reference solution \(\mathbf {y}_{t}\) and the RNN output \({\mathbf {y}}_{t}^{\tiny {\text {(RNN)}}}\) across a number of samples L with time-dependent observations \((t=1~\ldots ~T~\text {and}~\ell =1~\ldots ~L)\) (Nagoor Kani and Elsheikh 2017; Pascanu et al. 2013b). The set of parameters \({\varvec{\theta }}\) is commonly estimated by a technique called backpropagation through time (BPTT) (Werbos 1990; Rumelhart et al. 1986; Pascanu et al. 2013a; Mikolov et al. 2014), which backpropagates the gradient of the loss function \(\mathbf {J}_{\tiny {\text {MSE}}}\) with respect to \({\varvec{\theta }}\) in time over the length of the simulation.

5 Numerical Experiments

In this section, we evaluate the performance of the reduced-order models based on DR-RNN against the standard implementation of the POD–Galerkin reduced model. Specifically, we develop two POD–Galerkin-based reduced models using the DR-RNN architecture, namely \(\hbox {DR-RNN}^{\text {p}}\) (DR-RNN combined with POD–Galerkin) and \(\hbox {DR-RNN}^{\text {pd}}\) (DR-RNN combined with POD–Galerkin and DEIM). The numerical evaluations are performed using two uncertainty quantification tasks involving subsurface flow models. We did not include the standard POD–DEIM reduced model implementation, as we expect the standard POD reduced model results to be far superior (Chaturantabut and Sorensen 2010; Nagoor Kani and Elsheikh 2017).

The outline of this section is as follows: In Sect. 5.1, we present the description of the flow problem, followed by a brief description of the finite-volume approach employed for obtaining the full-order model solution. Following that, in Sect. 5.2, we outline the specific details to formulate the POD reduced model. Then, we list the settings adopted to model the DR-RNN ROMs (i.e., number of layers, optimization settings, etc.) in Sect. 5.3. In Sect. 5.4, we provide a set of error metrics utilized to evaluate the performance of the different ROMs. In Sect. 5.5, we present the numerical results for the quarter five spot model, followed by results for the uniform flow model in Sect. 5.6.

5.1 Full-Order Model Setup

We consider a two-phase (oil and water) porous media flow problem over the two-dimensional domain \([0~1] \times [0~1]\) m. The equations governing the two-phase flow are the pressure equation [Eq. (3)] and the saturation equation [Eq. (4)]. The relative permeability is defined as a function of saturation using Corey’s model \(k_{rw}(s)={s^*}^2\), \(k_{ro}=(1-s^*)^2\), where \(s^*=(s-s_{wc})/(1-s_{or}-s_{wc})\), \(s_{wc}\) is the irreducible water saturation, and \(s_{or}\) is the residual oil saturation (Aarnes et al. 2007). We set \(s_{or}=0.2\) and \(s_{wc}=0.2\). We set the initial water saturation over the domain to the irreducible water saturation \(s_{wc}=0.2\). The water and oil viscosities are 1 and 1.5 centipoise, respectively. The porosity is assumed to be a constant value of 0.2 over the entire problem domain. The uncertain permeability field is modeled as a log-normal distribution function with zero mean and exponential covariance kernel of the form

$$\begin{aligned} \mathbb {C}ov = \sigma _k~\exp \left[ -\frac{\vert x_1 - x_2 \vert }{L_k} \right] \end{aligned}$$
(28)

where \(\sigma _k\) is the variance and \(L_k\) is the correlation length. In all test cases, we set \(\sigma _k\) to 1 and the correlation length \(L_k\) to 0.1 m. Figure 1 shows several realizations of the log-permeability values. For solving the full-order model, the problem domain is discretized using a uniform grid of \(64 \times 64\) blocks. The pressure equation is discretized using a simple finite-volume method (also known as two-point flux approximation) (Aarnes et al. 2007), and an upwind finite-volume scheme is used to discretize the saturation equation. For the time discretization, an implicit backward Euler method combined with the Newton–Raphson iterative method is used to solve the resulting system of nonlinear equations. We set the time step size to 0.015, and the total number of time steps is set to 160. We note that time is measured in a non-dimensional quantity called pore volumes injected (PVI). PVI defines the net volume of water injected as a fraction of the total pore volume. As the pressure changes at a much slower rate than the saturation, the pressure equation (and hence the velocity) is solved at every eighth saturation time step. For reference solutions, this system of equations is solved for 2000 random permeability realizations to estimate ensemble-based statistics using the Monte Carlo method (Ibrahima 2016).
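Two ingredients of this setup, the Corey relative permeabilities and a realization of the log-permeability field with the covariance of Eq. (28), can be sketched as follows. The function names are hypothetical, the distance in Eq. (28) is assumed to be the Euclidean distance between cell centers, and a coarse \(16 \times 16\) grid with a Cholesky factorization is used only to keep the example small; the sampling method used for the paper's realizations is not stated here.

```python
import numpy as np

def corey_relperm(s, s_wc=0.2, s_or=0.2):
    """Corey relative permeabilities k_rw = s*^2 and k_ro = (1 - s*)^2."""
    s_star = (s - s_wc) / (1.0 - s_or - s_wc)
    return s_star**2, (1.0 - s_star)**2

def sample_log_permeability(nx, ny, sigma_k=1.0, L_k=0.1, seed=0):
    """Draw one zero-mean Gaussian log-permeability field with exponential covariance."""
    xs = (np.arange(nx) + 0.5) / nx              # cell centers on the unit square
    ys = (np.arange(ny) + 0.5) / ny
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    # Assumption: |x1 - x2| in Eq. (28) is the Euclidean distance between cell centers.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = sigma_k * np.exp(-dist / L_k)
    # Small diagonal shift guards the factorization against round-off.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(nx * ny))
    z = np.random.default_rng(seed).standard_normal(nx * ny)
    return (chol @ z).reshape(nx, ny)

k_rw, k_ro = corey_relperm(0.5)                  # mid-range saturation
log_k = sample_log_permeability(16, 16)          # one realization on a coarse grid
total_pvi = 160 * 0.015                          # simulated time in pore volumes injected
```

With 160 steps of size 0.015, the simulation covers 2.4 PVI. A dense Cholesky factorization scales cubically with the number of cells, so other sampling methods may be preferable on the full \(64 \times 64\) grid.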

Fig. 1 Plots of log values of random permeability field modeled by log-normal probability distribution. The unit of the permeability field is \(m^2\)

5.2 POD–Galerkin-Based Reduced Model Setup

The first step in formulating the POD reduced model is to compute the optimal POD basis matrices \(\mathbf {U}_p^r\) and \(\mathbf {U}_s^r\). In order to obtain these basis matrices, we initially performed a realization clustering algorithm to enforce the diversity of the collected snapshots and clustered the 2000 random permeability realizations into 45 clusters (Ghasemi 2015). Then, we randomly selected a single permeability realization from each cluster (a total of 45 random samples of the permeability field). The full system is then solved for each of the 45 realizations, and the solution vectors are collected to build the snapshot matrices (pressure, saturation, nonlinear function). Finally, we compute the POD basis matrices from the SVD of the collected snapshot matrices.

Following that, the obtained basis vectors are used to build the POD reduced model (as detailed in Sect. 3). We then employ the same sequential implicit technique settings adopted for obtaining the full model solutions to solve the resultant POD reduced model. For numerical evaluations, we solve the POD reduced model for the same 2000 permeability realizations to estimate ensemble-based statistics of the engineering quantities of interest.

5.3 DR-RNN Setup

In all the numerical test cases, we utilize DR-RNN with six layers [\(K=6\) in Eq. (24)]. We evaluate \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) for different numbers of POD basis vectors; however, we fix the number of DEIM basis vectors to 35. The PyTorch framework (Paszke et al. 2017), a deep learning Python package using the Torch library as a backend, is used to implement the DR-RNN. Further, we optimize the DR-RNN model parameters using the rmsprop algorithm (Tieleman and Hinton 2012; Paszke et al. 2017) as implemented in PyTorch, where we set the weighted average parameter to 0.9 and the learning rate to 0.001. The weight matrix \(\mathbf {U}\) in Eq. (24) is initialized randomly from the uniform distribution function \(\mathtt {U [0.01, 0.02]}\). The vector training parameter \(\mathbf {w}\) in Eq. (24) is initialized randomly from the uniform distribution function \(\mathtt {U [0.1, 0.5]}\). The scalar training parameters \(\eta _k\) in Eq. (24) are initialized randomly from the uniform distribution \(\mathtt {U [0.1, 0.4]}\). We set the hyperparameters \(\zeta \) and \(\gamma \) in Eq. (25) to 0.9 and 0.1, respectively. The formulated \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) are trained to approximate the reduced state vector representation obtained from least-squares fits. Specifically, we collect a set of best reduced state vector representations \(\tilde{\mathbf {y}}_s^*\) of the saturation state vector using \(\tilde{\mathbf {y}}_s^* = {\mathbf {U}_s^r}^{\top }~\mathbf {y}_s\). The collected set of reduced state vectors is then used to train the parameters of the DR-RNN by minimizing the loss function defined in Eq. (27).
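The initialization ranges and the construction of the training targets described above can be sketched in NumPy. This is not the PyTorch code used for the experiments: the \(r \times r\) shape of \(\mathbf {U}\), the dictionary layout, and the random stand-ins for the basis and the snapshots are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
r, K = 10, 6                                      # reduced dimension and number of layers

# Initial parameter values drawn from the uniform ranges quoted in the text.
params = {
    "U": rng.uniform(0.01, 0.02, size=(r, r)),    # assumed r x r weight matrix of layer 1
    "w": rng.uniform(0.1, 0.5, size=r),           # element-wise weights of layer 1
    "eta": rng.uniform(0.1, 0.4, size=K - 1),     # one scalar step size per layer k > 1
}
gamma, zeta = 0.1, 0.9                            # fixed hyperparameters of Eq. (25)

# Training targets: least-squares reduced states of full-order saturation snapshots.
n, T = 64 * 64, 160
U_s, _ = np.linalg.qr(rng.standard_normal((n, r)))   # stand-in orthonormal basis
Y_s = rng.uniform(0.2, 0.8, size=(n, T))             # stand-in saturation snapshots
targets = U_s.T @ Y_s                                # column t is the target at time step t
```

In an actual training run, `U_s` would be the saturation POD basis and `Y_s` the full-order solutions of the selected realizations, and the parameters would be updated by the optimizer rather than kept at their initial values.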

5.4 Evaluation Metrics

We evaluate the performance of \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) using two time-specific error metrics defined by

$$\begin{aligned} \begin{array}{ll} L_{2_{l,t}} &{}= \Vert \left( \mathbf {y}_{t} - \mathbf {y}_{t}^{\tiny {\text {(RM)}}} \right) ^l \Vert _2 \\ L_{\infty _{l,t}} &{}= \Vert \left( \mathbf {y}_{t} - \mathbf {y}_{t}^{\tiny {\text {(RM)}}} \right) ^l \Vert _{\infty } \end{array} \end{aligned}$$
(29)

where l is the realization index, and \(\mathbf {y}_{t}^{\tiny {\text {(RM)}}}\) is computed from the reduced model. Additionally, we utilize two relative error metrics defined as

$$\begin{aligned} \begin{array}{ll} L_2^{\tiny {\text {rel}}} &{}= \frac{1}{L \times T}\sum _{\ell =1}^L \sum _{t=1}^{T} \left\| \left( \frac{ \mathbf {y}_{t} - \mathbf {y}_{t}^{\tiny {\text {(RM)}}} }{\mathbf {y}_{t}} \right) ^l \right\| _2 \\ L_{2\tiny {\text {,max}}}^{\tiny {\text {rel}}} &{}= \max \limits _{l,t=1~\text {to}~L,T} \left\| \left( \frac{ \mathbf {y}_{t} - \mathbf {y}_{t}^{\tiny {\text {(RM)}}} }{\mathbf {y}_{t}} \right) ^l \right\| _2 \end{array} \end{aligned}$$
(30)

where all the time snapshots of saturation vectors in all realizations are used.
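The metrics of Eqs. (29) and (30) can be computed as in the following sketch. The function names and the array layout (realizations, time steps, grid cells) are assumptions for illustration, and the element-wise division assumes the reference values are nonzero, which holds for saturations bounded below by \(s_{wc}\).

```python
import numpy as np

def time_specific_errors(y_ref, y_rm):
    """Eq. (29) for one realization at one time: 2-norm and infinity-norm of the error."""
    diff = y_ref - y_rm
    return np.linalg.norm(diff, 2), np.linalg.norm(diff, np.inf)

def relative_errors(Y_ref, Y_rm):
    """Eq. (30): mean and maximum over realizations and times of the relative 2-norm error.

    Y_ref and Y_rm have shape (L, T, n).
    """
    rel = np.linalg.norm((Y_ref - Y_rm) / Y_ref, axis=-1)   # shape (L, T)
    return rel.mean(), rel.max()

# Toy data (assumption): one snapshot pair and a uniform 10 percent overshoot.
l2, linf = time_specific_errors(np.array([1.0, 2.0, 2.0]), np.array([1.0, 2.0, 4.0]))

Y_ref = np.full((2, 3, 4), 2.0)
Y_rm = 1.1 * Y_ref
rel_mean, rel_max = relative_errors(Y_ref, Y_rm)
```

In the toy data every entry has a relative error of 0.1 over four cells, so both relative metrics equal 0.2.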

5.5 Numerical Test Case 1

In this test case, water is injected at the lower left corner (0, 0) of the domain and a mixture of oil and water is produced at the top right corner of the domain (1, 1). We set the injection rate \(q=0.05\) at (0, 0) and \(q=-0.05\) at (1, 1) as defined in Eq. (4). We impose a no-flow boundary condition on all four sides of the domain. We fix the number of pressure POD basis vectors to 5 and obtain all the ROMs for different numbers of saturation POD basis functions (\(r=10, 20\)). The configuration of the problem domain is shown in the top left panel of Fig. 2, where the blue spot in the lower left corner (0, 0) corresponds to the injector well and the blue spot in the upper right corner (1, 1) corresponds to the production well. Figure 2 shows the singular values of the pressure snapshot matrix \(\mathbf {X}_p\) in the top right panel, the saturation snapshot matrix \(\mathbf {X}_s\) in the bottom left panel, and the nonlinear function snapshot matrix \(\mathbf {X}_f\) in the bottom right panel.

Fig. 2 Top Left: Computational porous media domain in test case 1. The blue dot in the lower left corresponds to the injector well, and the blue dot in the upper right corner corresponds to the production well. The red dots represented in numbers from 1 to 5 correspond to the locations where the PDF and the water saturation are investigated. Top Right: Singular values of the pressure snapshot matrix \(\mathbf {X}_p\). Bottom Left: Singular values of the saturation snapshot matrix \(\mathbf {X}_s\). Bottom Right: Singular values of the nonlinear function snapshot matrix \(\mathbf {X}_f\)

Fig. 3 Time plots of mean water saturation obtained from all the ROMs and the full-order model for test case 1. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\). The plots in each row are arranged as per the numerical notation of the spatial points plotted in Fig. 2 (top left panel)

The mean water saturation plots over the simulation time are shown in Fig. 3, where the results in the top row correspond to using 10 POD basis vectors and the results in the bottom row correspond to using 20 POD basis vectors. The subplots in Fig. 3 are arranged from left to right following the numbering of the spatial points shown in Fig. 2. From these results, it is clear that the \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) results are very close to the least-squares solutions (LS fit). In Fig. 3, the POD–Galerkin reduced model yields extremely inaccurate and unstable results. We attribute the small errors in the \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) results to the insufficient number of POD basis vectors, and we note that the error magnitude is equivalent to the optimal values obtained by least-squares projection.

Fig. 4 Comparison of mean water saturation field at time \( = 0.3\ \hbox {PVI}\) for test case 1. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\)

Fig. 5 Comparison of standard deviation of the water saturation field at time \(= 0.3\ \hbox {PVI}\) for test case 1. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\)

Figures 4, 5, and 6 show the results for the first (mean) and second (standard deviation) moments of the saturation field at time \(= 0.3\ \hbox {PVI}\) obtained from the full model and from the various ROMs. In Figs. 4, 5, and 6, results for 10 POD basis vectors are shown in the top row and results for 20 POD basis vectors are shown in the bottom row. As shown in Fig. 4, the mean saturation obtained from the DR-RNN ROMs is almost indistinguishable from the reference results. However, the mean saturation field obtained from the POD reduced model (left panels of Fig. 6) deviates significantly from the reference mean saturation.

In Fig. 5, we observe a small discrepancy between the standard deviation results obtained from the DR-RNN ROMs and the full model results, especially near the location of the mean water saturation front. Figure 6 (right panels) shows the standard deviation results obtained by the POD reduced model, which show significant inaccuracies that could be indicative of instabilities in the obtained solutions. We note that the white spots in Fig. 6 correspond to values outside the limits shown in the colorbar.

Fig. 6 Plot of saturation mean and standard deviation of the water saturation field at time \(= 0.3\ \hbox {PVI}\) obtained from the POD reduced model for test case 1. Left: saturation mean. Right: standard deviation. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\)

Fig. 7 Comparison of kernel density estimated probability density function (PDF) at time \(= 0.3\ \hbox {PVI}\) for test case 1. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\). The plots in each row are arranged as per the numerical notation of the spatial points plotted in Fig. 2 (top left panel)

Fig. 8 Comparison of \(\log (L_{2_{l,t}})\) and \(\log (L_{\infty _{l,t}})\) error estimators [Eq. (29)] at time \(= 0.3\ \hbox {PVI}\) for test case 1. The number of POD basis used \(=10\)

Figure 7 compares the saturation PDF estimated from the ensemble of numerical solutions (ROMs and the full model). The settings of Fig. 7 are similar to those adopted in Fig. 3. In Fig. 7, we can see that all the plots obtained from \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) are indistinguishable from the plots obtained from the LS fit (the best approximation). Further, we observe that the saturation PDF obtained from \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) follows nearly the same trend as the saturation PDF obtained from the full model when the reference distribution is unimodal. However, we observe some discrepancy when the distributions are multimodal. A similar discrepancy is also observed in the PDF obtained from the LS fit. Hence, we postulate that these discrepancies are attributable to the limited number of POD basis vectors utilized. In Fig. 7, the POD reduced model yields a very inaccurate approximation of the saturation PDF irrespective of the number of POD basis vectors.

Fig. 9 Comparison of \(\log (L_{2_{l,t}})\) and \(\log (L_{\infty _{l,t}})\) error estimators [Eq. (29)] at time \(= 0.3\ \hbox {PVI}\) for test case 1. The number of POD basis used \(=20\)

Figures 8 and 9 display samples of \(\log (L_{2_{l,t}})\) and \(\log (L_{\infty _{l,t}})\) errors at time \(0.3\ \hbox {PVI}\) obtained from all the ROMs. All the ROMs use 10 POD basis vectors for the errors displayed in Fig. 8 and 20 POD basis vectors for the errors displayed in Fig. 9. From these figures, we can see that the POD reduced model approximation errors are at least an order of magnitude larger than the least-squares solution errors [Eq. (11)], whereas the errors obtained from \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) are nearly indistinguishable from the least-squares projection errors.

Table 1 Performance chart of all the ROMs employed for test case 1. \(L_2^{\tiny {\text {rel}}}\) and \(L_{2\tiny {\text {,max}}}^{\tiny {\text {rel}}}\) error estimators are defined in Eq. (30). The number of POD basis used \(=10\) and 20

Fig. 10 Top Left: Computational porous media domain in test case 2. The blue arrows in the left side correspond to the injection of water, and the brown arrows in the right side correspond to the production of oil and water. The red dots represented in numbers from 1 to 5 correspond to the locations where the PDF and the water saturation are investigated. Top Right: Singular values of the pressure snapshot matrix \(\mathbf {X}_p\). Bottom Left: Singular values of the saturation snapshot matrix \(\mathbf {X}_s\). Bottom Right: Singular values of the nonlinear function snapshot matrix \(\mathbf {X}_f\)

We further list in Table 1 the \(L_2^{\tiny {\text {rel}}}\) and \(L_{2\tiny {\text {,max}}}^{\tiny {\text {rel}}}\) errors for the saturation field. From Table 1, we can see that the approximation errors obtained from \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) have the same order of magnitude as the least-squares (best approximation) errors. Further, in Table 1, the approximation errors obtained from all ROMs except the POD reduced model decrease when we increase the number of POD basis vectors. These results conform with the decay of the singular values of the saturation snapshot matrix. In Table 1, the approximation errors obtained from the POD reduced model are at least an order of magnitude larger than those of the other methods. Also, we observe that the POD reduced model results might be worse when we include more basis functions. These results conform with the results presented in He (2010), where it was shown that selecting a large number of basis vectors based on singular values may not lead to a stable POD–Galerkin reduced model. Further, it was shown in He (2010) that the relation between the stability of the POD–Galerkin reduced model and the number of basis vectors used in the POD–Galerkin projection is somewhat random and that the use of more POD basis vectors does not necessarily lead to improved stability.

5.6 Numerical Test Case 2

In this test case, no-flow boundary conditions are imposed on the two sides aligned with the horizontal direction (top and bottom). Water is injected through the left boundary of the domain, and fluids are produced through the right boundary. The total inflow rate through the left side is set to 0.05, and the total outflow rate through the right side is also 0.05 because the problem is incompressible. As in test case 1, we evaluate all the ROMs for two different numbers of saturation POD basis functions (\(r=10, 20\)). We fix the number of POD basis functions for the pressure state vector to 5. Figure 10 shows the setup for test case 2 and the corresponding singular values of the snapshot matrices \(\mathbf {X}_p\), \(\mathbf {X}_s\), and \(\mathbf {X}_f\).
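Singular values such as those in Fig. 10 come from a singular value decomposition of each snapshot matrix, and the POD basis is formed by the leading left singular vectors. The following is a minimal NumPy sketch of that step, not the implementation used in this work. The synthetic low-rank matrix standing in for \(\mathbf {X}_s\), the helper names `pod_basis` and `retained_energy`, and the energy measure are all illustrative assumptions.

```python
import numpy as np

def pod_basis(snapshots: np.ndarray, r: int):
    """Return the first r left singular vectors and all singular values."""
    # Thin SVD: columns of U are orthonormal modes, s is sorted in descending order.
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r], s

def retained_energy(s: np.ndarray, r: int) -> float:
    """Fraction of the squared singular values captured by the first r modes."""
    return float(np.sum(s[:r] ** 2) / np.sum(s ** 2))

# Hypothetical stand-in for a saturation snapshot matrix (rows: cells, columns: snapshots).
rng = np.random.default_rng(0)
n_cells, n_snapshots, true_rank = 400, 60, 15
X_s = rng.standard_normal((n_cells, true_rank)) @ rng.standard_normal((true_rank, n_snapshots))

for r in (10, 20):
    V, s = pod_basis(X_s, r)
    print(f"r = {r}: basis shape {V.shape}, retained energy {retained_energy(s, r):.4f}")
```

In practice the snapshot matrix would hold full-model solutions collected over time steps and permeability realizations, and a fast decay of `s` indicates that a small `r` captures most of the snapshot variability.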

Fig. 11

Time plots of mean water saturation obtained from all the ROMs and the full-order model in test case 2. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\). The plots in each row are ordered according to the numbering of the spatial points shown in Fig. 10

Figure 11 shows the time plots of mean water saturation obtained from all the ROMs and from the full model. The display settings in Fig. 11 are the same as those defined for Fig. 3. In Fig. 11, the results obtained from \(\hbox {DR-RNN}^{\text {p}}\), \(\hbox {DR-RNN}^{\text {pd}}\), and the LS fit (the best approximation) closely approximate the full model, whereas the POD reduced model yields extremely inaccurate results regardless of the number of POD basis functions used.

Fig. 12

Comparison of mean water saturation field at time \( = 0.4\ \hbox {PVI}\) for test case 2. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\)

Fig. 13

Comparison of standard deviation of the water saturation field at time \(= 0.4\ \hbox {PVI}\) for test case 2. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\)

Figures 12, 13, and 14 show the results for the mean and standard deviation of the saturation field at \(0.4\ \hbox {PVI}\). From these figures, we can conclude that the plots obtained from the DR-RNN ROMs are almost indistinguishable from the LS fit (the best approximation) results, whereas the plots obtained from the POD reduced model (Fig. 14) exhibit significant discrepancies when compared to the plots shown in Fig. 12. Again, we note that the white spots displayed in Fig. 14 are regions whose values lie outside the limits marked in the respective colorbar.
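The mean and standard deviation fields compared in these figures are cell-wise Monte Carlo statistics over the ensemble of realizations at a fixed time. The sketch below shows that computation under stated assumptions. The grid size, the synthetic saturation ensembles, and the helper names `ensemble_stats` and `rel_l2` are hypothetical, and `rel_l2` is a simple discrepancy measure rather than the error estimators of Eq. (30).

```python
import numpy as np

rng = np.random.default_rng(2)
n_realizations, n_cells = 2000, 32 * 32  # grid size is a hypothetical choice

# Hypothetical stand-ins: one saturation field per permeability realization.
S_full = rng.beta(2.0, 3.0, size=(n_realizations, n_cells))
S_rom = S_full + rng.normal(0.0, 1e-3, size=S_full.shape)  # ROM = full model + small error

def ensemble_stats(S: np.ndarray):
    """Cell-wise Monte Carlo mean and sample standard deviation over realizations."""
    return S.mean(axis=0), S.std(axis=0, ddof=1)

def rel_l2(approx: np.ndarray, reference: np.ndarray) -> float:
    """Relative Euclidean distance between two fields."""
    return float(np.linalg.norm(approx - reference) / np.linalg.norm(reference))

mean_full, std_full = ensemble_stats(S_full)
mean_rom, std_rom = ensemble_stats(S_rom)

print("relative gap in mean field:", rel_l2(mean_rom, mean_full))
print("relative gap in std field: ", rel_l2(std_rom, std_full))
```

Reshaping `mean_full` and `std_full` to the grid dimensions gives fields that can be plotted in the same way as the panels discussed above.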

Fig. 14

Mean and standard deviation of the water saturation field at time \(= 0.4\ \hbox {PVI}\) obtained from the POD reduced model for test case 2. Left: saturation mean. Right: standard deviation. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\)

Fig. 15

Comparison of kernel density estimated probability density function (PDF) at time \(= 0.4\ \hbox {PVI}\) obtained from all ROMs w.r.t. true PDF obtained from the full-order model for test case 2. Top Row: number of POD basis used \(=10\). Bottom Row: number of POD basis used \(=20\). The plots in each row are ordered according to the numbering of the spatial points shown in Fig. 10

Figure 15 compares the saturation PDFs estimated from the ensemble of numerical solutions obtained from all the ROMs and the full model. The plotted results show that the \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) predictions are nearly indistinguishable from those of the full model and are very close to the best possible approximation given by the LS fit. Further, Fig. 15 shows that all the saturation PDFs obtained from the full model are unimodal. As in test case 1, the POD reduced model yields inaccurate approximations of the saturation PDFs.
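A kernel density estimate of this kind can be formed from the ensemble of saturation values at one monitoring point. The sketch below uses a Gaussian kernel with a normal-reference bandwidth. The bandwidth rule, the synthetic samples, and the helper name `gaussian_kde_1d` are assumptions for illustration, since the kernel settings used for Fig. 15 are not stated here.

```python
import numpy as np

def gaussian_kde_1d(samples: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    n = samples.size
    # Normal-reference bandwidth; an assumed choice, not taken from the paper.
    h = 1.06 * samples.std(ddof=1) * n ** (-1.0 / 5.0)
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)
n_realizations = 2000

# Hypothetical stand-ins for the saturation at one monitoring point.
s_full = np.clip(rng.normal(0.55, 0.08, n_realizations), 0.0, 1.0)
s_rom = np.clip(s_full + rng.normal(0.0, 0.005, n_realizations), 0.0, 1.0)

grid = np.linspace(0.0, 1.0, 201)
pdf_full = gaussian_kde_1d(s_full, grid)
pdf_rom = gaussian_kde_1d(s_rom, grid)

# Integrated absolute difference between the two densities (0 means identical).
dx = grid[1] - grid[0]
l1_gap = float(np.sum(np.abs(pdf_full - pdf_rom)) * dx)
print("L1 gap between ROM and full-model PDFs:", l1_gap)
```

A small `l1_gap` corresponds to curves that overlap visually, while an unstable ROM would typically produce a shifted or distorted density.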

Table 2 Performance chart of all the ROMs employed for test case 2. \(L_2^{\tiny {\text {rel}}}\) and \(L_{2\tiny {\text {,max}}}^{\tiny {\text {rel}}}\) error estimators are defined in Eq. (30). The number of POD basis used \(=10\) and 20

Table 2 lists the error metrics \(L_2^{\tiny {\text {rel}}}\) and \(L_{2\tiny {\text {,max}}}^{\tiny {\text {rel}}}\) for the saturation fields. The approximation errors obtained from \(\hbox {DR-RNN}^{\text {p}}\) and \(\hbox {DR-RNN}^{\text {pd}}\) are close to the least-squares (best approximation) errors. However, the POD reduced model yields very inaccurate results due to numerical instabilities.

6 Conclusion

In this work, we extended the DR-RNN introduced in Nagoor Kani and Elsheikh (2017) to nonlinear multi-phase flow problems with distributed uncertain parameters. In this extended formulation, DR-RNN based on the reduced residual obtained from the POD–DEIM reduced model is used to construct the reduced-order model termed \(\hbox {DR-RNN}^{\text {pd}}\). We evaluated the proposed \(\hbox {DR-RNN}^{\text {pd}}\) on two forward uncertainty quantification problems involving two-phase flow in subsurface porous media. The uncertain parameter is the permeability field, modeled as a log-normal distribution. In the two test cases, the full-order model and the ROMs are solved for 2000 random permeability realizations to estimate ensemble-based statistics using the Monte Carlo method. The full model and the POD reduced model use an implicit time-stepping method because the chosen time step size violates the numerical stability condition of explicit schemes. In contrast, the \(\hbox {DR-RNN}^{\text {pd}}\) architecture employs an explicit time-stepping procedure with the same step size used in the full model and the POD reduced model. Hence, \(\hbox {DR-RNN}^{\text {pd}}\) has a computational complexity of \(\mathcal {O}(K \times r^2)\) instead of \(\mathcal {O}(p \times r^3)\) per saturation update, where r is the dimension of the POD reduced model, \(K \ll p\) is the number of stacked network layers in DR-RNN, and p is the average number of Newton iterations used in the standard POD–DEIM reduced model. The numerical results show that \(\hbox {DR-RNN}^{\text {pd}}\) provides accurate and stable approximations of the full model in comparison with the standard POD reduced model.
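As a rough illustration of these complexity estimates, and assuming the constant factors hidden in the \(\mathcal {O}\)-notation are comparable for the two methods (a simplification), the nominal ratio of the per-update costs is:

```latex
\[
\frac{p \times r^{3}}{K \times r^{2}} \;=\; \frac{p}{K}\, r ,
\qquad
K \ll p \;\Longrightarrow\; \frac{p}{K}\, r \;>\; r .
\]
```

Under this assumption, the nominal cost ratio exceeds 10 for \(r=10\) and 20 for \(r=20\), and it grows linearly with the reduced dimension r. Measured speedups depend on implementation details and need not match this estimate.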

Future work should consider the development of accurate and stable \(\hbox {DR-RNN}^{\text {pd}}\) models for UQ tasks involving subsurface flow simulations with additional effects, including capillary pressure, compressibility, and gravity. In addition, it will be of interest to explore the applicability of \(\hbox {DR-RNN}^{\text {pd}}\) to UQ tasks with permeability fields that have randomly oriented channels or barriers. The use of \(\hbox {DR-RNN}^{\text {pd}}\) for history matching (Elsheikh et al. 2012, 2013), where the mismatch between simulated and field observation data is minimized by adjusting the geological model parameters, is also expected to yield a significant reduction in computational cost.