
Open Access 25-04-2022 | Original Article

A Lagrangian dual-based theory-guided deep neural network

Authors: Miao Rong, Dongxiao Zhang, Nanzhe Wang

Published in: Complex & Intelligent Systems | Issue 6/2022


Abstract

The theory-guided neural network (TgNN) is a class of methods that improves the effectiveness and efficiency of neural network architectures by incorporating scientific knowledge or physical information. Despite its great success, the theory-guided (deep) neural network has certain limitations in maintaining the trade-off between training data and domain knowledge during the training process. In this paper, the Lagrangian dual-based TgNN (TgNN-LD) is proposed to improve the effectiveness of the training process. We convert the original loss function into a constrained form with several terms, in which partial differential equations (PDEs), engineering controls (ECs), and expert knowledge (EK) are regarded as constraints, with one Lagrangian variable per constraint. These Lagrangian variables are incorporated to achieve an equitable trade-off between observation data and the corresponding constraints, in order to improve prediction accuracy and training efficiency. To investigate the performance of the proposed method, it is compared with the original TgNN model, whose weight values were tuned by an ad-hoc procedure, on a subsurface flow problem, and their L2 error, R square (R2), and computational time are analyzed. Experimental results demonstrate the superiority of the Lagrangian dual-based TgNN.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

The deep neural network (DNN) has achieved significant breakthroughs in various scientific and industrial fields [1–4]. Like most data-driven models in artificial intelligence [5], DNNs are also dependent on a large amount of training data. However, the cost and difficulty of collecting data in some areas, especially in energy-related fields, hinder the development of (deep) neural networks. To further increase their generalization, theory-guided data science models, which bridge scientific problems and complex physical phenomena, have gained increased popularity in recent years [6–8].
As a successful representative, the theory-guided neural network framework, also called a physics-informed neural network framework or an informed deep learning framework, which incorporates theory (e.g., governing equations, other physical constraints, engineering controls, and expert knowledge) into (deep) neural network training, has been applied to construct prediction models, especially in industries with limited training data [6, 7]. Herein, the theory may refer to scientific laws and engineering theories [6], which can be regarded as a kind of prior knowledge. Such knowledge is combined with training data to improve training performance during the learning process. In the loss function, the theory terms are usually transformed into regularization terms and added to the training term [9]. Owing to the existence of theory, predictions obtained by TgNN take into account physical feasibility and knowledge beyond the regimes covered by the training data. As a result, TgNN can obtain a training model with better generalization, and can achieve higher accuracy than the traditional DNN [1, 7].
Although the introduction of theory expands the application of data-driven models, the trade-off between observation data and theory should be equitable. Herein, we first provide its theory-incorporated mathematical formulation, as shown in Eq. (1)
$$\begin{aligned} L\left( \theta \right) = \lambda _{\text {DATA}} {\text {MSE}}_{\text {DATA}} + \lambda _{\text {IC}} {\text {MSE}}_{\text {IC}} + \lambda _{\text {BC}} {\text {MSE}}_{\text {BC}} + \lambda _{\text {PDE}} {\text {MSE}}_{\text {PDE}} + \lambda _{\text {EC}} {\text {MSE}}_{\text {EC}} + \lambda _{\text {EK}} {\text {MSE}}_{\text {EK}} = \sum \limits _i {\lambda _i {\text {MSE}}_i }, \end{aligned}$$
(1)
where \(\lambda _i\) and \({\text {MSE}}_i\) denote the weight and mean square error of the ith term, respectively; i ranges over DATA, IC, BC, PDE, EC, and EK; and \(\lambda = \left[ {\lambda _{\text {DATA}} ,\lambda _{\text {IC}} ,\lambda _{\text {BC}} ,\lambda _{\text {PDE}} ,\lambda _{\text {EC}} ,\lambda _{\text {EK}} } \right] \). The term DATA refers to the observation (training) data, while the remaining terms are the theory added to the (D)NN model: PDE, IC, BC, EC, and EK refer to the partial differential equations, initial conditions, boundary conditions, engineering controls, and expert knowledge, respectively. Each weight represents the importance of the corresponding term in the loss function. In addition, not only might the values of these terms be at different scales, but their physical meanings and dimensional units can also be distinct. Therefore, balancing the trade-off among these terms is critical.
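To make the composition of Eq. (1) concrete, the following minimal PyTorch sketch assembles the weighted loss from precomputed mean-square-error terms. The function and variable names are illustrative assumptions rather than the authors' implementation; the example weights mirror the ad-hoc setting [1, 100, 1, 1, 1, 1] used later for comparison.

```python
import torch

def tgnn_loss(mse_terms: dict, weights: dict) -> torch.Tensor:
    """Weighted TgNN loss of Eq. (1): L(theta) = sum_i lambda_i * MSE_i.

    mse_terms and weights are keyed by the same term names
    (DATA, IC, BC, PDE, EC, EK).
    """
    return sum(weights[k] * mse_terms[k] for k in mse_terms)

# Illustrative usage with dummy MSE values standing in for the real terms.
mse_terms = {k: torch.rand(1) for k in ["DATA", "IC", "BC", "PDE", "EC", "EK"]}
weights = {"DATA": 1.0, "IC": 100.0, "BC": 1.0, "PDE": 1.0, "EC": 1.0, "EK": 1.0}
loss = tgnn_loss(mse_terms, weights)
```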
If these weight variables are treated as trainable network parameters, the gradient of the loss in Eq. (1) with respect to \(\lambda _i\) is simply \({\text {MSE}}_i\), a nonnegative value, so gradient descent would drive \(\lambda _i\) continuously downward, toward negative infinity, as back-propagation proceeds [10, 11]. Therefore, owing to the theory, i.e., the regularization terms in the loss function, it is difficult to determine the weight of each term relative to the training-data term. If the weights are set inappropriately, the training time can increase markedly, or the optimizer may even fail to converge, yielding an inaccurate training model. Consequently, the adjustment of these weight values is essential. In most existing literature, researchers adjust these values by experience [6, 8, 9]. However, when the terms are not at the same scale, such manual tuning inevitably creates a heavy burden on researchers and consumes substantial human time.
Recently, babysitting and evolutionary computation-based techniques [12], such as grid search [13, 14] and genetic algorithms [15, 16], have gained rising popularity for hyper-parameter optimization [17–19]. If TgNN first generates an initial set of weights, and then compares and repeats this search process until the most suitable set of weight values is found or a stopping criterion is met, the training time will inevitably be extended. In contrast, if the search for optimized weight values can be incorporated into the training process itself, the training time may be shortened.
In recent years, Lagrangian dual approaches have been widely combined with (deep) neural networks to improve the latter's ability to deal with constrained problems [20–23]. Fioretto et al. pointed out that Lagrangian duality can bring significant benefits for applications in which the learning task must enforce constraints on the predictor itself, such as in energy systems, gas networks, and transprecision computing, among others [20]. Walker et al. incorporated dual deep learning into laboratory and prospective observational studies [24]. Gan et al. developed a Lagrangian dual-based DNN framework for trajectory simulation [23]. Pundir and Raman proposed a dual deep learning method for image-based smoke detection [25]. The above contributions improve the performance of the original deep neural network frameworks and provide more accurate predictive training models.
The Lagrangian dual approach combined with (deep) neural networks has proven to be a good tool for constructing predictive models with constraints. In the TgNN framework, several constraint terms must be considered in addition to the data term, which poses a great challenge to the training process. Ad-hoc procedures for adjusting the weights of these constraint terms not only demand considerable computational time and resources, but can also deteriorate predictive accuracy. Additionally, the ad-hoc procedure depends strongly on prior knowledge and experience. Having realized this, we propose the Lagrangian dual-based TgNN (TgNN-LD) to provide theoretical guidance for the adjustment of the weight values in the loss function of the theory-guided neural network framework. In our method, the Lagrangian dual framework is incorporated into the TgNN training model and controls the update of the weights, with the purpose of automatically changing the weight values and producing accurate predictive results within limited training time. Moreover, to better set forth our approach, we select a subsurface flow problem as a test case in the experiments.
The remainder of this paper proceeds as follows. The section “Proposed method” briefly describes the mathematical formulation of the TgNN model, followed by details of the proposed method. The experimental settings and the investigation of the corresponding results are provided in the sections “Experimental settings” and “Comparisons and results”, respectively. Finally, the section “Conclusion” concludes the paper and suggests directions for future research.

Proposed method

In this paper, we consider the following optimization problem:
$$\begin{aligned} \mathop {\min }\limits _\theta L\left( \theta \right) = {\text {MSE}}_{\text {DATA}} + {\text {MSE}}_{\text {IC}} + {\text {MSE}}_{\text {BC}} + \lambda _{\text {PDE}} {\text {MSE}}_{\text {PDE}} + \lambda _{\text {EC}} {\text {MSE}}_{\text {EC}} + \lambda _{\text {EK}} {\text {MSE}}_{\text {EK}}, \end{aligned}$$
(2)
since the initial condition (IC) and boundary condition (BC), which impose restrictions on the decision space, can be regarded as part of the data term. The values of \(\lambda _{\text {PDE}}\), \(\lambda _{\text {EC}}\), and \(\lambda _{\text {EK}}\) have a great impact on the optimization results. As discussed previously, not only might the values of these terms be at different scales, but their physical meanings and dimensional units can also be dissimilar. By introducing the formulation of Eq. (2), \(\lambda _i\) is thus expected to achieve a normalized balance between the training data and the other terms. If it is assigned inappropriately, however, the prediction accuracy will be diminished, and both training time and computational cost will markedly increase. To determine the weight values in a theoretically grounded manner and maintain the balance of each term in TgNN, we propose the Lagrangian dual-based TgNN framework.

Problem description

We first introduce the following mathematical descriptions of governing equations of the underlying physical problem:
$$\begin{aligned}&\begin{array}{l} L_p u\left( {x,t} \right) = l\left( {x,t} \right) ,x \in \varOmega ,t \in \left( {0,T} \right] \end{array} \end{aligned}$$
(3)
$$\begin{aligned}&\begin{array}{l} Iu\left( {x,0} \right) = q\left( {x,t} \right) ,x \in \varOmega \end{array} \end{aligned}$$
(4)
$$\begin{aligned}&\begin{array}{l} Bu\left( {x,t} \right) = p\left( x \right) ,x \in \partial \varOmega ,t \in \left( {0,T} \right] , \end{array} \end{aligned}$$
(5)
where \(\varOmega \subset R^d\) and \(t \in \left( {0,T} \right] \) denote the spatial and temporal domains, respectively, with \(\partial \varOmega \) as the spatial boundary; \(L_p\) is a differential operator and \(L_p u\) denotes its action on u, involving the spatial derivatives of u; l is a forcing term; and I and B are two other operators that define the initial and boundary conditions, respectively.
As discussed in Eq. (1), TgNN incorporates theory into DNN by the summation of corresponding terms. It is constructed based on the following six general parts shown in Eq. (6):
$$\begin{aligned} \left\{ \begin{array}{l} {\text {MSE}}_{\text {DATA}} = \frac{1}{{n_d }}\sum \limits _{i=1}^{n_d} {\left| {N_{u} \left( {x^i ,y^i } \right) - u_i \left( {x^i ,y^i } \right) } \right| ^2 } \\ {\text {MSE}}_{\text {PDE}} = \frac{1}{{n_f }}\sum \limits _{i=1}^{n_f } {\left| {f\left( {t_f^i ,x_f^i ,y_f^i } \right) } \right| ^2 } \\ {\text {MSE}}_{\text {IC}} = \frac{1}{{n_{\text {IC}} }}\sum \limits _{i=1}^{n_{\text {IC}} } {\left| {N_{u} \left( {x_i ,0} \right) - u_{I} \left( {x_i ,0} \right) } \right| ^2 } \\ {\text {MSE}}_{\text {BC}} = \frac{1}{{n_{\text {BC}} }}\sum \limits _{i=1}^{n_{\text {BC}} } {\left| {N_{u} \left( {\partial \varOmega _i ,t^i } \right) - u_{B} \left( {\partial \varOmega _i ,t^i } \right) } \right| ^2 } \\ {\text {MSE}}_{\text {EC}} = \frac{1}{{n_{\text {EC}} }}\sum \limits _{i=1}^{n_{\text {EC}} } {\left| {\mathrm{ReLU} \left( {{\text {EC}} \left( {x^i ,y^i, t^i } \right) } \right) } \right| ^2 } \\ {\text {MSE}}_{\text {EK}} = \frac{1}{{n_{\text {EK}} }}\sum \limits _{i=1}^{n_{\text {EK}} } {\left| {\mathrm{ReLU} \left( {{\text {EK}} \left( {x^i ,y^i, t^i } \right) } \right) } \right| ^2 }, \\ \end{array} \right. \end{aligned}$$
(6)
where \(f: = L_p N_{u} \left( {x,t} \right) - l\left( {x,t,y} \right) \) needs to approach 0, representing the residual of the partial differential equation according to Eq. (3); \(\left\{ {t_f^i ,x_f^i ,y_f^i } \right\} _{i = 1}^{n_f }\) denotes the collocation points of the residual function, with size \(n_f\), which can be chosen randomly because no labels are needed at these points; \(N_{u}\) is the approximation of the solution u obtained by the (deep) neural network; \(n_{d}\) represents the number of training data points; \(n_{\text {IC}}\) and \(n_{\text {BC}}\) denote the numbers of collocation points for evaluating the initial and boundary conditions, respectively; and \(n_{\text {EC}}\) and \(n_{\text {EK}}\) denote the numbers of collocation points for the engineering controls and expert knowledge, respectively.
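As a minimal illustration of how the terms in Eq. (6) can be evaluated, the PyTorch sketch below assembles a data term, a PDE term via automatic differentiation, and a ReLU-based inequality term. The two-layer network, the toy residual u_t - u_xx, the sampled points, and the example constraint are placeholders chosen for brevity, not the subsurface-flow setup used in the experiments.

```python
import torch

# Minimal fully connected surrogate N_u(x, t); the architecture is illustrative only.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 50), torch.nn.Tanh(), torch.nn.Linear(50, 1)
)

def pde_residual(x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Toy residual f = u_t - u_xx evaluated with autograd (placeholder PDE,
    not the subsurface-flow equation of the experiments)."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - u_xx

# Label-free collocation points for the PDE term and a handful of observations.
x_f, t_f = torch.rand(100, 1), torch.rand(100, 1)
x_d, t_d = torch.rand(20, 1), torch.rand(20, 1)
u_obs = torch.rand(20, 1)                                # placeholder observations

mse_data = torch.mean((net(torch.cat([x_d, t_d], dim=1)) - u_obs) ** 2)
mse_pde = torch.mean(pde_residual(x_f, t_f) ** 2)
# Inequality-type knowledge (EC/EK <= 0) enters through ReLU, so only
# constraint violations contribute to the loss.
ec_vals = net(torch.cat([x_f, t_f], dim=1)) - 1.0        # illustrative constraint EC <= 0
mse_ec = torch.mean(torch.relu(ec_vals) ** 2)
```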

Problem transformation

Having established the mathematical description of each term, we re-write the original loss function of the TgNN model, which incorporates scientific knowledge and engineering controls, in the constrained form of Eq. (7):
$$\begin{aligned} \begin{array}{l} \min \,L\left( \theta \right) = \text {MSE}_{\text {DATA}} + \text {MSE}_{\text {IC}} + \text {MSE}_{\text {BC}} \\ s.t. \, \left\{ {\begin{array}{*{20}l} {f = 0} \\ {{\text {EC}} \le 0} \\ {{\text {EK}} \le 0} \\ \end{array}} \right. \\ \end{array} \end{aligned}$$
(7)
Herein, \(L(\theta )\) in Eq. (7) is the re-written form of \(L(\theta )\) in Eqs. (1) and (2); the formulations describe the same problem.
Following this, a Lagrangian duality framework [20], which incorporates a Lagrangian dual approach into the learning task, is employed to solve this constrained optimization problem and approximate the minimizer of \(L\left( \theta \right) \). Given three multipliers, \(\lambda _1\), \(\lambda _2\), and \(\lambda _3\), one per constraint, consider the Lagrangian loss function
$$\begin{aligned} L_\lambda \left( \theta \right) = L\left( \theta \right) + \lambda _1 \nu \left( {f } \right) + \lambda _2 \nu \left( {EC } \right) + \lambda _3 \nu \left( {\text {EK} } \right) , \end{aligned}$$
(8)
where \(\nu \) can be written as
$$\begin{aligned} \nu \left( {g\left( x \right) } \right) = \mathrm{ReLU}\left( {g\left( x \right) } \right) . \end{aligned}$$
(9)
According to [20], for fixed multipliers \(\lambda _i \left( {i = 1,\ldots ,3} \right) \), minimizing the above Lagrangian loss function amounts to finding the model parameters \(\omega \) that solve the following optimization problem:
$$\begin{aligned} \omega ^* \left( {\lambda _i } \right) = \mathop {\arg \min }\limits _\omega L_{\lambda _i } \left( {\mathrm{M}\left[ \omega \right] } \right) . \end{aligned}$$
(10)
Herein, we denote the resulting approximation of the optimizer \(\mathrm{O}\) as \(\tilde{\mathrm{O}}_\lambda = {M}\left[ {\omega ^* \left( \lambda \right) } \right] \left( {\lambda = \left\{ {\lambda _i \left| {i = 1,\ldots ,3} \right. } \right\} } \right) \), which is produced by an optimizer (in our paper, Adam) during the training process.
Next, the Lagrangian dual approach transforms the above problem into the search for the optimal multipliers via the following max–min problem:
$$\begin{aligned} \lambda ^* = \arg \mathop {\max }\limits _{\lambda _i \left( {i = 1,\ldots ,3} \right) } \mathop {\min }\limits _\omega \sum \limits _{j = 1}^n {L_\lambda \left( {{M}\left[ {\omega ^* \left( \lambda \right) } \right] , j} \right) }. \end{aligned}$$
(11)
As in [20], we denote this approximation as \(\tilde{\mathrm{O}}_{\lambda ^*} = { M}\left[ {\omega ^* \left( {\lambda ^* } \right) } \right] \).
To summarize [20], the repeated calculation and search process of the Lagrangian dual framework adheres to the following steps:
(a)
Learn \(\tilde{\mathrm{O}}^k _\lambda \);
 
(b)
Let \(y_j^k = \tilde{\mathrm{O}}^k_\lambda \left( {d_j } \right) \), where \(y_j^k\) denotes the prediction for the jth training sample \(d_j\) at the kth iteration;
 
(c)
\(\lambda _i^{k + 1} = \lambda _i^k + s_k \sum \nolimits _{j = 1}^n {\nu \left( {g_i \left( {y_j^k ,d_j } \right) } \right) } ,\ i = 1,\ldots ,3\),
 
where \(g_i \le 0\) denotes the ith constraint in Eq. (7), and \(s_k\) and k refer to the update step size and the current iteration, respectively. For our problem, we recommend \(s_k \in \left[ {1.1,1.4} \right] \), a range determined through extensive experiments, and use \(s_k = 1.25\) in the following sections. A minimal sketch of this training loop is given below.
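The following PyTorch sketch shows one possible realization of steps (a)–(c). It assumes a helper `losses()` that returns the data, IC, and BC mean-square errors together with the violation terms ν(f), ν(EC), and ν(EK) as scalar tensors; the function names, the single inner epoch per outer iteration, and the initial multiplier values are illustrative assumptions rather than the authors' implementation.

```python
import torch

def train_tgnn_ld(net, losses, n_outer=2000, n_inner=1, s_k=1.25, lr=1e-3):
    """Lagrangian dual-based TgNN training sketch (steps (a)-(c)).

    `losses()` is assumed to return a dict with scalar tensors under the keys
    'data', 'ic', 'bc' (MSE terms) and 'pde', 'ec', 'ek' (violation terms).
    """
    lam = {"pde": 1.0, "ec": 1.0, "ek": 1.0}       # Lagrangian multipliers
    opt = torch.optim.Adam(net.parameters(), lr=lr)

    for k in range(n_outer):
        # (a) learn the network weights for the current multipliers by
        #     minimizing the Lagrangian loss of Eq. (8)
        for _ in range(n_inner):
            terms = losses()
            lagrangian = (terms["data"] + terms["ic"] + terms["bc"]
                          + lam["pde"] * terms["pde"]
                          + lam["ec"] * terms["ec"]
                          + lam["ek"] * terms["ek"])
            opt.zero_grad()
            lagrangian.backward()
            opt.step()

        # (b)-(c) subgradient ascent on the multipliers: each lambda_i grows
        #         in proportion to its remaining constraint violation,
        #         scaled by the step size s_k.
        with torch.no_grad():
            terms = losses()
            for key in lam:
                lam[key] += s_k * float(terms[key])
    return lam
```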

Experimental settings

In this section, we take a 2-D unsteady-state single-phase subsurface flow problem [9] as the test case to investigate the performance of the proposed Lagrangian dual-based TgNN.

Parameter settings and scenario description

The governing equation of the subsurface flow problem in our experiment can be written as Eq. (12)
$$\begin{aligned} \begin{aligned} S_s \frac{{\partial h\left( {t,x,y} \right) }}{{\partial t}} =&\frac{\partial }{{\partial x}}\left( {K\left( {x,y} \right) \cdot \frac{{\partial h\left( {t,x,y} \right) }}{{\partial x}}} \right) \\&+ \frac{\partial }{{\partial y}}\left( {K\left( {x,y} \right) \cdot \frac{{\partial h\left( {t,x,y} \right) }}{{\partial y}}} \right) , \end{aligned} \end{aligned}$$
(12)
where h is the hydraulic head to be predicted; and \(S_s = 0.0001\) and K are the specific storage and the hydraulic conductivity field, respectively. When h is approximated by the neural network, \({N_h \left( {t,x,y;\theta } \right) }\), the residual of the governing equation of flow can be written as
$$\begin{aligned} \begin{aligned} f: =&S_s \frac{{\partial N_h \left( {t,x,y;\theta } \right) }}{{\partial t}} \\&- \frac{\partial }{{\partial x}}\left( {K\left( {x,y} \right) \cdot \frac{{\partial N_h \left( {t,x,y;\theta } \right) }}{{\partial x}}} \right) \\&- \frac{\partial }{{\partial y}}\left( {K\left( {x,y} \right) \cdot \frac{{\partial N_h \left( {t,x,y;\theta } \right) }}{{\partial y}}} \right) . \end{aligned} \end{aligned}$$
(13)
In Eq. (13), the partial derivatives of \({N_h \left( {t,x,y;\theta } \right) }\) can be calculated within the network, while the partial derivatives of \({K\left( {x,y} \right) }\) generally need to be computed by numerical differentiation. Herein, we view the hydraulic conductivity field as a heterogeneous parameter field, i.e., a random field following a specific distribution with a corresponding covariance [9]. Since its covariance is known, the Karhunen–Loève expansion (KLE) is utilized to parameterize this kind of heterogeneous field. As a result, the residual of the governing equation can be re-written as Eq. (14)
$$\begin{aligned} \begin{aligned} f: =&S_s \frac{{\partial N_h \left( {t,x,y;\theta } \right) }}{{\partial t}} \\&- \frac{\partial }{{\partial x}}\left( {e^{\bar{Z}\left( {x,y} \right) + \sum \limits _{i = 1}^n {\sqrt{\lambda _i } f_i \left( {x,y} \right) \xi _i \left( \tau \right) } } \cdot \frac{{\partial N_h \left( {t,x,y;\theta } \right) }}{{\partial x}}} \right) \\&- \frac{\partial }{{\partial y}}\left( {e^{\bar{Z}\left( {x,y} \right) + \sum \limits _{i = 1}^n {\sqrt{\lambda _i } f_i \left( {x,y} \right) \xi _i \left( \tau \right) } } \cdot \frac{{\partial N_h \left( {t,x,y;\theta } \right) }}{{\partial y}}} \right) , \\ \end{aligned} \end{aligned}$$
(14)
where \({\bar{Z}\left( {x,y} \right) + \sum \limits _{i = 1}^n {\sqrt{\lambda _i } f_i \left( {x,y} \right) \xi _i \left( \tau \right) } }\) represents the log hydraulic conductivity field \(Z\left( {x,y} \right) = \ln K\left( {x,\tau } \right) \), with \({\xi _i \left( \tau \right) }\) as the ith independent random variable of the field \(Z\left( {x,y} \right) \). For the sake of fairness, the settings of our experiments remain the same as those in [9], as listed in Table 1. In the experiments, MODFLOW, a widely used groundwater simulation software, is adopted to perform the simulations that provide the required training dataset; all training samples in the experiments are generated by it. A minimal sketch of evaluating the KLE-parameterized residual of Eq. (14) is given below.
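The sketch below evaluates the residual of Eq. (14) with PyTorch automatic differentiation, with the log conductivity built from a KLE-style expansion. The cosine mode functions, the eigenvalue decay, the network architecture, and the fact that autograd (rather than numerical differences) differentiates K are simplifying assumptions for illustration only; a real implementation would use the eigenpairs of the covariance of ln K.

```python
import torch

S_s = 1e-4      # specific storage (Table 1)
n_kle = 20      # number of KLE terms retained (Table 1)

# Placeholder KLE ingredients: decaying eigenvalues and random coefficients xi.
kle_vals = 1.0 / (torch.arange(1, n_kle + 1, dtype=torch.float32) ** 2)
xi = torch.randn(n_kle)

def log_k(x, y):
    """ln K(x, y) = Z_bar + sum_i sqrt(lambda_i) f_i(x, y) xi_i (exponent in Eq. (14)).
    Z_bar = 0 as in Table 1; cosine modes stand in for the true eigenfunctions f_i."""
    i = torch.arange(1, n_kle + 1, dtype=torch.float32)
    modes = torch.cos(i * x) * torch.cos(i * y)
    return (modes * torch.sqrt(kle_vals) * xi).sum(dim=1, keepdim=True)

# Surrogate N_h(t, x, y); the architecture is illustrative only.
net = torch.nn.Sequential(torch.nn.Linear(3, 50), torch.nn.Tanh(), torch.nn.Linear(50, 1))

def residual(t, x, y):
    """Residual f of Eq. (14), with all derivatives taken by autograd."""
    t, x, y = (v.clone().requires_grad_(True) for v in (t, x, y))
    h = net(torch.cat([t, x, y], dim=1))
    ones = torch.ones_like(h)
    h_t = torch.autograd.grad(h, t, ones, create_graph=True)[0]
    h_x = torch.autograd.grad(h, x, ones, create_graph=True)[0]
    h_y = torch.autograd.grad(h, y, ones, create_graph=True)[0]
    K = torch.exp(log_k(x, y))
    flux_x, flux_y = K * h_x, K * h_y
    div_x = torch.autograd.grad(flux_x, x, torch.ones_like(flux_x), create_graph=True)[0]
    div_y = torch.autograd.grad(flux_y, y, torch.ones_like(flux_y), create_graph=True)[0]
    return S_s * h_t - div_x - div_y

# Usage on random collocation points: the PDE term of the loss.
t_f, x_f, y_f = (torch.rand(100, 1) for _ in range(3))
mse_pde = torch.mean(residual(t_f, x_f, y_f) ** 2)
```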
Table 1
Scenario settings

Domain: a square domain, evenly divided into \(51\times 51\) grid blocks
Length in both directions of the domain: 1020 [Len] (Len denotes any consistent length unit)
Specific storage \(S_s\): \(0.0001\left[ {Len^{ - 1} } \right] \)
Total simulation time: \(10\left[ T \right] \) (\(\left[ T \right] \) denotes any consistent time unit)
Time step: \(0.2\left[ T \right] \) (50 time steps in total)
Correlation length of the field: \(\eta = 408\left[ {Len} \right] \)
Hydraulic conductivity field: parameterized through KLE with 20 terms retained in the expansion, i.e., 20 random variables represent this field
Initial conditions: \(H_{t = 0,x = 0} = 1\left[ {Len} \right] \); \(H_{t = 0,x \ne 0} = 0\left[ {Len} \right] \)
Prescribed heads: left boundary \(H_{x = 0} = 1\left[ {Len} \right] \); right boundary \(H_{x = 1020} = 0\left[ {Len} \right] \); two lateral boundaries: no-flow
Log hydraulic conductivity: mean \(\left\langle {\ln K} \right\rangle = 0\); variance \(\sigma _K^2 = 1.0\)

Compared methods

Wang et al. determined a set of \(\lambda \) values for TgNN via an ad-hoc procedure for this particular problem, namely \(\lambda =\left[ \lambda _{\text {DATA}} ,\lambda _{\text {IC}} ,\lambda _{\text {BC}} ,\lambda _{\text {PDE}} ,\lambda _{\text {EC}} ,\lambda _{\text {EK}} \right] =[1,100,1,1,1,1]\) [9]. To investigate whether our method improves predictive accuracy, we utilize this weight setting as one of the compared methods and denote this case as TgNN. In addition, for comparison, we also take a naive approach that sets all weights equal, \(\lambda =[1,1,1,1,1,1]\), and denote this case as TgNN-1.

Comparisons and results

This section first evaluates the predictive accuracy of the proposed Lagrangian dual-based TgNN framework on a subsurface flow problem, in comparison with TgNN and TgNN-1. Subsequently, we reduce the training epochs to observe whether the efficiency can be improved with less training time. Changes of Lagrangian multipliers are then recorded with their final values being assigned into the loss function to compare predictive performances obtained by dynamic adjustment and fixed values. Furthermore, different levels of noise are added into the training data to observe the effect on the predictive results caused by noise. Finally, the stopping criterion is substituted with a dynamic epoch, which has a relationship with changes of loss values, to control the training process.
To begin with, the distribution of the hydraulic conductivity field is provided in Fig. 1a with the reference hydraulic head at \(t=50\) in Fig. 1b.

Predictive accuracy

We first compare the above three methods with 2000 training iterations. Table 2 provides the error L2 and R2 results, as well as the training time, with the best result for each metric marked in bold. It should be pointed out that the results reported in the experimental sections are the best over 20 independent runs.
Table 2
Results of error L2, R2, and training time obtained by TgNN-LD, TgNN, and TgNN-1

Method     Error L2      R2            Training time (s)
TgNN-LD    2.0833E−04    9.9532E−01    204.3223
TgNN       3.5648E−04    9.8735E−01    195.3986
TgNN-1     4.5596E−04    9.7931E−01    186.2980
As shown in Table 2, TgNN-LD obtains the best error L2 and R2 results, which are clearly smaller and larger than those of the other two methods, respectively. It is worth noting that its training time is slightly longer than that of the other two methods, probably due to the extra computation introduced by the three Lagrangian multipliers. However, compared with TgNN-1 with the naive setting, the proposed TgNN-LD achieves much better error L2 and R2 results. Moreover, compared to TgNN, in which the weights are adjusted by expertise, our method not only saves the babysitting time of determining a suitable set of weights in the preliminary stage, but also produces superior results.
To observe the changing trend of each loss, we plot the loss values versus iterations for each method, as shown in Fig. 2. Since TgNN places more emphasis on the PDE term, herein, we only provide changes of the total loss (denoted as loss), the data term (denoted as \(f1\_loss\)), and the PDE term (denoted as \(f2\_loss\)).
It can be seen from Fig. 2 that TgNN-LD obtains losses with less fluctuation, while TgNN takes second place and TgNN-1 performs worst. The most remarkable difference lies in the changes of \(f2\_loss\), which corresponds to the PDE term. The proposed TgNN-LD exhibits a much more stable behavior, whereas there are many oscillations in both TgNN and TgNN-1. Indeed, in terms of both smoothness and the number of iterations required, TgNN-LD achieves superior performance.
Figure 3 compares the correlation between the reference and predicted hydraulic head at t=50 with 2000 iterations. In Fig. 3, the horizontal and vertical coordinates have the same scales and ranges, so the closer the points lie to the diagonal of the coordinate axes, the more accurate the prediction results. It can be clearly seen that TgNN-LD outperforms its counterparts, since its predicted results are distributed more evenly and closer to the reference.
Figure 4 illustrates the predicted hydraulic head at different locations at t=50 obtained by all three algorithms (red dotted lines), together with the reference hydraulic head (blue). TgNN-LD is capable of achieving predicted hydraulic heads closer to the reference, almost covering the reference at y=620 and y=920, while the other two algorithms show significant discrepancies. Although there exists a slight difference for TgNN-LD at y=320, the results of its counterparts at the same location are markedly inferior, as shown in Fig. 4b, c. As a result, it can still be inferred that TgNN-LD has a remarkable advantage.
Figure 5 provides the distribution of the predicted hydraulic head at t=50 obtained by TgNN-LD and its counterparts. For better observation, we draw a set of heatmaps on the right to depict their difference from the reference hydraulic head, denoted as \(\delta H\). For the sub-figures in the right column, the larger and the more evenly distributed the lighter area, the better the result. As shown in Fig. 5, TgNN-LD has a much wider light-colored area than its counterparts, indicating that its predicted H is the most similar to the reference.
From these figures, it can be clearly seen that the prediction of TgNN-LD matches the reference values well and is superior to the predictions of the other two.

Reduced training epochs

From Fig. 2a–c, it can be observed that when the number of iterations is approximately 1750, the three losses are close to convergence, suggesting that the number of iterations could be reduced to shorten the training time. To verify this, we test our method with different numbers of iterations. The related results are presented in Table 3.
Table 3
Results versus different numbers of iterations

Number of iterations    Error L2      R2            Training time (s)
1500                    3.5398E−04    9.8753E−01    149.6769
1700                    2.4923E−04    9.9382E−01    159.2913
1750                    1.9887E−04    9.9606E−01    164.3937
1800                    2.3961E−04    9.9429E−01    168.9544
2000                    2.0833E−04    9.9532E−01    204.3223
Table 4
Values of Lagrangian multipliers at the final iteration versus different numbers of iterations

Number of iterations    \(\lambda _{\text {PDE}}\)    \(\lambda _{\text {EC}}\)    \(\lambda _{\text {EK}}\)
1700                    7.7076E+00    8.5914E−01    1.1813E+00
1750                    9.9854E+00    8.6531E−01    2.6842E+00
1800                    8.8720E+00    8.6241E−01    9.9864E−01
2000                    8.1127E+00    2.8652E−01    1.0940E+00
As shown in Table 3, when the number of iterations is 1750, even though the training time is about 15 s longer than that with 1500 iterations, the error L2 and R2 results are the best among the compared settings. In particular, compared with the results at 2000 iterations, the training time decreases by almost 20%, while the error L2 and R2 results are superior.

Changes of Lagrangian multipliers

To further investigate the effect of Lagrangian multipliers under different iteration values, we plot their changes with iterations, as shown in Fig. 6, where Lambda, Lambda1, and Lambda2 refer to \(\lambda _{\text {PDE}}\), \(\lambda _{\text {EC}}\), and \(\lambda _{\text {EK}}\), respectively.
It can be seen from Fig. 6 that the Lagrangian multipliers do not converge to fixed values, irrespective of the number of iterations; this is rooted in the randomly selected seeds, which lead to different initial values. Table 4 lists their final values for the different iteration settings.
We then substitute one set of multipliers (8.1127E+00, 2.8652E−01, and 1.0940E+00) into Eq. (1) and keep them fixed during the whole training stage to further verify the above observation. The related results are shown in Fig. 7. From Fig. 7, it can be seen that the predictive results obtained with fixed multiplier values are not as good as those obtained with dynamically changing multipliers.
Table 5
Prediction results with different noise percentages obtained by TgNN-LD, TgNN, and TgNN-1

Method     Noise level \(\alpha \%\)    Error L2      R2            Training time (s)
TgNN-LD    5%                           2.2848E−04    9.9481E−01    206.9471
TgNN       5%                           4.9887E−04    9.7524E−01    204.6485
TgNN-1     5%                           6.4963E−04    9.5802E−01    201.8772
TgNN-LD    10%                          2.6835E−04    9.9285E−01    208.6079
TgNN       10%                          4.9018E−04    9.7614E−01    202.5091
TgNN-1     10%                          5.6168E−04    9.6867E−01    205.8537
TgNN-LD    20%                          3.6974E−04    9.8652E−01    204.9997
TgNN       20%                          5.5094E−04    9.7007E−01    204.1725
TgNN-1     20%                          7.2383E−04    9.4833E−01    204.9317

Predicting the future response from noisy data

To investigate the robustness of the proposed method, we add noise to the training data according to the following formulation [9]:
$$\begin{aligned} h^*\left( {t,x,y} \right) = h\left( {t,x,y} \right) + h_{\text {diff}} \left( {x,y} \right) \times \alpha \% \times \varepsilon , \end{aligned}$$
(15)
where \(h_{\text {diff}} \left( {x,y} \right) \), \(\alpha \%\), and \(\varepsilon \) denote the maximal head difference observed at location \(\left( {x,y} \right) \) during the entire monitoring process, the noise level, and a uniform random variable ranging from −1 to 1, respectively. A minimal sketch of this perturbation is given below.
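As a minimal sketch of Eq. (15), the helper below perturbs the observed heads with location-dependent, uniformly distributed noise; the function name and tensor layout are assumptions for illustration.

```python
import torch

def add_noise(h: torch.Tensor, h_diff: torch.Tensor, alpha: float) -> torch.Tensor:
    """Noisy heads h* = h + h_diff * alpha * eps, following Eq. (15).

    h      : clean heads h(t, x, y) at the monitored locations
    h_diff : maximal head difference per location over the monitoring period
    alpha  : noise level, e.g. 0.05 for 5% noise
    """
    eps = 2.0 * torch.rand_like(h) - 1.0      # uniform random variable on [-1, 1]
    return h + h_diff * alpha * eps
```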
Figures 8, 9, and 10 show the predictive results obtained by TgNN-LD, TgNN, and TgNN-1 under noise levels of 5%, 10%, and 20%, respectively, with their error L2 and R2 results listed in Table 5. From Figs. 8, 9, and 10 and Table 5, it can be found that TgNN-LD always obtains the best correlation between the reference and predicted hydraulic head among the three methods when noise exists. Moreover, it also achieves the best error L2 and R2 results. Although the training time of TgNN-LD under the different noise levels is slightly longer than that of the other two, the comparisons between prediction and reference demonstrate the improvement brought by incorporating the Lagrangian dual approach.
Table 6
A number of good prediction results under the dynamic epoch obtained by TgNN-LD

Stopping epoch    Error L2      R2             Training time (s)
1712              2.0178E−04    0.995948184    164.2439
1810              2.0807E−04    0.99569132     204.4629
1829              2.1342E−04    0.995467097    203.6778

Training under the dynamic epoch

To investigate the predictive performance more deeply, we replace the stopping criterion, i.e., a fixed number of iterations, with a dynamic epoch, which is closely related to the changes of the loss values. From Fig. 2, it can be seen that the training process has usually already converged by 2000 iterations. Therefore, we set the total number of epochs, denoted as \(n_{\text {total}}\), to 2000. The dynamic epoch is denoted as \(n_D\).
In our experiment, we maintain a time window of length \(L_C\). For each newly obtained loss value, we check whether the loss values in the current time window are all less than a threshold \(\beta \). If this criterion is satisfied, we stop the iteration and output the predictive results. Herein, we recommend \(L_C=10\) and \(\beta =0.006\). Table 6 lists a number of good prediction results obtained by TgNN-LD under the dynamic epoch, with the corresponding correlation between reference and prediction, and prediction versus reference, presented in Fig. 11. From the numerical results in Table 6, TgNN-LD is able to reach convergence under a dynamic stopping epoch. A minimal sketch of this stopping rule is given below.
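The following sketch implements one reading of the dynamic stopping rule: training stops once every loss value inside the most recent window of length L_C falls below β. The loop body and the stand-in loss values are illustrative assumptions, not the authors' code.

```python
from collections import deque

def should_stop(history: deque, window: int = 10, beta: float = 0.006) -> bool:
    """Stop when the window is full and all recent losses are below beta (L_C = window)."""
    return len(history) >= window and all(v < beta for v in history)

# Illustrative use inside a training loop; the loss values are placeholders.
history = deque(maxlen=10)
for epoch in range(2000):                     # n_total = 2000
    loss_value = 1.0 / (epoch + 1)            # stand-in for the actual training loss
    history.append(loss_value)
    if should_stop(history):
        break                                 # dynamic epoch n_D reached
```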

Conclusion

In this paper, we propose a Lagrangian dual-based TgNN framework to assist in balancing training data and theory in the TgNN model. It provides theoretical guidance for the update of weights for the theory-guided neural network framework. Lagrangian duality is incorporated into TgNN to automatically determine the weight values for each term and maintain an excellent trade-off between them. The subsurface flow problem is investigated as a test case. Experimental results demonstrate that the proposed method can increase the predictive accuracy and produce a superior training model compared to that obtained by an ad-hoc procedure within limited computational time.
In the future, we would like to combine the proposed Lagrangian dual-based TgNN framework with more informed deep learning approaches, such as TgNN with weak-form constraints. It can also be utilized to solve more application problems, such as the two-phase flow problem in energy engineering, to enhance the training ability of TgNN and achieve accurate predictions.

Acknowledgements

This work was jointly supported by National Natural Science Foundation of China (No. 62103255).

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literature

1. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489
4. Tan M, Chen B, Pang R, Vasudevan V, Sandler M, Howard A, Le QV (2019) MnasNet: platform-aware neural architecture search for mobile. In: IEEE conference on computer vision and pattern recognition, pp 2815–2823
6. Karpatne A, Atluri G, Faghmous JH, Steinbach M, Banerjee A, Ganguly A, Shekhar S, Samatova N, Kumar V (2017) Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans Knowl Data Eng 29(10):2318–2331
7. Karpatne A, Watkins W, Read J, Kumar V (2017) Physics-guided neural networks (PGNN): an application in lake temperature modeling. arXiv:1710.11431
8. Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707
10. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: IEEE conference on computer vision and pattern recognition, pp 8697–8710
11. Chen Y, Sun X, Hu Y (2019) Federated learning assisted interactive EDA with dual probabilistic models for personalized search. In: International conference on swarm intelligence, pp 374–383
12. Wang X, Jin Y, Schmitt S, Olhofer M (2020) An adaptive Bayesian approach to surrogate-assisted evolutionary multi-objective optimization. Inf Sci 519:317–331
13. Fayed HA, Atiya AF (2019) Speed up grid-search for parameter selection of support vector machines. Appl Soft Comput 80:202–210
14. Chen H, Liu Z, Cai K, Xu L, Chen A (2018) Grid search parametric optimization for FT-NIR quantitative analysis of solid soluble content in strawberry samples. Vib Spectrosc 94:7–15
15. Han J-H, Choi D-J, Park S-U, Hong S-K (2020) Hyperparameter optimization using a genetic algorithm considering verification time in a convolutional neural network. J Electric Eng Technol 15(2):721–726
16. Martinez-de Pison FJ, Gonzalez-Sendino R, Aldama A, Ferreiro-Cabello J, Fraile-Garcia E (2019) Hybrid methodology based on Bayesian optimization and GA-PARSIMONY to search for parsimony models by combining hyperparameter optimization and feature selection. Neurocomputing 354(SI):20–26
18. Yao Y, Cao J, Ma Z (2018) A cost-effective deadline-constrained scheduling strategy for a hyperparameter optimization workflow for machine learning algorithms. In: International conference on service-oriented computing, pp 870–878
19. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305
20. Fioretto F, Hentenryck PV, Mak TW, Tran C, Baldo F, Lombardi M (2020) Lagrangian duality for constrained deep learning. arXiv:2001.09394
21. Lombardi M, Baldo F, Borghesi A, Milano M (2020) An analysis of regularized approaches for constrained machine learning. arXiv:2005.10674
22. Borghesi A, Baldo F, Milano M (2020) Improving deep learning models via constraint-based domain knowledge: a brief survey. arXiv:2005.10691
24. Walker BN, Rehg JM, Kalra A, Winters RM, Drews P, Dascalu J, David EO, Dascalu A (2019) Dermoscopy diagnosis of cancerous lesions utilizing dual deep learning algorithms via visual and audio (sonification) outputs: laboratory and prospective observational studies. EBioMedicine 40:176–183
25. Pundir AS, Raman B (2019) Dual deep learning model for image based smoke detection. Fire Technol 55(6):2419–2442
Metadata
Title
A Lagrangian dual-based theory-guided deep neural network
Authors
Miao Rong
Dongxiao Zhang
Nanzhe Wang
Publication date
25-04-2022
Publisher
Springer International Publishing
Published in
Complex & Intelligent Systems / Issue 6/2022
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI
https://doi.org/10.1007/s40747-022-00738-1
