Introduction
The deep neural network (DNN) has achieved significant breakthroughs in various scientific and industrial fields [1-4]. Like most data-driven models in artificial intelligence [5], DNNs depend on large amounts of training data. However, the cost and difficulty of collecting data in some areas, especially in energy-related fields, hinder the development of (deep) neural networks. To further improve their generalization, theory-guided data science models, which bridge scientific problems and complex physical phenomena, have gained increasing popularity in recent years [6-8].
As a successful representative, the theory-guided neural network (TgNN) framework, also called the physics-informed neural network framework or the informed deep learning framework, incorporates theory (e.g., governing equations, other physical constraints, engineering controls, and expert knowledge) into (deep) neural network training, and has been applied to construct prediction models, especially in industries with limited training data [6, 7]. Herein, the theory may refer to scientific laws and engineering theories [6], which can be regarded as a form of prior knowledge. Such knowledge is combined with training data to improve performance during the learning process. In the loss function, the theory terms are usually transformed into regularization terms and added to the training-data term [9]. Owing to the presence of theory, predictions obtained by TgNN take into account physical feasibility, as well as knowledge beyond the regimes covered by the training data. As a result, TgNN can obtain a training model with better generalization, and can achieve higher accuracy than the traditional DNN [1, 7].
Although the introduction of theory expands the application of data-driven models, the trade-off between observation data and theory should be equitable. Herein, we first provide the theory-incorporated mathematical formulation of the loss function, as shown in Eq. (1):

\[
L(\lambda) = \lambda_{\text{DATA}}\,{\text{MSE}}_{\text{DATA}} + \lambda_{\text{IC}}\,{\text{MSE}}_{\text{IC}} + \lambda_{\text{BC}}\,{\text{MSE}}_{\text{BC}} + \lambda_{\text{PDE}}\,{\text{MSE}}_{\text{PDE}} + \lambda_{\text{EC}}\,{\text{MSE}}_{\text{EC}} + \lambda_{\text{EK}}\,{\text{MSE}}_{\text{EK}}, \tag{1}
\]

where \(\lambda_i\) and \({\text{MSE}}_i\) denote the weight and mean square error of the \(i\)th term, respectively; \(i\) ranges over DATA, IC, BC, PDE, EC, and EK; and \(\lambda = \left[ \lambda_{\text{DATA}}, \lambda_{\text{IC}}, \lambda_{\text{BC}}, \lambda_{\text{PDE}}, \lambda_{\text{EC}}, \lambda_{\text{EK}} \right]\). The term DATA refers to the observation (training) data, while the remaining terms encode the theory added to the (D)NN model: PDE, IC, BC, EC, and EK refer to partial differential equations, initial conditions, boundary conditions, engineering controls, and expert knowledge, respectively. Each weight represents the importance of the corresponding term in the loss function. Not only might the values of these terms be at different scales, but their physical meanings and dimensional units can also be distinct. Therefore, balancing the trade-off among these terms is critical; a concrete sketch of this composite loss follows.
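To make the formulation concrete, the following minimal sketch (in PyTorch) assembles the weighted loss of Eq. (1) from per-term mean square errors; the random residuals and equal weights below are illustrative assumptions, not settings from this paper.

```python
import torch

# A minimal sketch of the composite TgNN loss in Eq. (1). The placeholder
# MSE values and the equal weights are illustrative assumptions only.
TERMS = ["DATA", "IC", "BC", "PDE", "EC", "EK"]

def tgnn_loss(mse_terms: dict, weights: dict) -> torch.Tensor:
    """Weighted sum of the per-term mean square errors."""
    return sum(weights[name] * mse_terms[name] for name in TERMS)

# Example usage with random stand-in residuals for each term.
mse_terms = {name: torch.rand(()) for name in TERMS}
weights = {name: 1.0 for name in TERMS}  # a naive equal-weight baseline
loss = tgnn_loss(mse_terms, weights)
```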
If these weight variables are viewed as neural architecture parameters, the gradient of the loss with respect to \(\lambda _i\) computed from Eq. (1) is simply \({\text {MSE}}_i\), i.e., a constant nonnegative value, so that \(\lambda _i\) decreases continuously toward negative infinity as back-propagation iterations proceed [10, 11], as the toy example below demonstrates. Therefore, owing to the presence of theory, i.e., the regularization terms in the loss function, it is difficult to determine the weight of each term relative to the training-data term. If set inappropriately, the weights are highly likely to increase the training time, or even impede the convergence of the optimizer, resulting in an inaccurate training model. Consequently, the adjustment of these weight values is essential. In most of the existing literature, researchers adjust these values by experience [6, 8, 9]. However, when the weights are not at the same scale, this inevitably places a heavy burden on researchers and consumes considerable human time.
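The divergence argument can be reproduced numerically. In the toy sketch below (an illustration of the argument, not code from the paper), a single weight is treated as a trainable parameter against a fixed residual; gradient descent then lowers it by a constant amount at every step, without bound.

```python
import torch

# Toy illustration: if lambda_i is treated as a trainable parameter,
# d(loss)/d(lambda_i) = MSE_i >= 0, so gradient descent lowers lambda_i
# at every step, toward negative infinity.
lam = torch.tensor(1.0, requires_grad=True)
mse_i = torch.tensor(0.5)                # a fixed, nonnegative residual
opt = torch.optim.SGD([lam], lr=0.1)

for step in range(5):
    opt.zero_grad()
    loss = lam * mse_i
    loss.backward()                      # lam.grad == mse_i == 0.5
    opt.step()
    print(step, round(lam.item(), 2))    # 0.95, 0.9, 0.85, 0.8, 0.75
```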
Recently, babysitting and evolutionary-computation-based techniques [12], such as grid search [13, 14] and genetic algorithms [15, 16], have gained popularity for hyper-parameter optimization [17-19]. If TgNN first generates an initial set of weights, and then compares candidates and repeats this search until the most suitable set of weight values is found or a stopping criterion is met, the training time will inevitably be extended, as the cost sketch below illustrates. In contrast, if the search for optimized weight values can be incorporated into the training process itself, the training time may be shortened.
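The cost of such an outer-loop search grows exponentially with the number of weighted terms. The sketch below illustrates this with a hypothetical grid search; train_tgnn and validation_error are placeholder helpers standing in for a full TgNN training run and its held-out evaluation.

```python
import itertools

# A hedged sketch of the outer-loop weight search described above;
# train_tgnn and validation_error are hypothetical placeholders.
TERMS = ["DATA", "IC", "BC", "PDE", "EC", "EK"]
candidates = [0.1, 1.0, 10.0]

best_err, best_weights = float("inf"), None
for combo in itertools.product(candidates, repeat=len(TERMS)):
    weights = dict(zip(TERMS, combo))
    model = train_tgnn(weights)          # one full training run per combo
    err = validation_error(model)
    if err < best_err:
        best_err, best_weights = err, weights
# Even this coarse grid needs 3**6 = 729 full training runs, which is why
# folding the weight search into training itself is attractive.
```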
In recent years, Lagrangian dual approaches have been widely combined with (deep) neural networks to improve the latter's ability to handle constrained problems [20-23]. Fioretto et al. pointed out that Lagrangian duality can bring significant benefits for applications in which the learning task must enforce constraints on the predictor itself, such as in energy systems, gas networks, and transprecision computing, among others [20]. Walker et al. incorporated the Lagrangian dual approach into laboratory and prospective observational studies [24]. Gan et al. developed a Lagrangian dual-based DNN framework for trajectory simulation [23]. Pundir and Raman proposed a dual deep learning method for image-based smoke detection [25]. These contributions improve the performance of the original deep neural network frameworks and provide more accurate predictive training models.
The Lagrangian dual approach combined with (D)NNs has proven to be an effective tool for constructing predictive models with constraints. In the TgNN framework, several constraint terms must be considered in addition to the data term, which poses a great challenge to the training process. Ad hoc procedures for adjusting the weights of these constraint terms not only demand considerable computational time and resources, but may also deteriorate predictive accuracy. Additionally, ad hoc procedures depend strongly on prior knowledge and experience. Having recognized this, we propose the Lagrangian dual-based TgNN (TgNN-LD) to provide theoretical guidance for the adjustment of the weight values in the loss function of the theory-guided neural network framework. In our method, the Lagrangian dual framework is incorporated into the TgNN training model and controls the update of the weights, with the purpose of automatically adapting the weight values and producing accurate predictions within limited training time; a minimal sketch of such an update appears below. Moreover, to better set forth our approach, we select a subsurface flow problem as a test case in the experiments.
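For intuition, the following is a minimal sketch of a Lagrangian dual style training step, assuming the standard primal-dual scheme (gradient descent on the network parameters, gradient ascent on the multipliers). Here compute_mse_terms is a hypothetical helper returning the per-term MSEs of Eq. (1); the sketch is not the exact TgNN-LD algorithm, which is detailed in the section “Proposed method”.

```python
import torch

def dual_training_step(model, optimizer, lambdas, dual_lr, batch):
    """One primal-dual step: descend on network parameters, ascend on
    the multipliers (lambdas is a dict of floats, one per loss term)."""
    optimizer.zero_grad()
    mse_terms = compute_mse_terms(model, batch)        # hypothetical helper
    loss = sum(lambdas[k] * mse_terms[k] for k in mse_terms)
    loss.backward()
    optimizer.step()                                   # primal descent

    # Dual ascent: d(loss)/d(lambda_k) = MSE_k >= 0, so each multiplier
    # is raised in proportion to its current residual.
    for k in lambdas:
        lambdas[k] += dual_lr * mse_terms[k].item()
    return loss.item()
```

Raising each multiplier in proportion to its residual automatically emphasizes the terms that are currently violated most, which is the behavior that manual ad hoc tuning attempts to approximate.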
The remainder of this paper proceeds as follows. The section “Proposed method” briefly describes the mathematical formulation of the TgNN model, followed by details of the proposed method. The experimental settings and the investigation of the corresponding results are provided in the sections “Experimental settings” and “Comparisons and results”, respectively. Finally, the section “Conclusion” concludes the paper and suggests directions for future research.