Abstract
We propose new and original mathematical connections between Hamilton–Jacobi (HJ) partial differential equations (PDEs) with initial data and neural network architectures. Specifically, we prove that some classes of neural networks correspond to representation formulas of HJ PDE solutions whose Hamiltonians and initial data are obtained from the parameters of the neural networks. These results do not rely on universal approximation properties of neural networks; rather, our results show that some classes of neural network architectures naturally encode the physics contained in some HJ PDEs. Our results naturally yield efficient neural network-based methods for evaluating solutions of some HJ PDEs in high dimension without using grids or numerical approximations. We also present some numerical results for solving some inverse problems involving HJ PDEs using our proposed architectures.
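As a concrete illustration of the last point of the abstract, the following minimal sketch (our own, not the authors' released code; the function name, parameter shapes, and random data are purely illustrative) evaluates a network of the max-of-affine form used in Eq. (8) of the paper, \(f(\textit{\textbf{x}},t)=\max _{i\in \{1,\dots ,m\}}\{\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle - t\theta _i - \gamma _i\}\), at a single point \((\textit{\textbf{x}},t)\) in high dimension; no spatial grid or numerical PDE solver is involved.

```python
import numpy as np

# Hedged sketch: pointwise evaluation of a max-of-affine ("max-plus") network
#   f(x, t) = max_i { <p_i, x> - t * theta_i - gamma_i },
# the form of Eq. (8). Names and sizes below are our own illustration.
def f_maxplus(x, t, P, theta, gamma):
    """P: (m, n) array whose rows are the p_i; theta, gamma: (m,) arrays."""
    return np.max(P @ x - t * theta - gamma)

rng = np.random.default_rng(0)
n, m = 1000, 50                          # high-dimensional input, m affine pieces
P = rng.standard_normal((m, n))
theta = -0.5 * np.sum(P**2, axis=1)      # the choice theta_i = -||p_i||^2 / 2 of Appendix C
gamma = rng.standard_normal(m)
x, t = rng.standard_normal(n), 0.3
print(f_maxplus(x, t, P, theta, gamma))  # one solution value, no grid required
```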
References
Aaibid, M., Sayah, A.: A direct proof of the equivalence between the entropy solutions of conservation laws and viscosity solutions of Hamilton–Jacobi equations in one-space variable. JIPAM J. Inequal. Pure Appl. Math. 7(2), 11 (2006)
Akian, M., Bapat, R., Gaubert, S.: Max-plus algebra. Handbook of Linear Algebra 39 (2006)
Akian, M., Gaubert, S., Lakhoua, A.: The max-plus finite element method for solving deterministic optimal control problems: basic properties and convergence analysis. SIAM J. Control Optim. 47(2), 817–848 (2008)
Alla, A., Falcone, M., Saluzzi, L.: An efficient DP algorithm on a tree-structure for finite horizon optimal control problems. SIAM J. Sci. Comput. 41(4), A2384–A2406 (2019)
Alla, A., Falcone, M., Volkwein, S.: Error analysis for POD approximations of infinite horizon problems via the dynamic programming approach. SIAM J. Control Optim. 55(5), 3091–3115 (2017)
Arnol’d, V.I.: Mathematical methods of classical mechanics. Graduate Texts in Mathematics, vol. 60. Springer, New York (1989). Translated from the 1974 Russian original by K. Vogtmann and A. Weinstein, Corrected reprint of the second (1989) edition
Bachouch, A., Huré, C., Langrené, N., Pham, H.: Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications. arXiv preprint arXiv:1812.05916 (2018)
Banerjee, K., Georganas, E., Kalamkar, D., Ziv, B., Segal, E., Anderson, C., Heinecke, A.: Optimizing deep learning RNN topologies on intel architecture. Supercomput. Front. Innov. 6(3), 64–85 (2019)
Bardi, M., Capuzzo-Dolcetta, I.: Optimal control and viscosity solutions of Hamilton–Jacobi–Bellman equations. Syst. Control Found. Appl. Birkhäuser Boston, Inc., Boston, MA (1997). https://doi.org/10.1007/978-0-8176-4755-1. With appendices by Maurizio Falcone and Pierpaolo Soravia
Bardi, M., Evans, L.: On Hopf’s formulas for solutions of Hamilton–Jacobi equations. Nonlinear Anal. Theory, Methods Appl. 8(11), 1373–1381 (1984). https://doi.org/10.1016/0362-546X(84)90020-8
Barles, G.: Solutions de viscosité des équations de Hamilton–Jacobi. Mathématiques et Applications. Springer, Berlin (1994)
Barles, G., Tourin, A.: Commutation properties of semigroups for first-order Hamilton–Jacobi equations and application to multi-time equations. Indiana Univ. Math. J. 50(4), 1523–1544 (2001)
Barron, E., Evans, L., Jensen, R.: Viscosity solutions of Isaacs’ equations and differential games with Lipschitz controls. J. Differ. Equ. 53(2), 213–233 (1984). https://doi.org/10.1016/0022-0396(84)90040-8
Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A.: Deep splitting method for parabolic PDEs. (2019). arXiv preprint arXiv:1907.03452
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving stochastic differential equations and Kolmogorov equations by means of deep learning. (2018). arXiv preprint arXiv:1806.00421
Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. 29(4), 1563–1619 (2019)
Bellman, R.E.: Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton (1961)
Berg, J., Nyström, K.: A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317, 28–41 (2018). https://doi.org/10.1016/j.neucom.2018.06.056
Bertsekas, D.P.: Reinforcement Learning and Optimal Control. Athena Scientific, Belmont (2019)
Bokanowski, O., Garcke, J., Griebel, M., Klompmaker, I.: An adaptive sparse grid semi-Lagrangian scheme for first order Hamilton–Jacobi Bellman equations. J. Sci. Comput. 55(3), 575–605 (2013)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer, New York (2000). https://doi.org/10.1007/978-1-4612-1394-9
Brenier, Y., Osher, S.: Approximate Riemann solvers and numerical flux functions. SIAM J. Numer. Anal. 23(2), 259–273 (1986)
Brenier, Y., Osher, S.: The discrete one-sided Lipschitz condition for convex scalar conservation laws. SIAM J. Numer. Anal. 25(1), 8–23 (1988). https://doi.org/10.1137/0725002
Buckdahn, R., Cardaliaguet, P., Quincampoix, M.: Some recent aspects of differential game theory. Dyn. Games Appl. 1(1), 74–114 (2011). https://doi.org/10.1007/s13235-010-0005-0
Carathéodory, C.: Calculus of variations and partial differential equations of the first order. Part I: Partial differential equations of the first order. Translated by Robert B. Dean and Julius J. Brandstatter. Holden-Day, Inc., San Francisco-London-Amsterdam (1965)
Carathéodory, C.: Calculus of variations and partial differential equations of the first order. Part II: Calculus of variations. Translated from the German by Robert B. Dean, Julius J. Brandstatter, translating editor. Holden-Day, Inc., San Francisco-London-Amsterdam (1967)
Cardin, F., Viterbo, C.: Commuting Hamiltonians and Hamilton–Jacobi multi-time equations. Duke Math. J. 144(2), 235–284 (2008). https://doi.org/10.1215/00127094-2008-036
Caselles, V.: Scalar conservation laws and Hamilton–Jacobi equations in one-space variable. Nonlinear Anal. Theory Methods Appl. 18(5), 461–469 (1992). https://doi.org/10.1016/0362-546X(92)90013-5
Chan-Wai-Nam, Q., Mikael, J., Warin, X.: Machine learning for semi linear PDEs. J. Sci. Comput. 79(3), 1667–1712 (2019)
Chen, T., van Gelder, J., van de Ven, B., Amitonov, S.V., de Wilde, B., Euler, H.C.R., Broersma, H., Bobbert, P.A., Zwanenburg, F.A., van der Wiel, W.G.: Classification with a disordered dopant-atom network in silicon. Nature 577(7790), 341–345 (2020)
Cheng, T., Lewis, F.L.: Fixed-final time constrained optimal control of nonlinear systems using neural network HJB approach. In: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 3016–3021 (2006). https://doi.org/10.1109/CDC.2006.377523
Corrias, L., Falcone, M., Natalini, R.: Numerical schemes for conservation laws via Hamilton–Jacobi equations. Math. Comput. 64(210), 555–580, S13–S18 (1995). https://doi.org/10.2307/2153439
Courant, R., Hilbert, D.: Methods of mathematical physics. Vol. II. Wiley Classics Library. Wiley: New York (1989). Partial differential equations, Reprint of the 1962 original, A Wiley-Interscience Publication
Crandall, M.G., Ishii, H., Lions, P.L.: User’s guide to viscosity solutions of second order partial differential equations. Bull. Am. Math. Soc. 27(1), 1–67 (1992). https://doi.org/10.1090/S0273-0979-1992-00266-5
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989). https://doi.org/10.1007/BF02551274
Dafermos, C.M.: Polygonal approximations of solutions of the initial value problem for a conservation law. J. Math. Anal. Appl. 38(1), 33–41 (1972). https://doi.org/10.1016/0022-247X(72)90114-X
Dafermos, C.M.: Hyperbolic conservation laws in continuum physics, Grundlehren der Mathematischen Wissenschaften, vol. 325, 4th Edn. Springer, Berlin (2016). https://doi.org/10.1007/978-3-662-49451-6
Darbon, J.: On convex finite-dimensional variational methods in imaging sciences and Hamilton–Jacobi equations. SIAM J. Imaging Sci. 8(4), 2268–2293 (2015). https://doi.org/10.1137/130944163
Darbon, J., Meng, T.: On decomposition models in imaging sciences and multi-time Hamilton-Jacobi partial differential equations. (2019). arXiv preprint arXiv:1906.09502
Darbon, J., Osher, S.: Algorithms for overcoming the curse of dimensionality for certain Hamilton–Jacobi equations arising in control theory and elsewhere. Res. Math. Sci. 3(1), 19 (2016). https://doi.org/10.1186/s40687-016-0068-7
Dissanayake, M.W.M.G., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. 10(3), 195–201 (1994). https://doi.org/10.1002/cnm.1640100303
Djeridane, B., Lygeros, J.: Neural approximation of PDE solutions: An application to reachability computations. In: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 3034–3039 (2006). https://doi.org/10.1109/CDC.2006.377184
Dockhorn, T.: A discussion on solving partial differential equations using neural networks. (2019). arXiv preprint arXiv:1904.07200
Dolgov, S., Kalise, D., Kunisch, K.: A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations. (2019). arXiv preprint arXiv:1908.01533
Dower, P.M., McEneaney, W.M., Zhang, H.: Max-plus fundamental solution semigroups for optimal control problems. In: 2015 Proceedings of the Conference on Control and its Applications, pp. 368–375. SIAM (2015)
Elliott, R.J.: Viscosity solutions and optimal control, Pitman research notes in mathematics series, vol. 165. Longman Scientific & Technical, Harlow; Wiley, New York (1987)
Evans, L.C.: Partial differential equations, Graduate Studies in Mathematics, vol. 19, second edn. American Mathematical Society, Providence, RI (2010). https://doi.org/10.1090/gsm/019
Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. Textbooks in Mathematics, revised edn. CRC Press, Boca Raton (2015)
Evans, L.C., Souganidis, P.E.: Differential games and representation formulas for solutions of Hamilton–Jacobi–Isaacs equations. Indiana Univ. Math. J. 33(5), 773–797 (1984)
Farabet, C., LeCun, Y., Kavukcuoglu, K., Culurciello, E., Martini, B., Akselrod, P., Talay, S.: Large-scale fpga-based convolutional networks. In: Bekkerman, R., Bilenko, M., Langford, J. (eds.) Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)
Farabet, C., Poulet, C., Han, J., LeCun, Y.: CNP: An FPGA-based processor for convolutional networks. In: International Conference on Field Programmable Logic and Applications. IEEE, Prague (2009)
Farabet, C., Poulet, C., LeCun, Y.: An FPGA-based stream processor for embedded real-time vision with convolutional networks. In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp. 878–885. IEEE Computer Society, Los Alamitos, CA, USA (2009). https://doi.org/10.1109/ICCVW.2009.5457611
Farimani, A.B., Gomes, J., Pande, V.S.: Deep Learning the Physics of Transport Phenomena. arXiv e-prints (2017)
Fleming, W., McEneaney, W.: A max-plus-based algorithm for a Hamilton–Jacobi–Bellman equation of nonlinear filtering. SIAM J. Control Optim. 38(3), 683–710 (2000). https://doi.org/10.1137/S0363012998332433
Fleming, W.H., Rishel, R.W.: Deterministic and stochastic optimal control. Bull. Am. Math. Soc. 82, 869–870 (1976)
Fleming, W.H., Soner, H.M.: Controlled Markov Processes and Viscosity Solutions, vol. 25. Springer, New York (2006)
Folland, G.B.: Real Analysis: Modern Techniques and Their Applications. Wiley, Hoboken (2013)
Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pacific Financ. Mark. 26(3), 391–408 (2019). https://doi.org/10.1007/s10690-019-09271-7
Garcke, J., Kröner, A.: Suboptimal feedback control of PDEs by solving HJB equations on adaptive sparse grids. J. Sci. Comput. 70(1), 1–28 (2017)
Gaubert, S., McEneaney, W., Qu, Z.: Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 1054–1061. IEEE (2011)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, New York (2016)
Grohs, P., Jentzen, A., Salimova, D.: Deep neural network approximations for Monte Carlo algorithms. (2019). arXiv preprint arXiv:1908.10828
Grüne, L.: Overcoming the curse of dimensionality for approximating lyapunov functions with deep neural networks under a small-gain condition. (2020). arXiv preprint arXiv:2001.08423
Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018). https://doi.org/10.1073/pnas.1718942115
Han, J., Zhang, L., E, W.: Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 108929 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals, vol. 305. Springer, New York (1993)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods, vol. 306. Springer, New York (1993)
Hirjibehedin, C.: Evolution of circuits for machine learning. Nature 577, 320–321 (2020). https://doi.org/10.1038/d41586-020-00002-x
Hopf, E.: Generalized solutions of non-linear equations of first order. J. Math. Mech. 14, 951–973 (1965)
Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
Horowitz, M.B., Damle, A., Burdick, J.W.: Linear Hamilton Jacobi Bellman equations in high dimensions. In: 53rd IEEE Conference on Decision and Control, pp. 5880–5887. IEEE (2014)
Hsieh, J.T., Zhao, S., Eismann, S., Mirabella, L., Ermon, S.: Learning neural PDE solvers with convergence guarantees. In: International Conference on Learning Representations (2019)
Hu, C., Shu, C.: A discontinuous Galerkin finite element method for Hamilton–Jacobi equations. SIAM J. Sci. Comput. 21(2), 666–690 (1999). https://doi.org/10.1137/S1064827598337282
Huré, C., Pham, H., Bachouch, A., Langrené, N.: Deep neural networks algorithms for stochastic control problems on finite horizon, part I: convergence analysis. (2018). arXiv preprint arXiv:1812.04300
Huré, C., Pham, H., Warin, X.: Some machine learning schemes for high-dimensional nonlinear PDEs. (2019). arXiv preprint arXiv:1902.01599
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1(10) (2020)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A., von Wurstemberger, P.: Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations (2018)
Hutzenthaler, M., Jentzen, A., von Wurstemberger, P.: Overcoming the curse of dimensionality in the approximative pricing of financial derivatives with default risks (2019)
Hutzenthaler, M., Kruse, T.: Multilevel picard approximations of high-dimensional semilinear parabolic differential equations with gradient-dependent nonlinearities. SIAM J. Numer. Anal. 58(2), 929–961 (2020). https://doi.org/10.1137/17M1157015
Ishii, H.: Representation of solutions of Hamilton–Jacobi equations. Nonlinear Anal. Theory, Methods Appl. 12(2), 121–146 (1988). https://doi.org/10.1016/0362-546X(88)90030-2
Jiang, F., Chou, G., Chen, M., Tomlin, C.J.: Using neural networks to compute approximate and guaranteed feasible Hamilton–Jacobi–Bellman PDE solutions. (2016). arXiv preprint arXiv:1611.03158
Jiang, G., Peng, D.: Weighted ENO schemes for Hamilton–Jacobi equations. SIAM J. Sci. Comput. 21(6), 2126–2143 (2000). https://doi.org/10.1137/S106482759732455X
Jianyu, L., Siwei, L., Yingjian, Q., Yaping, H.: Numerical solution of elliptic partial differential equation using radial basis function neural networks. Neural Netw. 16(5–6), 729–734 (2003)
Jin, S., Xin, Z.: Numerical passage from systems of conservation laws to Hamilton–Jacobi equations, and relaxation schemes. SIAM J. Numer. Anal. 35(6), 2385–2404 (1998). https://doi.org/10.1137/S0036142996314366
Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., et al.: In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA ’17, pp. 1–12. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3079856.3080246
Kalise, D., Kundu, S., Kunisch, K.: Robust feedback control of nonlinear PDEs by numerical approximation of high-dimensional Hamilton–Jacobi–Isaacs equations. (2019). arXiv preprint arXiv:1905.06276
Kalise, D., Kunisch, K.: Polynomial approximation of high-dimensional Hamilton–Jacobi–Bellman equations and applications to feedback control of semilinear parabolic PDEs. SIAM J. Sci. Comput. 40(2), A629–A652 (2018)
Kang, W., Wilcox, L.C.: Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and HJB equations. Comput. Optim. Appl. 68(2), 289–315 (2017)
Karlsen, K., Risebro, H.: A note on front tracking and the equivalence between viscosity solutions of Hamilton–Jacobi equations and entropy solutions of scalar conservation laws. Nonlinear Anal. (2002). https://doi.org/10.1016/S0362-546X(01)00753-2
Khoo, Y., Lu, J., Ying, L.: Solving parametric PDE problems with artificial neural networks. (2017). arXiv preprint arXiv:1707.03351
Khoo, Y., Lu, J., Ying, L.: Solving for high-dimensional committor functions using artificial neural networks. Res. Math. Sci. 6(1), 1 (2019)
Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015) (2015)
Kružkov, S.N.: Generalized solutions of nonlinear first order equations with several independent variables II. Math. USSR-Sbornik 1(1), 93–116 (1967). https://doi.org/10.1070/sm1967v001n01abeh001969
Kundu, A., Srinivasan, S., Qin, E.C., Kalamkar, D., Mellempudi, N.K., Das, D., Banerjee, K., Kaul, B., Dubey, P.: K-tanh: Hardware efficient activations for deep learning (2019)
Kunisch, K., Volkwein, S., Xie, L.: HJB-POD-based feedback design for the optimal control of evolution problems. SIAM J. Appl. Dyn. Syst. 3(4), 701–722 (2004)
Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998). https://doi.org/10.1109/72.712178
Lagaris, I.E., Likas, A.C., Papageorgiou, D.G.: Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 11(5), 1041–1049 (2000). https://doi.org/10.1109/72.870037
Lambrianides, P., Gong, Q., Venturi, D.: A new scalable algorithm for computational optimal control under uncertainty. (2019). arXiv preprint arXiv:1909.07960
Landau, L., Lifshitz, E.: Course of Theoretical Physics. Vol. 1: Mechanics. Oxford (1978)
LeCun, Y.: 1.1 deep learning hardware: Past, present, and future. In: 2019 IEEE International Solid-State Circuits Conference—(ISSCC), pp. 12–19 (2019). https://doi.org/10.1109/ISSCC.2019.8662396
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Lee, H., Kang, I.S.: Neural algorithm for solving differential equations. J. Comput. Phys. 91(1), 110–131 (1990)
Lions, P.L., Rochet, J.C.: Hopf formula and multitime Hamilton–Jacobi equations. Proc. Am. Math. Soc. 96(1), 79–84 (1986)
Lions, P.L., Souganidis, P.E.: Convergence of MUSCL and filtered schemes for scalar conservation laws and Hamilton–Jacobi equations. Numerische Mathematik 69(4), 441–470 (1995). https://doi.org/10.1007/s002110050102
Long, Z., Lu, Y., Dong, B.: PDE-net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. J. Comput. Phys. 399, 108925 (2019). https://doi.org/10.1016/j.jcp.2019.108925
Long, Z., Lu, Y., Ma, X., Dong, B.: PDE-net: Learning PDEs from data. (2017). arXiv preprint arXiv:1710.09668
Lye, K.O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. (2019). arXiv preprint arXiv:1903.03040
McEneaney, W.: Max-Plus Methods for Nonlinear Control and Estimation. Springer, New York (2006)
McEneaney, W.: A curse-of-dimensionality-free numerical method for solution of certain HJB PDEs. SIAM J. Control Optim. 46(4), 1239–1276 (2007). https://doi.org/10.1137/040610830
McEneaney, W.M., Deshpande, A., Gaubert, S.: Curse-of-complexity attenuation in the curse-of-dimensionality-free method for HJB PDEs. In: 2008 American Control Conference, pp. 4684–4690. IEEE (2008)
McEneaney, W.M., Kluberg, L.J.: Convergence rate for a curse-of-dimensionality-free method for a class of HJB PDEs. SIAM J. Control Optim. 48(5), 3052–3079 (2009)
McFall, K.S., Mahan, J.R.: Artificial neural network method for solution of boundary value problems with exact satisfaction of arbitrary boundary conditions. IEEE Trans. Neural Netw. 20(8), 1221–1233 (2009). https://doi.org/10.1109/TNN.2009.2020735
Meade, A., Fernandez, A.: The numerical solution of linear ordinary differential equations by feedforward neural networks. Math. Comput. Modell. 19(12), 1–25 (1994). https://doi.org/10.1016/0895-7177(94)90095-7
Meng, X., Karniadakis, G.E.: A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. (2019). arXiv preprint arXiv:1903.00104
Meng, X., Li, Z., Zhang, D., Karniadakis, G.E.: PPINN: Parareal physics-informed neural network for time-dependent PDEs. (2019). arXiv preprint arXiv:1909.10145
van Milligen, B.P., Tribaldos, V., Jiménez, J.A.: Neural network differential equation and plasma equilibrium solver. Phys. Rev. Lett. 75, 3594–3597 (1995). https://doi.org/10.1103/PhysRevLett.75.3594
Motta, M., Rampazzo, F.: Nonsmooth multi-time Hamilton–Jacobi systems. Indiana Univ. Math. J. 55(5), 1573–1614 (2006)
Niarchos, K.N., Lygeros, J.: A neural approximation to continuous time reachability computations. In: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 6313–6318 (2006). https://doi.org/10.1109/CDC.2006.377358
Osher, S., Shu, C.: High-order essentially nonoscillatory schemes for Hamilton–Jacobi equations. SIAM J. Numer. Anal. 28(4), 907–922 (1991). https://doi.org/10.1137/0728049
Pang, G., Lu, L., Karniadakis, G.E.: fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 41(4), A2603–A2626 (2019)
Pham, H., Pham, H., Warin, X.: Neural networks-based backward scheme for fully nonlinear PDEs. (2019). arXiv preprint arXiv:1908.00412
Pinkus, A.: Approximation theory of the MLP model in neural networks. In: Acta numerica, 1999, Acta Numer., vol. 8, pp. 143–195. Cambridge University Press, Cambridge (1999)
Plaskacz, S., Quincampoix, M.: Oleinik–Lax formulas and multitime Hamilton–Jacobi systems. Nonlinear Anal. Theory, Methods Appl. 51(6), 957–967 (2002). https://doi.org/10.1016/S0362-546X(01)00871-9
Raissi, M.: Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 19(1), 932–955 (2018)
Raissi, M.: Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations. (2018). arXiv preprint arXiv:1804.07010
Raissi, M., Perdikaris, P., Karniadakis, G.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019). https://doi.org/10.1016/j.jcp.2018.10.045
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. (2017). arXiv preprint arXiv:1711.10561
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part ii): Data-driven discovery of nonlinear partial differential equations. (2017). arXiv preprint arXiv:1711.10566
Reisinger, C., Zhang, Y.: Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. (2019). arXiv preprint arXiv:1903.06652
Rochet, J.: The taxation principle and multi-time Hamilton–Jacobi equations. J. Math. Econ. 14(2), 113–128 (1985). https://doi.org/10.1016/0304-4068(85)90015-1
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Royo, V.R., Tomlin, C.: Recursive regression with neural networks: Approximating the HJI PDE solution. (2016). arXiv preprint arXiv:1611.02739
Rudd, K., Muro, G.D., Ferrari, S.: A constrained backpropagation approach for the adaptive solution of partial differential equations. IEEE Trans. Neural Netw. Learn. Syst. 25(3), 571–584 (2014). https://doi.org/10.1109/TNNLS.2013.2277601
Ruthotto, L., Osher, S., Li, W., Nurbekyan, L., Fung, S.W.: A machine learning framework for solving high-dimensional mean field game and mean field control problems. (2019). arXiv preprint arXiv:1912.01825
Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015). https://doi.org/10.1016/j.neunet.2014.09.003
Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018). https://doi.org/10.1016/j.jcp.2018.08.029
Tang, W., Shan, T., Dang, X., Li, M., Yang, F., Xu, S., Wu, J.: Study on a Poisson’s equation solver based on deep learning technique. In: 2017 IEEE Electrical Design of Advanced Packaging and Systems Symposium (EDAPS), pp. 1–3 (2017). https://doi.org/10.1109/EDAPS.2017.8277017
Tassa, Y., Erez, T.: Least squares solutions of the HJB equation with neural network value-function approximators. IEEE Trans. Neural Netw. 18(4), 1031–1041 (2007). https://doi.org/10.1109/TNN.2007.899249
Tho, N.: Hopf-Lax-Oleinik type formula for multi-time Hamilton–Jacobi equations. Acta Math. Vietnamica 30, 275–287 (2005)
Todorov, E.: Efficient computation of optimal actions. Proc. Natl. Acad. Sci. 106(28), 11478–11483 (2009)
Uchiyama, T., Sonehara, N.: Solving inverse problems in nonlinear PDEs by recurrent neural networks. In: IEEE International Conference on Neural Networks, pp. 99–102. IEEE (1993)
E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017). https://doi.org/10.1007/s40304-017-0117-6
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: Multilevel picard iterations for solving smooth semilinear parabolic heat equations (2016)
Widder, D.V.: The Heat Equation, vol. 67. Academic Press, New York (1976)
Yadav, N., Yadav, A., Kumar, M.: An introduction to neural network methods for differential equations. SpringerBriefs in Applied Sciences and Technology. Springer, Dordrecht (2015). https://doi.org/10.1007/978-94-017-9816-7
Yang, L., Zhang, D., Karniadakis, G.E.: Physics-informed generative adversarial networks for stochastic differential equations. (2018). arXiv preprint arXiv:1811.02033
Yang, Y., Perdikaris, P.: Adversarial uncertainty quantification in physics-informed neural networks. J. Comput. Phys. 394, 136–152 (2019)
Yegorov, I., Dower, P.M.: Perspectives on characteristics based curse-of-dimensionality-free numerical approaches for solving Hamilton–Jacobi equations. Appl. Math. Optim. 1–49 (2017)
Zhang, D., Guo, L., Karniadakis, G.E.: Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks. (2019). arXiv preprint arXiv:1905.01205
Zhang, D., Lu, L., Guo, L., Karniadakis, G.E.: Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. J. Comput. Phys. 397, 108850 (2019)
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Research supported by NSF DMS 1820821. Authors’ names are given in last/family name alphabetical order.
Appendices
A Proofs of lemmas in Section 3.1
A.1 Proof of Lemma 3.1
Proof of (i): The convex and lower semicontinuous function \(J^*\) satisfies Eq. (12) by [68, Prop. X.3.4.1]. It is also finite and continuous over its polytopal domain \({\mathrm {dom}~}J^* = {\mathrm {conv}~}{\left( \{\textit{\textbf{p}}_i\}_{i=1}^{m}\right) }\) [133, Thms. 10.2 and 20.5], and moreover, the subdifferential \(\partial J^*(\textit{\textbf{p}})\) is non-empty by [133, Thm. 23.10].
Proof of (ii): First, suppose the vector \((\alpha _1, \dots , \alpha _m)\in \mathbb {R}^m\) satisfies the constraints (a)–(c). Since \(\textit{\textbf{x}}\in \partial J^*(\textit{\textbf{p}})\), there holds \(J^*(\textit{\textbf{p}}) = \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - J(\textit{\textbf{x}})\) [68, Cor. X.1.4.4], and using the definition of the set \(I_{\textit{\textbf{x}}}\) (11) and constraints (a)–(c) we deduce that
Therefore, \((\alpha _1,\dots , \alpha _m)\) is a minimizer in Eq. (12). Second, let \((\alpha _1,\dots , \alpha _m)\) be a minimizer in Eq. (12). Then, (a)–(b) follow directly from the constraints in Eq. (12). A similar argument as above yields
But \(J(\textit{\textbf{x}})= \max _{i\in \{1,\dots ,m\}}\{\left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}\right\rangle -\gamma _{i}\}\) by definition, and so there holds \(\alpha _i = 0\) whenever \(J(\textit{\textbf{x}})\ne \langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle - \gamma _i\). In other words, \(\alpha _i=0\) whenever \(i\not \in I_{\textit{\textbf{x}}}\).
Proof of (iii): Let \((\beta _1,\dots , \beta _m) \in \varLambda _m\) satisfy \(\sum _{i=1}^m\beta _i \textit{\textbf{p}}_i = \textit{\textbf{p}}_k\). By assumption (A2), we have \(\gamma _k=g(\textit{\textbf{p}}_k)\) with g convex, and hence, Jensen’s inequality yields
Therefore, the vector \((\delta _{1k},\dots ,\delta _{mk})\) is a minimizer in Eq. (12) at the point \(\textit{\textbf{p}}_k\), and \(J^*(\textit{\textbf{p}}_k) = \gamma _k\) follows.
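For a concrete instance of the objects in this lemma (the specific numbers are ours and serve only as an illustration), take \(n=1\), \(m=2\), \(p_1 = 0\), \(p_2 = 1\), and \(\gamma _1=\gamma _2=0\), so that assumption (A2) holds with \(g\equiv 0\). Then \(J(x) = \max \{0, x\}\), and the minimization in Eq. (12) reduces to \(J^*(p) = \min \{\alpha _1\gamma _1+\alpha _2\gamma _2:\ (\alpha _1,\alpha _2)\in \varLambda _2,\ \alpha _2 = p\}\), which equals 0 for \(p\in [0,1]\) and \(+\infty \) otherwise, with the unique minimizer \((\alpha _1,\alpha _2) = (1-p,\, p)\) for every \(p\in {\mathrm {dom}~}J^* = [0,1] = {\mathrm {conv}~}(\{p_1,p_2\})\), in agreement with parts (i)–(iii).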
A.2 Proof of Lemma 3.2
Proof of (i): Let \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). The set \(\mathcal {A}(\textit{\textbf{p}}) \subseteq \varLambda _m\) is non-empty and bounded by Lemma 3.1(i), and it is closed since \(\mathcal {A}(\textit{\textbf{p}})\) is the solution set to the linear programming problem (12). Hence, \(\mathcal {A}(\textit{\textbf{p}})\) is compact. As a result, we immediately have that \(H(\textit{\textbf{p}})<+\infty \). Moreover, for each \((\alpha _1,\dots ,\alpha _m)\in \mathcal {A}(\textit{\textbf{p}})\) there holds
from which we conclude that H is a bounded function on \({\mathrm {dom}~}J^*\). Since the objective function in the minimization problem (14) is continuous, existence of a minimizer follows by compactness of \(\mathcal {A}(\textit{\textbf{p}})\).
Proof of (ii): We have already shown in the proof of (i) that the restriction of H to \({\mathrm {dom}~}J^*\) is bounded, and so it remains to prove its continuity. For any \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\), we have that \((\alpha _1,\dots ,\alpha _m)\in \mathcal {A}(\textit{\textbf{p}})\) if and only if \((\alpha _1,\dots , \alpha _m)\in \varLambda _m\), \(\sum _{i=1}^m \alpha _i\textit{\textbf{p}}_i = \textit{\textbf{p}}\), and \(\sum _{i=1}^m \alpha _i\gamma _i = J^*(\textit{\textbf{p}})\). As a result, we have
Define the function \(h:\ \mathbb {R}^{n+1}\rightarrow \mathbb {R}\cup \{+\infty \}\) by
for any \(\textit{\textbf{p}}\in \mathbb {R}^n\) and \(r\in \mathbb {R}\). Using the same argument as in the proof of Lemma 3.1(i), we conclude that h is a convex lower semicontinuous function, and in fact continuous over its domain \({\mathrm {dom}~}h = {\mathrm {conv}~}{\{(\textit{\textbf{p}}_i, \gamma _i)\}_{i=1}^m}\). Comparing Eq. (32) and the definition of h in (33), we deduce that \(H(\textit{\textbf{p}}) = h(\textit{\textbf{p}}, J^*(\textit{\textbf{p}}))\) for any \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). Continuity of H in \({\mathrm {dom}~}J^*\) then follows from the continuity of h and \(J^*\) in their own domains.
Proof of (iii): Let \(k\in \{1,\dots ,m\}\). On the one hand, Lemma 3.1(iii) implies \((\delta _{1k},\dots , \delta _{mk})\in \mathcal {A}(\textit{\textbf{p}}_k)\), so that
On the other hand, let \((\alpha _1,\dots , \alpha _m)\in \mathcal {A}(\textit{\textbf{p}}_k)\) be a vector different from \((\delta _{1k},\dots , \delta _{mk})\). Then, \((\alpha _1,\dots , \alpha _m) \in \varLambda _m\) satisfies \(\sum _{i=1}^m \alpha _i\textit{\textbf{p}}_i = \textit{\textbf{p}}_k\), \(\sum _{i=1}^m \alpha _i\gamma _i = J^*(\textit{\textbf{p}}_k)\), and \(\alpha _k<1\). Define \((\beta _1,\dots , \beta _m) \in \varLambda _m\) by
A straightforward computation using the properties of \((\alpha _1,\dots , \alpha _m)\), Lemma 3.1(iii), and the definition of \((\beta _1,\dots , \beta _m)\) yields
In other words, Eq. (9) holds at index k, which, by assumption (A3), implies that \(\sum _{i\ne k}\beta _i \theta _i > \theta _k\). As a result, we have
Taken together with Eq. (34), we conclude that \((\delta _{1k},\dots , \delta _{mk})\) is the unique minimizer in (14), and hence, we obtain \(H(\textit{\textbf{p}}_k) = \theta _k\).
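As a numerical illustration of the two nested minimization problems recalled in the proofs above (Eq. (12) for \(J^*\) and Eq. (14) for H), the following sketch (ours; the helper name and the test parameters are illustrative) computes \(J^*(\textit{\textbf{p}})\) and \(H(\textit{\textbf{p}})\) by solving two linear programs over the simplex \(\varLambda _m\).

```python
import numpy as np
from scipy.optimize import linprog

# Hedged sketch of the quantities in Lemmas 3.1 and 3.2 (parameter names are ours):
#   J*(p) = min { sum_i a_i*gamma_i : a in Lambda_m, sum_i a_i*p_i = p }        (Eq. (12))
#   H(p)  = min { sum_i a_i*theta_i : a is a minimizer of the problem above }   (Eq. (14))
def conjugate_and_H(p, P, gamma, theta):
    m = P.shape[0]
    A_eq = np.vstack([P.T, np.ones((1, m))])     # sum_i a_i p_i = p and sum_i a_i = 1
    b_eq = np.concatenate([p, [1.0]])
    bounds = [(0.0, None)] * m                   # a_i >= 0, so a lies in the simplex Lambda_m
    lp1 = linprog(c=gamma, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not lp1.success:                          # p outside dom J* = conv{p_i}
        return np.inf, np.inf
    # Second stage: minimize sum_i a_i theta_i over the minimizers of the first LP,
    # i.e., add the extra equality sum_i a_i gamma_i = J*(p).
    A_eq2 = np.vstack([A_eq, np.asarray(gamma).reshape(1, -1)])
    b_eq2 = np.concatenate([b_eq, [lp1.fun]])
    lp2 = linprog(c=theta, A_eq=A_eq2, b_eq=b_eq2, bounds=bounds)
    return lp1.fun, lp2.fun

# Tiny check (numbers are ours): at p = p_k this should return (gamma_k, theta_k),
# consistent with Lemmas 3.1(iii) and 3.2(iii).
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
gamma = np.array([0.0, 0.5, 0.5])                # gamma_i = ||p_i||^2 / 2 (a convex g)
theta = np.array([0.0, -0.5, -0.5])              # theta_i = -||p_i||^2 / 2
print(conjugate_and_H(P[1], P, gamma, theta))    # expected: (0.5, -0.5)
```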
B Proof of Theorem 3.1
To prove this theorem, we will use three lemmas whose statements and proofs are given in Sect. B.1, B.2, and B.3, respectively. The proof of Theorem 3.1 is given in Sect. B.4.
B.1 Statement and proof of Lemma B.1
Lemma B.1
Suppose the parameters \(\{(\textit{\textbf{p}}_i,\theta _i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\times \mathbb {R}\) satisfy assumptions (A1)-(A3). Let J and H be the functions defined in Eqs. (10) and (14), respectively. Let \(\tilde{H}:\mathbb {R}^n\rightarrow \mathbb {R}\) be a continuous function satisfying \(\tilde{H}(\textit{\textbf{p}}_i) = H(\textit{\textbf{p}}_i)\) for each \(i\in \{1,\dots ,m\}\) and \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for all \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). Then, the neural network f defined in Eq. (8) satisfies
Proof
Let \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t\geqslant 0\). Since \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for every \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\), we get
Let \((\alpha _1,\dots , \alpha _m)\) be a minimizer in (14). By Eqs. (12), (13), and (14), we have
Combining Eqs. (36), (37), and (8), we get
where the second inequality follows from the constraint \((\alpha _1,\dots ,\alpha _m)\in \varLambda _m\). Since \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\) is arbitrary, we obtain
Now, by Lemmas 3.1(iii), 3.2(iii), and the assumptions on \(\tilde{H}\), we have
for each \(k\in \{1,\dots , m\}\). A straightforward computation yields
where the inequality holds since \(\textit{\textbf{p}}_i\in {\mathrm {dom}~}J^*\) for every \(i\in \{1,\dots ,m\}\). The conclusion then follows from Eqs. (38) and (39). \(\square \)
B.2 Statement and proof of Lemma B.2
Lemma B.2
Suppose the parameters \(\{(\textit{\textbf{p}}_i,\theta _i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\times \mathbb {R}\) satisfy assumptions (A1)-(A3). For every \(k\in \{1,\dots , m\}\), there exist \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\) such that \(f(\cdot , t)\) is differentiable at \(\textit{\textbf{x}}\) and \(\nabla _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t) = \textit{\textbf{p}}_k\).
Proof
Since f is the supremum of a finite number of affine functions by definition (8), it is finite-valued and convex for \(t\geqslant 0\). As a result, \(\nabla _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t) = \textit{\textbf{p}}_k\) is equivalent to \(\partial (f(\cdot , t))(\textit{\textbf{x}}) = \{\textit{\textbf{p}}_k\}\), and so it suffices to prove that \(\partial (f(\cdot , t))(\textit{\textbf{x}}) = \{\textit{\textbf{p}}_k\}\) for some \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\). To simplify the notation, we use \(\partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t)\) to denote the subdifferential of \(f(\cdot , t)\) at \(\textit{\textbf{x}}\).
By [67, Thm. VI.4.4.2], the subdifferential of \(f(\cdot , t)\) at \(\textit{\textbf{x}}\) is the convex hull of the \(\textit{\textbf{p}}_i\)’s whose indices i’s are maximizers in (8), that is,
It suffices then to prove the existence of \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\) such that
First, consider the case when there exists \(\textit{\textbf{x}}\in \mathbb {R}^n\) such that \(\langle \textit{\textbf{p}}_k, \textit{\textbf{x}}\rangle -\gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -\gamma _i\) for every \(i\ne k\). In that case, by continuity, there exists a sufficiently small \(t>0\) such that \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\) for every \(i\ne k\), and so (40) holds.
Now, consider the case when there does not exist \(\textit{\textbf{x}}\in \mathbb {R}^n\) such that \(\langle \textit{\textbf{p}}_k, \textit{\textbf{x}}\rangle -\gamma _k > \max _{i\ne k}\{\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -\gamma _i\}\). In other words, we assume
We apply Lemma 3.1(i) to the formula above and obtain
Let \(\textit{\textbf{x}}_0\in \partial J^*(\textit{\textbf{p}}_k)\). Denote by \(I_{\textit{\textbf{x}}_0}\) the set of maximizers in Eq. (41) at the point \(\textit{\textbf{x}}_0\), i.e.,
Note that we have \(k\not \in I_{\textit{\textbf{x}}_0}\) by definition of \(I_{\textit{\textbf{x}}_0}\). Define a function \(h:\mathbb {R}^n\rightarrow \mathbb {R}\cup \{+\infty \}\) by
Denote the convex lower semicontinuous envelope of h by \(\overline{{\mathrm {co}}}~h\). Since \(\textit{\textbf{x}}_0\in \partial J^*(\textit{\textbf{p}}_k)\), we can use [67, Thm. VI.4.4.2] and the definition of \(I_{\textit{\textbf{x}}_0}\) and h in Eqs. (43) and (44) to deduce
Hence, the point \(\textit{\textbf{p}}_k\) is in the domain of the polytopal convex function \(\overline{{\mathrm {co}}}~h\). Then, [133, Thm. 23.10] implies \(\partial (\overline{{\mathrm {co}}}~h) (\textit{\textbf{p}}_k)\ne \emptyset \). Let \(\textit{\textbf{v}}_0 \in \partial (\overline{{\mathrm {co}}}~h)(\textit{\textbf{p}}_k)\) and \(\textit{\textbf{x}}=\textit{\textbf{x}}_0+t\textit{\textbf{v}}_0\). It remains to choose suitable positive t such that (40) holds. Letting \(\textit{\textbf{x}}=\textit{\textbf{x}}_0+t\textit{\textbf{v}}_0\) in (40) yields
Now, we consider two situations, the first when \(i\not \in I_{\textit{\textbf{x}}_0} \cup \{k\}\) and the second when \(i\in I_{\textit{\textbf{x}}_0}\). It suffices to prove that (40) holds in each case for small enough positive \(t\).
If \(i\not \in I_{\textit{\textbf{x}}_0}\cup \{k\}\), then i is not a maximizer in Eq. (41) at the point \(\textit{\textbf{x}}_0\). By (45), \(\textit{\textbf{p}}_k\) is a convex combination of the vectors \(\{\textit{\textbf{p}}_i: i\in I_{\textit{\textbf{x}}_0}\}\). In other words, there exists \((c_1,\dots ,c_m)\in \varLambda _m\) such that \(\sum _{j=1}^mc_j\textit{\textbf{p}}_j=\textit{\textbf{p}}_k\) and \(c_j=0\) whenever \(j\not \in I_{\textit{\textbf{x}}_0}\). Taken together with assumption (A2) and Eqs. (10), (41), (43), we have
Thus, the inequalities become equalities in the equation above. As a result, we have
where the inequality holds because \(i\not \in I_{\textit{\textbf{x}}_0}\cup \{k\}\) by assumption. This inequality implies that the constant \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0\rangle -\gamma _k - (\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}_0\rangle - \gamma _i)\) is positive, and taken together with (46), we conclude that the inequality in (40) holds for \(i\not \in I_{\textit{\textbf{x}}_0}\cup \{k\}\) when t is small enough.
If \(i\in I_{\textit{\textbf{x}}_0}\), then both i and k are maximizers in Eq. (10) at \(\textit{\textbf{x}}_0\), and hence, we have
Together with Eq. (46) and the definition of h in Eq. (44), we obtain
In addition, since \(\textit{\textbf{v}}_0\in \partial (\overline{{\mathrm {co}}}~h)(\textit{\textbf{p}}_k)\), we have
Combining Eqs. (48) and (49), we obtain
To prove the result, it suffices to show \(\overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_k)>\theta _k\). Since \(\textit{\textbf{p}}_k \in {\mathrm {dom}~}(\overline{{\mathrm {co}}}~h)\) (as shown in Eq. (45)), [68, Prop. X.1.5.4] yields
for some \((\alpha _1,\dots ,\alpha _m)\in \varLambda _m\) satisfying \(\textit{\textbf{p}}_k = \sum _{j=1}^m \alpha _j \textit{\textbf{p}}_j\) and \(\alpha _j=0\) whenever \(j\not \in I_{\textit{\textbf{x}}_0}\). Then, by Lemma 3.1(ii) \((\alpha _1,\dots ,\alpha _m)\) is a minimizer in Eq. (42), that is,
Hence, Eq. (9) holds for the index k. By assumption (A3), we have \(\theta _k < \sum _{j\ne k}\alpha _j \theta _j\). Taken together with the fact that \(\alpha _j=0\) whenever \(j\not \in I_{\textit{\textbf{x}}_0}\) and Eq. (51), we find
Hence, the right-hand side of Eq. (50) is strictly positive, and we conclude that \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\) for \(t>0\) if \(i\in I_{\textit{\textbf{x}}_0}\).
Therefore, in this case, when \(t>0\) is small enough and \(\textit{\textbf{x}}\) is chosen as above, we have \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\) for every \(i\ne k\), and the proof is complete. \(\square \)
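To see the lemma at work in a simple case (the numbers are ours), continue the two-piece example given after the proof of Lemma 3.1: \(n=1\), \(p_1=0\), \(p_2=1\), \(\gamma _1=\gamma _2=0\), and take \(\theta _1=0\) and \(\theta _2=-\frac{1}{2}\) (the choice \(\theta _i = -\frac{1}{2}\Vert \textit{\textbf{p}}_i\Vert ^2\) of Appendix C; assumption (A3) is vacuous here since neither \(p_1\) nor \(p_2\) is a convex combination of the other). The network of Eq. (8) then reads \(f(x,t) = \max \{0,\ x + t/2\}\), and for every \(t>0\), \(f(\cdot ,t)\) is differentiable at every \(x\ne -t/2\) with \(\nabla _x f(x,t) = p_1 = 0\) for \(x < -t/2\) and \(\nabla _x f(x,t) = p_2 = 1\) for \(x > -t/2\), so both parameter values \(\textit{\textbf{p}}_k\) are attained as gradients, as the lemma asserts.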
B.3 Statement and proof of Lemma B.3
Lemma B.3
Suppose the parameters \(\{(\textit{\textbf{p}}_i,\theta _i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\times \mathbb {R}\) satisfy assumptions (A1)-(A3). Define a function \(F:\ \mathbb {R}^{n+1}\rightarrow \mathbb {R}\cup \{+\infty \}\) by
for all \(\textit{\textbf{p}}\in \mathbb {R}^n\) and \(E^-\in \mathbb {R}\). Then, the convex envelope of F is given by
where the constraint set \(C(\textit{\textbf{p}},E^-)\) is defined by
Proof
First, we compute the convex hull of \({\mathrm {epi}~}F\), which we denote by \({{\mathrm {co}}~}({\mathrm {epi}~}F)\). Let \((\textit{\textbf{p}}, E^-,r)\in {{\mathrm {co}}~}({\mathrm {epi}~}F)\), where \(\textit{\textbf{p}}\in \mathbb {R}^n\) and \(E^-,r\in \mathbb {R}\). Then there exist \(k\in \mathbb {N}\), \((\beta _1, \dots , \beta _k)\in \varLambda _k\) and \((\textit{\textbf{q}}_i, E_i^-,r_i)\in {\mathrm {epi}~}F\) for each \(i\in \{1,\dots , k\}\) such that \((\textit{\textbf{p}}, E^-,r) = \sum _{i=1}^k \beta _i (\textit{\textbf{q}}_i, E_i^-,r_i)\). By definition of F in Eq. (53), \((\textit{\textbf{q}}_i, E_i^-,r_i)\in {\mathrm {epi}~}F\) holds if and only if \(\textit{\textbf{q}}_i\in {\mathrm {dom}~}J^*\), \(E_i^-+H(\textit{\textbf{q}}_i)\leqslant 0\) and \(r_i\geqslant J^*(\textit{\textbf{q}}_i)\). In conclusion, we have
For each i, since we have \(\textit{\textbf{q}}_i\in {\mathrm {dom}~}J^*\), by Lemma 3.2(i) the minimization problem in (14) evaluated at \(\textit{\textbf{q}}_i\) has at least one minimizer. Let \((\alpha _{i1}, \dots , \alpha _{im})\) be such a minimizer. Using Eqs. (12), (14), and \((\alpha _{i1}, \dots , \alpha _{im})\in \varLambda _m\), we have
Define the real number \(c_j{:}{=}\sum _{i=1}^k \beta _i \alpha _{ij}\) for any \(j\in \{1,\dots ,m\}\). Combining Eqs. (55) and (56), we get that \(c_j\geqslant 0\) for any j and
We continue the computation using Eq. (55) and get
Therefore, we conclude that \((c_1,\dots , c_m)\in \varLambda _m\) and
As a consequence, \({{\mathrm {co}}~}({\mathrm {epi}~}F)\subseteq {{\mathrm {co}}~}\left( \cup _{j=1}^m \left( \{\textit{\textbf{p}}_j\}\times (-\infty , -\theta _j]\times [\gamma _j, +\infty )\right) \right) \). Now, Lemmas 3.1(iii) and 3.2(iii) imply \(\{\textit{\textbf{p}}_j\}\times (-\infty , -\theta _j]\times [\gamma _j, +\infty ) \subseteq {\mathrm {epi}~}F\) for each \(j\in \{1,\dots ,m\}\). Therefore, we have
By [68, Def. IV.2.5.3 and Prop. IV.2.5.1], we have
The conclusion then follows from Eqs. (57) and (58). \(\square \)
B.4 Proof of Theorem 3.1
Proof of (i): First, the neural network f is the pointwise maximum of m affine functions in \((\textit{\textbf{x}},t)\) and therefore is jointly convex in these variables. Second, as the function H is continuous and bounded in \({\mathrm {dom}~}J^*\) by Lemma 3.2(ii), there exists a continuous and bounded function defined in \(\mathbb {R}^n\) whose restriction to \({\mathrm {dom}~}J^*\) coincides with H [57, Thm. 4.16]. Then, statement (i) follows by substituting this function for \(\tilde{H}\) in statement (ii), and so it suffices to prove the latter.
Proof of (ii) (sufficiency): Suppose \(\tilde{H}(\textit{\textbf{p}}_i) = H(\textit{\textbf{p}}_i)\) for every \(i\in \{1,\dots ,m\}\) and \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for every \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). Since \(\tilde{H}\) is continuous on \(\mathbb {R}^n\) and J is convex and Lipschitz continuous with Lipschitz constant \(L= \max _{i\in \{1,\dots ,m\}} \Vert \textit{\textbf{p}}_i\Vert \), [10, Thm. 3.1] implies that \((\textit{\textbf{x}},t) \mapsto \sup _{\textit{\textbf{p}}\in {\mathrm {dom}~}J^*}\left\{ \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - t\tilde{H}(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}})\right\} \) is the unique uniformly continuous viscosity solution to the HJ equation (16). But this function coincides with the neural network f by Lemma B.1, and therefore, both sufficiency and statement (i) follow.
Proof of (ii) (necessity): Suppose the neural network f is the unique uniformly continuous viscosity solution to (16). First, we prove that \(\tilde{H}(\textit{\textbf{p}}_k) = H(\textit{\textbf{p}}_k)\) for every \(k\in \{1,\dots , m\}\). Fix \(k\in \{1,\dots , m\}\). By Lemma B.2, there exist \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\) satisfying \(\partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t) = \{\textit{\textbf{p}}_k\}\). Using Lemmas 3.1(iii) and 3.2(iii), we write the maximization problem in Eq. (8) as
where \((\textit{\textbf{p}},t) \mapsto \langle \textit{\textbf{p}}, \textit{\textbf{x}}\rangle - tH(\textit{\textbf{p}})-J^*(\textit{\textbf{p}})\) is continuous in \((\textit{\textbf{p}},t)\) and differentiable in t. As the feasible set \(\{\textit{\textbf{p}}_1,\dots ,\textit{\textbf{p}}_m\}\) is compact, f is also differentiable with respect to t [21, Prop. 4.12], and its derivative equals
Since \(\textit{\textbf{x}}\) and t satisfy \(\partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t)=\{\textit{\textbf{p}}_k\}\), [67, Thm. VI.4.4.2] implies that the only maximizer in Eq. (59) is \(\textit{\textbf{p}}_k\). As a result, there holds
Since f is jointly convex in \((\textit{\textbf{x}},t)\), its subdifferential \(\partial f(\textit{\textbf{x}},t)\) is non-empty and satisfies
In other words, the subdifferential \(\partial f(\textit{\textbf{x}},t)\) contains only one element, and therefore, f is differentiable at \((\textit{\textbf{x}},t)\) and its gradient equals \((\textit{\textbf{p}}_k, -H(\textit{\textbf{p}}_k))\) [133, Thm. 21.5]. Using (16) and (60), we obtain
As \(k\in \{1,\dots , m\}\) is arbitrary, we find that \(H(\textit{\textbf{p}}_k)=\tilde{H}(\textit{\textbf{p}}_k)\) for every \(k\in \{1,\dots , m\}\).
Next, we prove by contradiction that \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for every \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). By continuity of both \(\tilde{H}\) and H (where continuity of H is proved in Lemma 3.2(ii)), it suffices to prove this inequality for every \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\). Assume \(\tilde{H}(\textit{\textbf{p}}) < H(\textit{\textbf{p}})\) for some \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\). Define two functions F and \(\tilde{F}\) from \(\mathbb {R}^n\times \mathbb {R}\) to \(\mathbb {R}\cup \{+\infty \}\) by
for any \(\textit{\textbf{q}}\in \mathbb {R}^n\) and \(E^-\in \mathbb {R}\). Denoting the convex envelope of F by \({{\mathrm {co}}~}F\), Lemma B.3 implies
Let \(E_1^- \in \left( -H(\textit{\textbf{p}}), -\tilde{H}(\textit{\textbf{p}})\right) \). Now, we want to prove that \({{\mathrm {co}}~}F(\textit{\textbf{p}}, E_1^-)\leqslant J^*(\textit{\textbf{p}})\); this inequality will lead to a contradiction with the definition of H.
Using statement (i) of this theorem and the supposition that f is the unique viscosity solution to the HJ equation (16), we have that
Furthermore, a similar calculation as in the proof of [39, Prop. 3.1] yields
where \(\overline{{\mathrm {co}}}~F\) and \(\overline{{\mathrm {co}}}~\tilde{F}\) denote the convex lower semicontinuous envelopes of F and \(\tilde{F}\), respectively. On the one hand, since \(f^* = \overline{{\mathrm {co}}}~\tilde{F}\), the definition of \(\tilde{F}\) in Eq. (61) implies
Recall that \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\) and \(E_1^-<-\tilde{H}(\textit{\textbf{p}})\), so that \((\textit{\textbf{p}}, E_1^-) \in {\mathrm {ri}}~{\mathrm {dom}~}f^*\). As a result, we get
On the other hand, since \(f^* = {{\mathrm {co}}~}F\), we have \({\mathrm {ri}}~{\mathrm {dom}~}f^* = {\mathrm {ri}}~{\mathrm {dom}~}({{\mathrm {co}}~}F)\) and \(f^* = {{\mathrm {co}}~}F\) in \({\mathrm {ri}}~{\mathrm {dom}~}f^*\). Taken together with Eq. (64) and the continuity of \(f^*\), there holds
Note that \({{\mathrm {co}}~}F(\textit{\textbf{p}},\cdot )\) is monotone non-decreasing. Indeed, if \(E_2^-\) is a real number such that \(E_2^- > E_1^-\), by the definition of the set C in Eq. (62) there holds \(C(\textit{\textbf{p}}, E_2^-)\subseteq C(\textit{\textbf{p}}, E_1^-)\), which implies \({{\mathrm {co}}~}F(\textit{\textbf{p}},E_2^-)\geqslant {{\mathrm {co}}~}F(\textit{\textbf{p}},E_1^-)\). Recalling that \(E_1^- < -\tilde{H}(\textit{\textbf{p}})\), monotonicity of \({{\mathrm {co}}~}F(\textit{\textbf{p}},\cdot )\) and Eq. (65) imply
Combining Eqs. (63) and (66), we get
As a result, the set \(C(\textit{\textbf{p}}, E_1^-)\) is non-empty. Since it is also compact, there exists a minimizer in Eq. (62) evaluated at the point \((\textit{\textbf{p}}, E_1^-)\). Let \((c_1,\dots , c_m)\) be such a minimizer. By Eqs. (62) and (67) and the assumption that \(E_1^- \in \left( -H(\textit{\textbf{p}}),-\tilde{H}(\textit{\textbf{p}})\right) \), there holds
Comparing the first three statements in Eq. (68) and the formula of \(J^*\) in Eq. (12), we deduce that \((c_1,\dots , c_m)\) is a minimizer in Eq. (12), i.e., \((c_1,\dots , c_m)\in \mathcal {A}(\textit{\textbf{p}})\). By definition of H in Eq. (14), we have
which contradicts the last inequality in Eq. (68). Therefore, we conclude that \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for any \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\) and the proof is finished.
C Connections between the neural network (17) and the viscous HJ PDE (18)
Let \(f_\epsilon \) be the neural network defined by Eq. (17) with parameters \(\{(\textit{\textbf{p}}_{i}, \theta _i, \gamma _{i})\}_{i=1}^m\) and \(\epsilon > 0\), which is illustrated in Fig. 3. We show in this appendix that when the parameters satisfy \(\theta _{i} = -\frac{1}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}\) for every \(i\in \{1,\dots ,m\}\), the neural network \(f_\epsilon \) corresponds to the unique, jointly convex smooth solution to the viscous HJ PDE (18). This result follows immediately from the following lemma.
Lemma C.1
Let \(\{(\textit{\textbf{p}}_i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\) and \(\epsilon > 0\). Then, the function \(w_{\epsilon } : \mathbb {R}^n \times [0,+\infty ) \rightarrow \mathbb {R}\) defined by
is the unique, jointly log-convex and smooth solution to the Cauchy problem
Proof
A short calculation shows that the function \(w_\epsilon \) defined in Eq. (69) solves the Cauchy problem (70), and uniqueness holds by strict positivity of the initial data (see [147, Chap. VIII, Thm. 2.2] and note that the uniqueness result can easily be generalized to \(n > 1\)).
Now, let \(\lambda \in [0,1]\) and \((\textit{\textbf{x}}_{1},t_{1})\) and \((\textit{\textbf{x}}_{2},t_{2})\) be such that \(\textit{\textbf{x}}=\lambda \textit{\textbf{x}}_{1}+(1-\lambda )\textit{\textbf{x}}_{2}\) and \(t=\lambda t_{1}+(1-\lambda )t_{2}\). Then, Hölder's inequality (see, e.g., [57, Thm. 6.2]) implies
and we find \(w_{\epsilon }(\textit{\textbf{x}},t)\leqslant \left( w_{\epsilon }(\textit{\textbf{x}}_{1},t_{1})\right) ^{\lambda }\left( w_{\epsilon }(\textit{\textbf{x}}_{2},t_{2})\right) ^{1-\lambda }\), which implies that \(w_\epsilon \) is jointly log-convex in \((\textit{\textbf{x}},t)\). \(\square \)
Thanks to Lemma C.1 and the Cole–Hopf transformation \(f_\epsilon (\textit{\textbf{x}},t) = \epsilon \log \left( w_\epsilon (\textit{\textbf{x}},t)\right) \) (see, e.g., [47, Sect. 4.4.1]), a short calculation shows that the neural network \(f_\epsilon \) solves the viscous HJ PDE (18), and it is also its unique solution because \(w_{\epsilon }\) is the unique solution to the Cauchy problem (70). Joint convexity in \((\textit{\textbf{x}},t)\) follows from log-convexity of \((\textit{\textbf{x}},t)\mapsto w_{\epsilon }(\textit{\textbf{x}},t)\) for every \(\epsilon >0\).
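For completeness, here is a minimal numerical sketch (ours) of the smooth network discussed in this appendix. We assume, consistently with the Cole–Hopf relation \(f_\epsilon = \epsilon \log w_\epsilon \) used above and with the max-of-affine form of Eq. (8), that the network (17) takes a log-sum-exp form in the parameters \(\{(\textit{\textbf{p}}_i,\theta _i,\gamma _i)\}_{i=1}^m\); the exact expressions of Eqs. (17) and (69) should be read from the main text, and the names below are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

# Hedged sketch: a log-sum-exp ("soft" max-plus) network
#   f_eps(x, t) = eps * log( sum_i exp( (<p_i, x> - t*theta_i - gamma_i) / eps ) ),
# assumed here as the smooth counterpart of Eq. (8); see the caveat in the lead-in.
def f_eps(x, t, P, gamma, eps):
    theta = -0.5 * np.sum(P**2, axis=1)          # theta_i = -||p_i||^2 / 2, as in this appendix
    return eps * logsumexp((P @ x - t * theta - gamma) / eps)

# Since eps * logsumexp(v / eps) >= max(v) and converges to max(v) as eps -> 0+,
# the smooth network recovers the non-smooth network of Eq. (8) in the limit.
rng = np.random.default_rng(1)
P, gamma = rng.standard_normal((20, 10)), rng.standard_normal(20)
x, t = rng.standard_normal(10), 0.5
for eps in (1.0, 0.1, 0.01):
    print(eps, f_eps(x, t, P, gamma, eps))
```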
D Proof of Proposition 3.1
To prove this proposition, we will use three lemmas whose statements and proofs are given in Sect. D.1, D.2, and D.3, respectively. The proof of Prop. 3.1 is given in Sect. D.4.
D.1 Statement and proof of Lemma D.1
Lemma D.1
Consider the one-dimensional case, i.e., \(n=1\). Let \(p_1,\dots ,p_m\in \mathbb {R}\) satisfy \(p_1<\dots <p_m\) and define the function J using Eq. (10). Suppose assumptions (A1)-(A2) hold. Let \(x\in \mathbb {R}\), \(p\in \partial J(x)\), and suppose \(p\ne p_i\) for any \(i\in \{1,\dots ,m\}\). Then, there exists \(k\in \{1,\dots ,m\}\) such that \(p_k< p < p_{k+1}\) and
Proof
Let \(I_x\) denote the set of maximizers in Eq. (11) at x. Since \(p\in \partial J(x)\), \(p \ne p_i\) for \(i \in \{1,\dots ,m\}\), and \(\partial J(x) = {{\mathrm {co}}~}\{p_i: i\in I_x\}\) by [67, Thm. VI.4.4.2], there exist \(j,l\in I_x\) such that \(p_j< p < p_l\). Moreover, there exists k with \(j\leqslant k<k+1 \leqslant l\) such that \(p_j \leqslant p_k< p <p_{k+1}\leqslant p_l\). We will show that \(k,k+1 \in I_x\). We only prove \(k\in I_x\); the case for \(k+1\) is similar.
If \(p_j=p_k\), then \(k=j\in I_x\) and the conclusion follows directly. Hence suppose \(p_j< p_k < p_l\). Then, there exists \(\alpha \in (0,1)\) such that \(p_k = \alpha p_j + (1-\alpha ) p_l\). Using that \(j,l\in I_x\), assumption (A2), and Jensen's inequality, we get
which implies that \(k \in I_x\). A similar argument shows that \(k+1\in I_x\), which completes the proof. \(\square \)
D.2 Statement and proof of Lemma D.2
Lemma D.2
Consider the one-dimensional case, i.e., \(n=1\). Let \(p_1,\dots ,p_m\in \mathbb {R}\) satisfy \(p_1<\cdots <p_m\) and define the function H using Eq. (14). Suppose assumptions (A1)–(A3) hold. Let \(u_0\in \mathbb {R}\) and \(p_k< u_0< p_{k+1}\) for some index k. Then, there holds
where
Proof
Let \(\varvec{\beta }{:}{=}(\beta _1,\dots ,\beta _m) \in \varLambda _{m}\) satisfy
and \(\beta _i = 0\) for every \(i\in \{1,\dots ,m\}\setminus \{k,k+1\}\). We will prove that \(\varvec{\beta }\) is a minimizer in Eq. (14) evaluated at \(u_0\), that is,
where
First, we show that \(\varvec{\beta } \in \mathcal {A}(u_0)\). By definition of \(\varvec{\beta }\) and Lemma 3.1(ii) with \(p = u_0\), the statement holds provided \(k,k+1\in I_x\), where the set \(I_x\) contains the maximizers in Eq. (10) evaluated at \(x \in \partial J^*(u_0)\). But if \(x\in \partial J^*(u_0)\), we have \(u_0\in \partial J(x)\), and Lemma D.1 implies \(k,k+1\in I_x\). Hence, \(\varvec{\beta } \in \mathcal {A}(u_0)\).
Now, suppose that \(\varvec{\beta }\) is not a minimizer in Eq. (14) evaluated at \(u_0\). By Lemma 3.2(i), there exists a minimizer in Eq. (14) evaluated at the point \(u_0\), which we denote by \((\alpha _1,\dots , \alpha _m)\). Then there holds
Since \(\alpha _i \geqslant 0\) for every i and \(\beta _i = 0\) for every \(i\in \{1,\dots ,m\}\setminus \{k,k+1\}\), we have \(\alpha _k + \alpha _{k+1} \leqslant 1 = \beta _k + \beta _{k+1}\). Since \(\varvec{\alpha }\ne \varvec{\beta }\), at least one of the inequalities \(\alpha _k < \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\) holds. This leaves three possible cases, and we now show that each case leads to a contradiction.
Case 1: Let \(\alpha _k < \beta _k\) and \(\alpha _{k+1} \geqslant \beta _{k+1}\). Define the coefficient \(c_i\) by
The following equations then hold
These equations, however, violate assumption (A3), and so we get a contradiction.
Case 2: Let \(\alpha _k \geqslant \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\). A similar argument as in case 1 can be applied here by exchanging the indices k and \(k+1\) to derive a contradiction.
Case 3: Let \(\alpha _k < \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\). From Eq. (74), we obtain
Define two numbers \(q_k\) and \(q_{k+1}\) by
Note that from the first two equations in (74) and the assumption that \(\alpha _k < \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\), there exist \(i_1<k\) and \(i_2>k+1\) such that \(\alpha _{i_1}\ne 0\) and \(\alpha _{i_2}\ne 0\), and hence, the numbers \(q_k\) and \(q_{k+1}\) are well-defined. By definition, we have \(q_k<p_k<p_{k+1}<q_{k+1}\). Therefore, there exist \(b_k,b_{k+1}\in (0,1)\) such that
A straightforward computation yields
Define the coefficients \(c_i^k\) and \(c_i^{k+1}\) as follows
These coefficients satisfy \(c_i^k, c_i^{k+1}\in [0,1]\) for any i and \(\sum _{i=1}^m c_i^k= \sum _{i=1}^m c_i^{k+1}=1\). In other words, we have
Hence, the first equality in Eq. (9) holds for the coefficients \((c_1^k,\dots , c_m^k)\) with the index k and also for the coefficients \((c_1^{k+1}, \dots , c_m^{k+1})\) with the index \(k+1\). We show next that these coefficients satisfy the second and third equalities in (9) and draw a contradiction with assumption (A3).
Using Eqs. (76), (77), and (79) to write the formulas for \(p_k\) and \(p_{k+1}\) via the coefficients \(c_i^k\) and \(c_i^{k+1}\), we find
where the last equalities in the two formulas above hold because \(c_{k+1}^k=0\) and \(c_k^{k+1} = 0\) by definition. Hence, the second equality in Eq. (9) also holds for both the index k and \(k+1\).
From the third equality in Eq. (75), assumption (A2), Eq. (81), and Jensen’s inequality, we have
We now compute and simplify the coefficients \((\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1}\) in the formula above. First, consider the case when \(i<k\). Eqs. (78) and (79) imply
Applying the first two equalities in Eq. (75) and Eq. (76) to the last formula above, we obtain
The same result for the case when \(i>k+1\) also holds and the proof is similar. Therefore, we have
Combining Eqs. (82) and (83), we have
Since the left-hand side and the right-hand side coincide, the inequality above is in fact an equality, which implies that the inequality in Eq. (82) is also an equality. In other words, we have
where the last equalities in the two formulas above hold because \(c_{k+1}^k=0\) and \(c_k^{k+1} = 0\) by definition. Hence, the third equality in (9) also holds for both indices k and \(k+1\).
In summary, Eqs. (80), (81), and (84) imply that Eq. (9) holds for the index k with coefficients \((c_1^k,\dots , c_m^k)\) and also for the index \(k+1\) with coefficients \((c_1^{k+1}, \dots , c_m^{k+1})\). Hence, by assumption (A3), we find
Using the inequalities above with Eq. (83) and the fact that \(c_{k+1}^k=0\) and \(c_k^{k+1} = 0\), we find
which contradicts the last inequality in Eq. (75).
In conclusion, we obtain contradictions in all three cases. As a consequence, \(\varvec{\beta }\) is a minimizer in Eq. (14) evaluated at \(u_0\), and Eq. (72) follows from the definition of H in (14). \(\square \)
D.3 Statement and proof of Lemma D.3
Lemma D.3
Consider the one-dimensional case, i.e., \(n=1\). Let \(p_1,\dots ,p_m\in \mathbb {R}\) satisfy \(p_1<\dots <p_m\). Suppose assumptions (A1)–(A2) hold. Let \(x\in \mathbb {R}\) and \(t>0\). Assume that j, k, and l are three indices such that \(1\leqslant j\leqslant k<l\leqslant m\) and
Then, there holds
Proof
Note that Eq. (86) holds trivially when \(j=k\), so we only need to consider the case when \(j<k<l\). On the one hand, Eq. (85) implies
which yields
On the other hand, for each \(i\in \{j,j+1,\dots , l-1\}\) let \(q_i\in (p_i, p_{i+1})\) and \(x_i \in \partial J^*(q_i)\). Such an \(x_i\) exists because \(q_i \in {\mathrm {int}~}{{\mathrm {dom}~}J^*}\), so that the subdifferential \(\partial J^*(q_i)\) is non-empty. Then, \(q_i\in \partial J(x_i)\) and Lemma D.1 imply
A straightforward computation yields
Combining the two equalities above with Eq. (87), we conclude that
Now, divide the inequality above by \(t(p_l - p_k) > 0\) (which is positive because \(t>0\) and \(l>k\), so that \(p_l>p_k\)), divide the equality above by \(t(p_l - p_j) > 0\) (which is positive because \(t>0\) and \(l>j\), so that \(p_l>p_j\)), and rearrange the terms to obtain
Recall that \(q_j< q_{j+1}< \dots < q_{l-1}\) and \(x_i \in \partial J^*(q_i)\) for any \(j\leqslant i<l\). Since the function \(J^*\) is convex, the subdifferential operator \(\partial J^*\) is a monotone non-decreasing operator [67, Def. IV.4.1.3, and Prop. VI.6.1.1], which yields \(x_j\leqslant x_{j+1}\leqslant \dots \leqslant x_{l-1}\). Using that \(p_1< p_2< \dots < p_m\) and \(j<k<l\), we obtain
To proceed, we now use the fact that if \(a,c\in \mathbb {R}\) and \(b,d>0\) satisfy \(\frac{a}{b}\geqslant \frac{c}{d}\), then \(\frac{a}{b}\geqslant \frac{a+c}{b+d}\); indeed, \(\frac{a}{b}\geqslant \frac{c}{d}\) gives \(ad\geqslant bc\), and hence \(a(b+d)=ab+ad\geqslant ab+bc=(a+c)b\). Combining this fact with inequality (89), we find
We combine the inequality above with (88) to obtain
which concludes the proof. \(\square \)
D.4 Proof of Proposition 3.1
Proof of (i): First, note that u is piecewise constant. Second, recall that J is defined as the pointwise maximum of a finite number of affine functions. Therefore, the initial data \(u(\cdot ,0) = \nabla J(\cdot )\) (recall that here, the gradient \(\nabla \) is taken in the sense of distributions) are bounded and of locally bounded variation (see [48, Chap. 5, page 167] for the definition of locally bounded variation). Finally, the flux function H, defined in Eq. (14), is Lipschitz continuous in \({\mathrm {dom}~}J^*\) by Lemma D.2. It can therefore be extended to \(\mathbb {R}\) while preserving its Lipschitz property [57, Thm. 4.16]. Hence, we can invoke [36, Prop. 2.1] to conclude that u is the entropy solution to the conservation law (21) provided it satisfies the following two conditions. Let \(\bar{x}(t)\) be any smooth line of discontinuity of u. Fix \(t>0\) and define \(u^-\) and \(u^+\) as
Then, the two conditions are:
1. The curve \(\bar{x}(t)\) is a straight line with the slope
$$\begin{aligned} \frac{d\bar{x}}{dt} = \frac{H(u^+)-H(u^-)}{u^+- u^-}. \end{aligned}$$ (91)
2. For any \(u_0\) between \(u^+\) and \(u^-\), we have
$$\begin{aligned} \frac{H(u^+) - H(u_0)}{u^+- u_0} \leqslant \frac{H(u^+) - H(u^-)}{u^+- u^-}. \end{aligned}$$ (92)
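To make these two conditions concrete, the following snippet computes the shock speed in Eq. (91) and checks the entropy inequality (92) on a grid of intermediate states. It is an illustrative sketch only: the flux below is a piecewise-linear interpolation of made-up nodes \((p_i,\theta _i)\) (recall from Lemma 3.2(iii) that \(H(p_i)=\theta _i\)), not the flux H defined in Eq. (14).

```python
import numpy as np

def check_jump(H, u_minus, u_plus, n=1001, tol=1e-12):
    """Shock speed from Eq. (91) and a grid check of the entropy inequality (92)."""
    s = (H(u_plus) - H(u_minus)) / (u_plus - u_minus)                 # Eq. (91)
    u0 = np.linspace(u_minus, u_plus, n)[1:-1]                        # states strictly between u^- and u^+
    ok = bool(np.all((H(u_plus) - H(u0)) / (u_plus - u0) <= s + tol)) # Eq. (92)
    return s, ok

# Example: a piecewise-linear flux through illustrative nodes (p_i, theta_i).
# The jump from u^- = p_1 to u^+ = p_3 satisfies (92) iff the middle node lies on
# or above the chord joining the endpoint nodes (the chord value at p_2 is 0.5).
p_nodes = np.array([0.0, 1.0, 2.0])
theta_nodes = np.array([0.0, 0.7, 1.0])
H = lambda u: np.interp(u, p_nodes, theta_nodes)
print(check_jump(H, p_nodes[0], p_nodes[-1]))   # -> (0.5, True)
```

With these nodes, the returned shock speed is 0.5 and the entropy inequality holds because the graph of the interpolated flux lies on or above the chord joining its endpoint values; lowering the middle node below 0.5 would make the check fail.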
First, we prove the first condition and Eq. (91). According to the definition of u in Eq. (20), the range of u is the compact set \(\{p_1,\dots , p_m\}\). As a result, \(u^-\) and \(u^+\) are in the range of u, i.e., there exist indices j and l such that
Let \((\bar{x}(s),s)\) be a point on the curve \(\bar{x}\) which is not one of the endpoints. Since u is piecewise constant, there exists a neighborhood \(\mathcal {N}\) of \((\bar{x}(s),s)\) such that for any \((x^-,t), (x^+,t)\in \mathcal {N}\) satisfying \(x^-< \bar{x}(t) < x^+\), we have \(u(x^-,t) = u^-= p_j\) and \(u(x^+,t) = u^+= p_l\). In other words, if \(x^-, x^+, t\) are chosen as above, according to the definition of u in Eq. (20), we have
Define a sequence \(\{x^-_k\}_{k=1}^{+\infty } \subset (-\infty ,\bar{x}(s))\) such that \((x^-_k, s)\in \mathcal {N}\) for any \(k\in \mathbb {N}\) and \(\lim _{k\rightarrow +\infty }x^-_k = \bar{x}(s)\). By Eq. (94), we have
Letting k tend to infinity, the above inequality implies
In other words, we have
Similarly, define a sequence \(\{x^+_k\}_{k=1}^{+\infty } \subset (\bar{x}(s), +\infty )\) such that \((x^+_k, s)\in \mathcal {N}\) for any \(k\in \mathbb {N}\) and \(\lim _{k\rightarrow +\infty }x^+_k = \bar{x}(s)\). Using a similar argument as above, we can conclude that
By a continuity argument, Eqs. (95) and (96) also hold for the end points of \(\bar{x}\). In conclusion, for any \((\bar{x}(t), t)\) on the curve \(\bar{x}\), we have
which implies that
Therefore, the curve \(\bar{x}(t)\) lies on the straight line
and Eq. (93) and Lemma 3.2(iii) imply that its slope equals
This proves Eq. (91) and the first condition holds.
It remains to show the second condition. Since \(u = \nabla _x f\) and f is convex by Theorem 3.1, the map \(x\mapsto u(x,t)\) is monotone non-decreasing [67, Def. IV.4.1.3 and Prop. VI.6.1.1]. As a result, \(u^-< u^+\) and \(u_0\in (u^-, u^+)\), where we still adopt the notation \(u^-= p_j\) and \(u^+= p_l\). Recall that Lemma 3.2(iii) implies \(H(p_i) = \theta _i\) for any i. Then, Eq. (92) in the second condition becomes
Without loss of generality, we may assume that \(p_1< p_2< \dots < p_m\). Then, the fact \(p_j=u^-< u^+= p_l\) implies \(j<l\). We consider the following two cases.
First, if there exists some k such that \(u_0= p_k\), then \(H(u_0) = \theta _k\) by Lemma 3.2(iii). Since \(u^-< u_0< u^+\), we have \(j< k < l\). Recall that Eq. (97) holds. Therefore, the assumptions of Lemma D.3 are satisfied, which implies Eq. (98) holds.
Second, suppose \(u_0\ne p_i\) for every \(i\in \{1,\dots ,m\}\). Then there exists some \(k\in \{j, j+1,\dots , l-1\}\) such that \(p_k< u_0< p_{k+1}\). Lemma D.2 then implies that Eqs. (72) and (73) hold, that is,
Using these three equations, we can write the left-hand side of Eq. (98) as
If \(k+1 = l\), then this equation becomes
Since \(j\leqslant k<l\) and Eq. (97) holds, the assumptions of Lemma D.3 are satisfied. This allows us to conclude that Eq. (98) holds.
If \(k+1\ne l\), then using Eq. (97), the inequalities \(j\leqslant k<k+1< l\), and Lemma D.3, we obtain
Note that if \(a_i \in \mathbb {R}\) and \(b_i \in (0,+\infty )\) for \(i\in \{1,2,3\}\) satisfy \(\frac{a_1}{b_1} \leqslant \frac{a_3}{b_3}\) and \(\frac{a_2}{b_2} \leqslant \frac{a_3}{b_3}\), then \(\frac{a_1+a_2}{b_1+b_2}\leqslant \frac{a_3}{b_3}\). Then, since \(\beta _k(p_l - p_k)\), \(\beta _{k+1}(p_l - p_{k+1})\) and \(p_l-p_j\) are positive, we have
Hence, Eq. (98) follows directly from the inequality above and Eq. (99).
Therefore, the two conditions, namely Eqs. (91) and (92), are satisfied, and we apply [36, Prop. 2.1] to conclude that the function u is the entropy solution to the conservation law (21).
Proof of (ii) (sufficiency): Without loss of generality, assume \(p_1<p_2<\dots <p_m\). Let \(C\in \mathbb {R}\). Suppose \(\tilde{H}\) satisfies \(\tilde{H}(p_i)=H(p_i)+C\) for each \(i\in \{1,\dots ,m\}\) and \(\tilde{H}(p)\geqslant H(p)+C\) for any \(p\in [p_1,p_m]\). We want to prove that u is the entropy solution to the conservation law (22).
As in the proof of (i), we apply [36, Prop. 2.1] and verify that the two conditions corresponding to Eqs. (91) and (92) hold. Let \(\bar{x}(t)\) be any smooth line of discontinuity of u, define \(u^-\) and \(u^+\) by Eq. (90) (and recall that \(u^-= p_j\) and \(u^+= p_l\)), and let \(u_0\in (u^-,u^+)\). We proved in the proof of (i) that \(\bar{x}(t)\) is a straight line, and so it suffices to prove that
We start by proving the equality in Eq. (100). By assumption, there holds
Combining Eq. (101) with Eq. (91) (which we proved in the proof of (i)), we obtain
Therefore, the equality in (100) holds.
Next, we prove the inequality in Eq. (100). Since \(u_0\in (u^-,u^+)\subseteq [p_1,p_m]\), by assumption there holds \(\tilde{H}(u_0) \geqslant H(u_0) +C\). Taken together with Eqs. (92) and (101), we get
which shows that the inequality in Eq. (100) holds.
Hence, we can invoke [36, Prop. 2.1] to conclude that u is the entropy solution to the conservation law (22).
Proof of (ii) (necessity): Suppose that u is the entropy solution to the conservation law (22). We prove that there exists \(C\in \mathbb {R}\) such that \(\tilde{H}(p_i)=H(p_i)+C\) for any i and \(\tilde{H}(p)\geqslant H(p)+C\) for any \(p\in [p_1,p_m]\).
By Lemma B.2, for each \(i\in \{1,\dots ,m\}\) there exist \(x\in \mathbb {R}\) and \(t>0\) such that
Moreover, the proof of Lemma B.2 implies there exists \(T>0\) such that for any \(0<t<T\), there exists \(x\in \mathbb {R}\) such that Eq. (102) holds. As a result, there exists \(t>0\) such that for each \(i\in \{1,\dots ,m\}\), there exists \(x_i\in \mathbb {R}\) satisfying Eq. (102) at the point \((x_i,t)\), which implies \(u(x_i,t) = p_i\). Note that \(p_i\ne p_j\) implies that \(x_i \ne x_j\). (Indeed, if \(x_i = x_j\), then \(p_i = \nabla _x f(x_i,t) = \nabla _x f(x_j,t) = p_j\) which gives a contradiction since \(p_i \ne p_j\) by assumption (A1).) As mentioned before, the function \(u(\cdot ,t) \equiv \nabla _{x}f\) is a monotone non-decreasing operator and \(p_i\) is increasing with respect to i, and therefore \(x_1<x_2<\dots <x_m\). Since u is piecewise constant, for each \(k\in \{1,\dots ,m-1\}\) there exists a curve of discontinuity of u with \(u = p_k\) on the left-hand side of the curve and \(u=p_{k+1}\) on the right-hand side of the curve. Let \(\bar{x}(s)\) be such a curve and let \(u^-\) and \(u^+\) be the corresponding numbers defined in Eq. (90). The argument above proves that we have \(u^-= p_k\) and \(u^+= p_{k+1}\).
Since u is the piecewise constant entropy solution, we invoke [36, Prop. 2.1] to conclude that the two aforementioned conditions hold for the curve \(\bar{x}(s)\), i.e., (100) holds with \(u^-= p_k\) and \(u^+= p_{k+1}\). From the equality in (100) and Eq. (91) proved in (i), we deduce
Since k is an arbitrary index, the equality above implies that \(\tilde{H}(p_{k+1})-\tilde{H}(p_k) = H(p_{k+1})-H(p_k)\) holds for any \(k\in \{1,\dots ,m-1\}\). Therefore, there exists \(C\in \mathbb {R}\) such that
It remains to prove \(\tilde{H}(u_0)\geqslant H(u_0)+C\) for all \(u_0\in [p_k,p_{k+1}]\). If this inequality holds, then the statement follows because k is an arbitrary index. We already proved that \(\tilde{H}(u_0)\geqslant H(u_0)+C\) for \(u_0=p_k\) with \(k\in \{1,\dots ,m\}\). Therefore, we need to prove that \(\tilde{H}(u_0)\geqslant H(u_0)+C\) for all \(u_0\in (p_k,p_{k+1})\). Let \(u_0\in (p_k,p_{k+1})\). By Eq. (103) and the inequality in (100), we have
By Lemma D.2 and a straightforward computation, we also have
Comparing Eqs. (104) and (105), we obtain \(\tilde{H}(u_0)\geqslant H(u_0)+C\). Since k is arbitrary, we conclude that \(\tilde{H}(u_0)\geqslant H(u_0)+C\) holds for all \(u_0\in [p_1,p_m]\) and the proof is complete.