Overcoming the curse of dimensionality for some Hamilton–Jacobi partial differential equations via neural network architectures

  • Research
  • Published in Research in the Mathematical Sciences

Abstract

We propose new and original mathematical connections between Hamilton–Jacobi (HJ) partial differential equations (PDEs) with initial data and neural network architectures. Specifically, we prove that some classes of neural networks correspond to representation formulas of HJ PDE solutions whose Hamiltonians and initial data are obtained from the parameters of the neural networks. These results do not rely on universal approximation properties of neural networks; rather, our results show that some classes of neural network architectures naturally encode the physics contained in some HJ PDEs. Our results naturally yield efficient neural network-based methods for evaluating solutions of some HJ PDEs in high dimension without using grids or numerical approximations. We also present some numerical results for solving some inverse problems involving HJ PDEs using our proposed architectures.

References

  1. Aaibid, M., Sayah, A.: A direct proof of the equivalence between the entropy solutions of conservation laws and viscosity solutions of Hamilton–Jacobi equations in one-space variable. JIPAM J. Inequal. Pure Appl. Math. 7(2), 11 (2006)

  2. Akian, M., Bapat, R., Gaubert, S.: Max-plus algebra. Handbook of Linear Algebra 39 (2006)

  3. Akian, M., Gaubert, S., Lakhoua, A.: The max-plus finite element method for solving deterministic optimal control problems: basic properties and convergence analysis. SIAM J. Control Optim. 47(2), 817–848 (2008)

  4. Alla, A., Falcone, M., Saluzzi, L.: An efficient DP algorithm on a tree-structure for finite horizon optimal control problems. SIAM J. Sci. Comput. 41(4), A2384–A2406 (2019)

  5. Alla, A., Falcone, M., Volkwein, S.: Error analysis for POD approximations of infinite horizon problems via the dynamic programming approach. SIAM J. Control Optim. 55(5), 3091–3115 (2017)

  6. Arnol’d, V.I.: Mathematical methods of classical mechanics. Graduate Texts in Mathematics, vol. 60. Springer, New York (1989). Translated from the 1974 Russian original by K. Vogtmann and A. Weinstein, Corrected reprint of the second (1989) edition

  7. Bachouch, A., Huré, C., Langrené, N., Pham, H.: Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications. arXiv preprint arXiv:1812.05916 (2018)

  8. Banerjee, K., Georganas, E., Kalamkar, D., Ziv, B., Segal, E., Anderson, C., Heinecke, A.: Optimizing deep learning RNN topologies on intel architecture. Supercomput. Front. Innov. 6(3), 64–85 (2019)

  9. Bardi, M., Capuzzo-Dolcetta, I.: Optimal control and viscosity solutions of Hamilton–Jacobi–Bellman equations. Syst. Control Found. Appl. Birkhäuser Boston, Inc., Boston, MA (1997). https://doi.org/10.1007/978-0-8176-4755-1. With appendices by Maurizio Falcone and Pierpaolo Soravia

  10. Bardi, M., Evans, L.: On Hopf’s formulas for solutions of Hamilton–Jacobi equations. Nonlinear Anal. Theory, Methods Appl. 8(11), 1373–1381 (1984). https://doi.org/10.1016/0362-546X(84)90020-8

  11. Barles, G.: Solutions de viscosité des équations de Hamilton–Jacobi. Mathématiques et Applications. Springer, Berlin (1994)

  12. Barles, G., Tourin, A.: Commutation properties of semigroups for first-order Hamilton–Jacobi equations and application to multi-time equations. Indiana Univ. Math. J. 50(4), 1523–1544 (2001)

  13. Barron, E., Evans, L., Jensen, R.: Viscosity solutions of Isaacs’ equations and differential games with Lipschitz controls. J. Differ. Equ. 53(2), 213–233 (1984). https://doi.org/10.1016/0022-0396(84)90040-8

  14. Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A.: Deep splitting method for parabolic PDEs. (2019). arXiv preprint arXiv:1907.03452

  15. Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving stochastic differential equations and Kolmogorov equations by means of deep learning. (2018). arXiv preprint arXiv:1806.00421

  16. Beck, C., E, W., Jentzen, A.: Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. J. Nonlinear Sci. 29(4), 1563–1619 (2019)

  17. Bellman, R.E.: Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton (1961)

  18. Berg, J., Nyström, K.: A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317, 28–41 (2018). https://doi.org/10.1016/j.neucom.2018.06.056

  19. Bertsekas, D.P.: Reinforcement Learning and Optimal Control. Athena Scientific, Belmont (2019)

  20. Bokanowski, O., Garcke, J., Griebel, M., Klompmaker, I.: An adaptive sparse grid semi-Lagrangian scheme for first order Hamilton–Jacobi Bellman equations. J. Sci. Comput. 55(3), 575–605 (2013)

  21. Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer, New York (2000). https://doi.org/10.1007/978-1-4612-1394-9

  22. Brenier, Y., Osher, S.: Approximate Riemann solvers and numerical flux functions. SIAM J. Numer. Anal. 23(2), 259–273 (1986)

  23. Brenier, Y., Osher, S.: The discrete one-sided Lipschitz condition for convex scalar conservation laws. SIAM J. Numer. Anal. 25(1), 8–23 (1988). https://doi.org/10.1137/0725002

  24. Buckdahn, R., Cardaliaguet, P., Quincampoix, M.: Some recent aspects of differential game theory. Dyn. Games Appl. 1(1), 74–114 (2011). https://doi.org/10.1007/s13235-010-0005-0

  25. Carathéodory, C.: Calculus of variations and partial differential equations of the first order. Part I: Partial differential equations of the first order. Translated by Robert B. Dean and Julius J. Brandstatter. Holden-Day, Inc., San Francisco-London-Amsterdam (1965)

  26. Carathéodory, C.: Calculus of variations and partial differential equations of the first order. Part II: Calculus of variations. Translated from the German by Robert B. Dean, Julius J. Brandstatter, translating editor. Holden-Day, Inc., San Francisco-London-Amsterdam (1967)

  27. Cardin, F., Viterbo, C.: Commuting Hamiltonians and Hamilton–Jacobi multi-time equations. Duke Math. J. 144(2), 235–284 (2008). https://doi.org/10.1215/00127094-2008-036

  28. Caselles, V.: Scalar conservation laws and Hamilton–Jacobi equations in one-space variable. Nonlinear Anal. Theory Methods Appl. 18(5), 461–469 (1992). https://doi.org/10.1016/0362-546X(92)90013-5

  29. Chan-Wai-Nam, Q., Mikael, J., Warin, X.: Machine learning for semi linear PDEs. J. Sci. Comput. 79(3), 1667–1712 (2019)

  30. Chen, T., van Gelder, J., van de Ven, B., Amitonov, S.V., de Wilde, B., Euler, H.C.R., Broersma, H., Bobbert, P.A., Zwanenburg, F.A., van der Wiel, W.G.: Classification with a disordered dopant-atom network in silicon. Nature 577(7790), 341–345 (2020)

  31. Cheng, T., Lewis, F.L.: Fixed-final time constrained optimal control of nonlinear systems using neural network HJB approach. In: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 3016–3021 (2006). https://doi.org/10.1109/CDC.2006.377523

  32. Corrias, L., Falcone, M., Natalini, R.: Numerical schemes for conservation laws via Hamilton–Jacobi equations. Math. Comput. 64(210), 555–580, S13–S18 (1995). https://doi.org/10.2307/2153439

  33. Courant, R., Hilbert, D.: Methods of mathematical physics. Vol. II. Wiley Classics Library. Wiley: New York (1989). Partial differential equations, Reprint of the 1962 original, A Wiley-Interscience Publication

  34. Crandall, M.G., Ishii, H., Lions, P.L.: User’s guide to viscosity solutions of second order partial differential equations. Bull. Am. Math. Soc. 27(1), 1–67 (1992). https://doi.org/10.1090/S0273-0979-1992-00266-5

  35. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989). https://doi.org/10.1007/BF02551274

  36. Dafermos, C.M.: Polygonal approximations of solutions of the initial value problem for a conservation law. J. Math. Anal. Appl. 38(1), 33–41 (1972). https://doi.org/10.1016/0022-247X(72)90114-X

  37. Dafermos, C.M.: Hyperbolic conservation laws in continuum physics, Grundlehren der Mathematischen Wissenschaften, vol. 325, 4th Edn. Springer, Berlin (2016). https://doi.org/10.1007/978-3-662-49451-6

  38. Darbon, J.: On convex finite-dimensional variational methods in imaging sciences and Hamilton–Jacobi equations. SIAM J. Imaging Sci. 8(4), 2268–2293 (2015). https://doi.org/10.1137/130944163

  39. Darbon, J., Meng, T.: On decomposition models in imaging sciences and multi-time Hamilton-Jacobi partial differential equations. (2019). arXiv preprint arXiv:1906.09502

  40. Darbon, J., Osher, S.: Algorithms for overcoming the curse of dimensionality for certain Hamilton–Jacobi equations arising in control theory and elsewhere. Res. Math. Sci. 3(1), 19 (2016). https://doi.org/10.1186/s40687-016-0068-7

  41. Dissanayake, M.W.M.G., Phan-Thien, N.: Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. 10(3), 195–201 (1994). https://doi.org/10.1002/cnm.1640100303

  42. Djeridane, B., Lygeros, J.: Neural approximation of PDE solutions: An application to reachability computations. In: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 3034–3039 (2006). https://doi.org/10.1109/CDC.2006.377184

  43. Dockhorn, T.: A discussion on solving partial differential equations using neural networks. (2019). arXiv preprint arXiv:1904.07200

  44. Dolgov, S., Kalise, D., Kunisch, K.: A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations. (2019). arXiv preprint arXiv:1908.01533

  45. Dower, P.M., McEneaney, W.M., Zhang, H.: Max-plus fundamental solution semigroups for optimal control problems. In: 2015 Proceedings of the Conference on Control and its Applications, pp. 368–375. SIAM (2015)

  46. Elliott, R.J.: Viscosity solutions and optimal control, Pitman research notes in mathematics series, vol. 165. Longman Scientific & Technical, Harlow; Wiley, New York (1987)

  47. Evans, L.C.: Partial differential equations, Graduate Studies in Mathematics, vol. 19, second edn. American Mathematical Society, Providence, RI (2010). https://doi.org/10.1090/gsm/019

  48. Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. Textbooks in Mathematics, revised edn. CRC Press, Boca Raton (2015)

  49. Evans, L.C., Souganidis, P.E.: Differential games and representation formulas for solutions of Hamilton–Jacobi–Isaacs equations. Indiana Univ. Math. J. 33(5), 773–797 (1984)

  50. Farabet, C., LeCun, Y., Kavukcuoglu, K., Culurciello, E., Martini, B., Akselrod, P., Talay, S.: Large-scale fpga-based convolutional networks. In: Bekkerman, R., Bilenko, M., Langford, J. (eds.) Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)

  51. Farabet, C., Poulet, C., Han, J., LeCun, Y.: CNP: An FPGA-based processor for convolutional networks. In: International Conference on Field Programmable Logic and Applications. IEEE, Prague (2009)

  52. Farabet, C., Poulet, C., LeCun, Y.: An FPGA-based stream processor for embedded real-time vision with convolutional networks. In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp. 878–885. IEEE Computer Society, Los Alamitos, CA, USA (2009). https://doi.org/10.1109/ICCVW.2009.5457611

  53. Farimani, A.B., Gomes, J., Pande, V.S.: Deep Learning the Physics of Transport Phenomena. arXiv e-prints (2017)

  54. Fleming, W., McEneaney, W.: A max-plus-based algorithm for a Hamilton–Jacobi–Bellman equation of nonlinear filtering. SIAM J. Control Optim. 38(3), 683–710 (2000). https://doi.org/10.1137/S0363012998332433

  55. Fleming, W.H., Rishel, R.W.: Deterministic and stochastic optimal control. Bull. Am. Math. Soc. 82, 869–870 (1976)

  56. Fleming, W.H., Soner, H.M.: Controlled Markov Processes and Viscosity Solutions, vol. 25. Springer, New York (2006)

  57. Folland, G.B.: Real Analysis: Modern Techniques and Their Applications. Wiley, Hoboken (2013)

  58. Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pacific Financ. Mark. 26(3), 391–408 (2019). https://doi.org/10.1007/s10690-019-09271-7

  59. Garcke, J., Kröner, A.: Suboptimal feedback control of PDEs by solving HJB equations on adaptive sparse grids. J. Sci. Comput. 70(1), 1–28 (2017)

  60. Gaubert, S., McEneaney, W., Qu, Z.: Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 1054–1061. IEEE (2011)

  61. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, New York (2016)

  62. Grohs, P., Jentzen, A., Salimova, D.: Deep neural network approximations for Monte Carlo algorithms. (2019). arXiv preprint arXiv:1908.10828

  63. Grüne, L.: Overcoming the curse of dimensionality for approximating lyapunov functions with deep neural networks under a small-gain condition. (2020). arXiv preprint arXiv:2001.08423

  64. Han, J., Jentzen, A., E, W.: Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. 115(34), 8505–8510 (2018). https://doi.org/10.1073/pnas.1718942115

  65. Han, J., Zhang, L., E, W.: Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 108929 (2019)

  66. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  67. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals, vol. 305. Springer, New York (1993)

  68. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods, vol. 306. Springer, New York (1993)

  69. Hirjibehedin, C.: Evolution of circuits for machine learning. Nature 577, 320–321 (2020). https://doi.org/10.1038/d41586-020-00002-x

  70. Hopf, E.: Generalized solutions of non-linear equations of first order. J. Math. Mech. 14, 951–973 (1965)

  71. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T

  72. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8

  73. Horowitz, M.B., Damle, A., Burdick, J.W.: Linear Hamilton Jacobi Bellman equations in high dimensions. In: 53rd IEEE Conference on Decision and Control, pp. 5880–5887. IEEE (2014)

  74. Hsieh, J.T., Zhao, S., Eismann, S., Mirabella, L., Ermon, S.: Learning neural PDE solvers with convergence guarantees. In: International Conference on Learning Representations (2019)

  75. Hu, C., Shu, C.: A discontinuous Galerkin finite element method for Hamilton–Jacobi equations. SIAM J. Sci. Comput. 21(2), 666–690 (1999). https://doi.org/10.1137/S1064827598337282

  76. Huré, C., Pham, H., Bachouch, A., Langrené, N.: Deep neural networks algorithms for stochastic control problems on finite horizon, part I: convergence analysis. (2018). arXiv preprint arXiv:1812.04300

  77. Huré, C., Pham, H., Warin, X.: Some machine learning schemes for high-dimensional nonlinear PDEs. (2019). arXiv preprint arXiv:1902.01599

  78. Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1(10) (2020)

  79. Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A., von Wurstemberger, P.: Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations (2018)

  80. Hutzenthaler, M., Jentzen, A., von Wurstemberger, P.: Overcoming the curse of dimensionality in the approximative pricing of financial derivatives with default risks (2019)

  81. Hutzenthaler, M., Kruse, T.: Multilevel picard approximations of high-dimensional semilinear parabolic differential equations with gradient-dependent nonlinearities. SIAM J. Numer. Anal. 58(2), 929–961 (2020). https://doi.org/10.1137/17M1157015

  82. Ishii, H.: Representation of solutions of Hamilton–Jacobi equations. Nonlinear Anal. Theory, Methods Appl. 12(2), 121–146 (1988). https://doi.org/10.1016/0362-546X(88)90030-2

  83. Jiang, F., Chou, G., Chen, M., Tomlin, C.J.: Using neural networks to compute approximate and guaranteed feasible Hamilton–Jacobi–Bellman PDE solutions. (2016). arXiv preprint arXiv:1611.03158

  84. Jiang, G., Peng, D.: Weighted ENO schemes for Hamilton–Jacobi equations. SIAM J. Sci. Comput. 21(6), 2126–2143 (2000). https://doi.org/10.1137/S106482759732455X

  85. Jianyu, L., Siwei, L., Yingjian, Q., Yaping, H.: Numerical solution of elliptic partial differential equation using radial basis function neural networks. Neural Netw. 16(5–6), 729–734 (2003)

  86. Jin, S., Xin, Z.: Numerical passage from systems of conservation laws to Hamilton–Jacobi equations, and relaxation schemes. SIAM J. Numer. Anal. 35(6), 2385–2404 (1998). https://doi.org/10.1137/S0036142996314366

  87. Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., et al.: In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA ’17, pp. 1–12. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3079856.3080246

  88. Kalise, D., Kundu, S., Kunisch, K.: Robust feedback control of nonlinear PDEs by numerical approximation of high-dimensional Hamilton–Jacobi–Isaacs equations. (2019). arXiv preprint arXiv:1905.06276

  89. Kalise, D., Kunisch, K.: Polynomial approximation of high-dimensional Hamilton–Jacobi–Bellman equations and applications to feedback control of semilinear parabolic PDEs. SIAM J. Sci. Comput. 40(2), A629–A652 (2018)

  90. Kang, W., Wilcox, L.C.: Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and HJB equations. Comput. Optim. Appl. 68(2), 289–315 (2017)

  91. Karlsen, K., Risebro, H.: A note on front tracking and the equivalence between viscosity solutions of Hamilton–Jacobi equations and entropy solutions of scalar conservation laws. Nonlinear Anal. (2002). https://doi.org/10.1016/S0362-546X(01)00753-2

  92. Khoo, Y., Lu, J., Ying, L.: Solving parametric PDE problems with artificial neural networks. (2017). arXiv preprint arXiv:1707.03351

  93. Khoo, Y., Lu, J., Ying, L.: Solving for high-dimensional committor functions using artificial neural networks. Res. Math. Sci. 6(1), 1 (2019)

  94. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015) (2015)

  95. Kružkov, S.N.: Generalized solutions of nonlinear first order equations with several independent variables II. Math. USSR-Sbornik 1(1), 93–116 (1967). https://doi.org/10.1070/sm1967v001n01abeh001969

  96. Kundu, A., Srinivasan, S., Qin, E.C., Kalamkar, D., Mellempudi, N.K., Das, D., Banerjee, K., Kaul, B., Dubey, P.: K-tanh: Hardware efficient activations for deep learning (2019)

  97. Kunisch, K., Volkwein, S., Xie, L.: HJB-POD-based feedback design for the optimal control of evolution problems. SIAM J. Appl. Dyn. Syst. 3(4), 701–722 (2004)

  98. Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998). https://doi.org/10.1109/72.712178

  99. Lagaris, I.E., Likas, A.C., Papageorgiou, D.G.: Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 11(5), 1041–1049 (2000). https://doi.org/10.1109/72.870037

  100. Lambrianides, P., Gong, Q., Venturi, D.: A new scalable algorithm for computational optimal control under uncertainty. (2019). arXiv preprint arXiv:1909.07960

  101. Landau, L., Lifshitz, E.: Course of Theoretical Physics, Vol. 1: Mechanics. Oxford (1978)

  102. LeCun, Y.: 1.1 deep learning hardware: Past, present, and future. In: 2019 IEEE International Solid-State Circuits Conference—(ISSCC), pp. 12–19 (2019). https://doi.org/10.1109/ISSCC.2019.8662396

  103. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

  104. Lee, H., Kang, I.S.: Neural algorithm for solving differential equations. J. Comput. Phys. 91(1), 110–131 (1990)

  105. Lions, P.L., Rochet, J.C.: Hopf formula and multitime Hamilton–Jacobi equations. Proc. Am. Math. Soc. 96(1), 79–84 (1986)

  106. Lions, P.L., Souganidis, P.E.: Convergence of MUSCL and filtered schemes for scalar conservation laws and Hamilton–Jacobi equations. Numerische Mathematik 69(4), 441–470 (1995). https://doi.org/10.1007/s002110050102

  107. Long, Z., Lu, Y., Dong, B.: PDE-net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. J. Comput. Phys. 399, 108925 (2019). https://doi.org/10.1016/j.jcp.2019.108925

  108. Long, Z., Lu, Y., Ma, X., Dong, B.: PDE-net: Learning PDEs from data. (2017). arXiv preprint arXiv:1710.09668

  109. Lye, K.O., Mishra, S., Ray, D.: Deep learning observables in computational fluid dynamics. (2019). arXiv preprint arXiv:1903.03040

  110. McEneaney, W.: Max-Plus Methods for Nonlinear Control and Estimation. Springer, New York (2006)

  111. McEneaney, W.: A curse-of-dimensionality-free numerical method for solution of certain HJB PDEs. SIAM J. Control Optim. 46(4), 1239–1276 (2007). https://doi.org/10.1137/040610830

  112. McEneaney, W.M., Deshpande, A., Gaubert, S.: Curse-of-complexity attenuation in the curse-of-dimensionality-free method for HJB PDEs. In: 2008 American Control Conference, pp. 4684–4690. IEEE (2008)

  113. McEneaney, W.M., Kluberg, L.J.: Convergence rate for a curse-of-dimensionality-free method for a class of HJB PDEs. SIAM J. Control Optim. 48(5), 3052–3079 (2009)

  114. McFall, K.S., Mahan, J.R.: Artificial neural network method for solution of boundary value problems with exact satisfaction of arbitrary boundary conditions. IEEE Trans. Neural Netw. 20(8), 1221–1233 (2009). https://doi.org/10.1109/TNN.2009.2020735

  115. Meade, A., Fernandez, A.: The numerical solution of linear ordinary differential equations by feedforward neural networks. Math. Comput. Modell. 19(12), 1–25 (1994). https://doi.org/10.1016/0895-7177(94)90095-7

  116. Meng, X., Karniadakis, G.E.: A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. (2019). arXiv preprint arXiv:1903.00104

  117. Meng, X., Li, Z., Zhang, D., Karniadakis, G.E.: PPINN: Parareal physics-informed neural network for time-dependent PDEs. (2019). arXiv preprint arXiv:1909.10145

  118. van Milligen, B.P., Tribaldos, V., Jiménez, J.A.: Neural network differential equation and plasma equilibrium solver. Phys. Rev. Lett. 75, 3594–3597 (1995). https://doi.org/10.1103/PhysRevLett.75.3594

  119. Motta, M., Rampazzo, F.: Nonsmooth multi-time Hamilton–Jacobi systems. Indiana Univ. Math. J. 55(5), 1573–1614 (2006)

  120. Niarchos, K.N., Lygeros, J.: A neural approximation to continuous time reachability computations. In: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 6313–6318 (2006). https://doi.org/10.1109/CDC.2006.377358

  121. Osher, S., Shu, C.: High-order essentially nonoscillatory schemes for Hamilton–Jacobi equations. SIAM J. Numer. Anal. 28(4), 907–922 (1991). https://doi.org/10.1137/0728049

  122. Pang, G., Lu, L., Karniadakis, G.E.: fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 41(4), A2603–A2626 (2019)

  123. Pham, H., Pham, H., Warin, X.: Neural networks-based backward scheme for fully nonlinear PDEs. (2019). arXiv preprint arXiv:1908.00412

  124. Pinkus, A.: Approximation theory of the MLP model in neural networks. In: Acta numerica, 1999, Acta Numer., vol. 8, pp. 143–195. Cambridge University Press, Cambridge (1999)

  125. Plaskacz, S., Quincampoix, M.: Oleinik–Lax formulas and multitime Hamilton–Jacobi systems. Nonlinear Anal. Theory, Methods Appl. 51(6), 957–967 (2002). https://doi.org/10.1016/S0362-546X(01)00871-9

  126. Raissi, M.: Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 19(1), 932–955 (2018)

  127. Raissi, M.: Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations. (2018). arXiv preprint arXiv:1804.07010

  128. Raissi, M., Perdikaris, P., Karniadakis, G.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019). https://doi.org/10.1016/j.jcp.2018.10.045

  129. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations. (2017). arXiv preprint arXiv:1711.10561

  130. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part ii): Data-driven discovery of nonlinear partial differential equations. (2017). arXiv preprint arXiv:1711.10566

  131. Reisinger, C., Zhang, Y.: Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. (2019). arXiv preprint arXiv:1903.06652

  132. Rochet, J.: The taxation principle and multi-time Hamilton–Jacobi equations. J. Math. Econ. 14(2), 113–128 (1985). https://doi.org/10.1016/0304-4068(85)90015-1

  133. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  134. Royo, V.R., Tomlin, C.: Recursive regression with neural networks: Approximating the HJI PDE solution. (2016). arXiv preprint arXiv:1611.02739

  135. Rudd, K., Muro, G.D., Ferrari, S.: A constrained backpropagation approach for the adaptive solution of partial differential equations. IEEE Trans. Neural Netw. Learn. Syst. 25(3), 571–584 (2014). https://doi.org/10.1109/TNNLS.2013.2277601

  136. Ruthotto, L., Osher, S., Li, W., Nurbekyan, L., Fung, S.W.: A machine learning framework for solving high-dimensional mean field game and mean field control problems. (2019). arXiv preprint arXiv:1912.01825

  137. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015). https://doi.org/10.1016/j.neunet.2014.09.003

  138. Sirignano, J., Spiliopoulos, K.: DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018). https://doi.org/10.1016/j.jcp.2018.08.029

  139. Tang, W., Shan, T., Dang, X., Li, M., Yang, F., Xu, S., Wu, J.: Study on a Poisson’s equation solver based on deep learning technique. In: 2017 IEEE Electrical Design of Advanced Packaging and Systems Symposium (EDAPS), pp. 1–3 (2017). https://doi.org/10.1109/EDAPS.2017.8277017

  140. Tassa, Y., Erez, T.: Least squares solutions of the HJB equation with neural network value-function approximators. IEEE Trans. Neural Netw. 18(4), 1031–1041 (2007). https://doi.org/10.1109/TNN.2007.899249

  141. Tho, N.: Hopf-Lax-Oleinik type formula for multi-time Hamilton–Jacobi equations. Acta Math. Vietnamica 30, 275–287 (2005)

  142. Todorov, E.: Efficient computation of optimal actions. Proc. Natl. Acad. Sci. 106(28), 11478–11483 (2009)

  143. Uchiyama, T., Sonehara, N.: Solving inverse problems in nonlinear PDEs by recurrent neural networks. In: IEEE International Conference on Neural Networks, pp. 99–102. IEEE (1993)

  144. E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)

  145. E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5(4), 349–380 (2017). https://doi.org/10.1007/s40304-017-0117-6

  146. E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: Multilevel picard iterations for solving smooth semilinear parabolic heat equations (2016)

  147. Widder, D.V.: The Heat Equation, vol. 67. Academic Press, New York (1976)

  148. Yadav, N., Yadav, A., Kumar, M.: An introduction to neural network methods for differential equations. SpringerBriefs in Applied Sciences and Technology. Springer, Dordrecht (2015). https://doi.org/10.1007/978-94-017-9816-7

  149. Yang, L., Zhang, D., Karniadakis, G.E.: Physics-informed generative adversarial networks for stochastic differential equations. (2018). arXiv preprint arXiv:1811.02033

  150. Yang, Y., Perdikaris, P.: Adversarial uncertainty quantification in physics-informed neural networks. J. Comput. Phys. 394, 136–152 (2019)

  151. Yegorov, I., Dower, P.M.: Perspectives on characteristics based curse-of-dimensionality-free numerical approaches for solving Hamilton–Jacobi equations. Appl. Math. Optim. 1–49 (2017)

  152. Zhang, D., Guo, L., Karniadakis, G.E.: Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks. (2019). arXiv preprint arXiv:1905.01205

  153. Zhang, D., Lu, L., Guo, L., Karniadakis, G.E.: Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. J. Comput. Phys. 397, 108850 (2019)

Author information

Corresponding author: Jérôme Darbon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Research supported by NSF DMS 1820821. Authors’ names are given in last/family name alphabetical order.

Appendices

A Proofs of lemmas in Section 3.1

A.1 Proof of Lemma 3.1

Proof of (i): The convex and lower semicontinuous function \(J^*\) satisfies Eq. (12) by [68, Prop. X.3.4.1]. It is also finite and continuous over its polytopal domain \({\mathrm {dom}~}J^* = {\mathrm {conv}~}{\left( \{\textit{\textbf{p}}_i\}_{i=1}^{m}\right) }\) [133, Thms. 10.2 and 20.5], and moreover, the subdifferential \(\partial J^*(\textit{\textbf{p}})\) is non-empty by [133, Thm. 23.10].

Proof of (ii): First, suppose the vector \((\alpha _1, \dots , \alpha _m)\in \mathbb {R}^m\) satisfies the constraints (a)–(c). Since \(\textit{\textbf{x}}\in \partial J^*(\textit{\textbf{p}})\), there holds \(J^*(\textit{\textbf{p}}) = \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - J(\textit{\textbf{x}})\) [68, Cor. X.1.4.4], and using the definition of the set \(I_{\textit{\textbf{x}}}\) (11) and constraints (a)–(c) we deduce that

$$\begin{aligned} \begin{aligned} J^*(\textit{\textbf{p}})&= \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - J(\textit{\textbf{x}}) = \langle \textit{\textbf{p}}, \textit{\textbf{x}}\rangle - \sum _{i\in I_{\textit{\textbf{x}}}} \alpha _i J(\textit{\textbf{x}})\\&= \langle \textit{\textbf{p}}, \textit{\textbf{x}}\rangle - \sum _{i\in I_{\textit{\textbf{x}}}} \alpha _i (\langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle - \gamma _i)\\&= \left\langle \textit{\textbf{p}}- \sum _{i\in I_{\textit{\textbf{x}}}} \alpha _i\textit{\textbf{p}}_i, \textit{\textbf{x}}\right\rangle + \sum _{i\in I_{\textit{\textbf{x}}}} \alpha _i \gamma _i = \sum _{i=1}^m \alpha _i\gamma _i. \end{aligned} \end{aligned}$$

Therefore, \((\alpha _1,\dots , \alpha _m)\) is a minimizer in Eq. (12). Second, let \((\alpha _1,\dots , \alpha _m)\) be a minimizer in Eq. (12). Then, (a)–(b) follow directly from the constraints in Eq. (12). A similar argument as above yields

$$\begin{aligned} \begin{aligned} J(\textit{\textbf{x}})&= \langle \textit{\textbf{p}}, \textit{\textbf{x}}\rangle - J^*(\textit{\textbf{p}}) = \left\langle \sum _{i=1}^m \alpha _i\textit{\textbf{p}}_i, \textit{\textbf{x}}\right\rangle - \sum _{i=1}^m \alpha _i\gamma _i = \sum _{i=1}^m \alpha _i\left( \langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle - \gamma _i\right) . \end{aligned} \end{aligned}$$

But \(J(\textit{\textbf{x}})= \max _{i\in \{1,\dots ,m\}}\{\left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}\right\rangle -\gamma _{i}\}\) by definition, and so there holds \(\alpha _i = 0\) whenever \(J(\textit{\textbf{x}})\ne \langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle - \gamma _i\). In other words, \(\alpha _i=0\) whenever \(i\not \in I_{\textit{\textbf{x}}}\).

Proof of (iii): Let \((\beta _1,\dots , \beta _m) \in \varLambda _m\) satisfy \(\sum _{i=1}^m\beta _i \textit{\textbf{p}}_i = \textit{\textbf{p}}_k\). By assumption (A2), we have \(\gamma _k=g(\textit{\textbf{p}}_k)\) with g convex, and hence, Jensen’s inequality yields

$$\begin{aligned} \sum _{i=1}^m\delta _{ik}\gamma _i = \gamma _k = g(\textit{\textbf{p}}_k) = g\left( \sum _{i=1}^m \beta _i \textit{\textbf{p}}_i\right) \leqslant \sum _{i=1}^m \beta _i g(\textit{\textbf{p}}_i) = \sum _{i=1}^m \beta _i \gamma _i. \end{aligned}$$

Therefore, the vector \((\delta _{1k},\dots ,\delta _{mk})\) is a minimizer in Eq. (12) at the point \(\textit{\textbf{p}}_k\), and \(J^*(\textit{\textbf{p}}_k) = \gamma _k\) follows.
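
For concreteness, the linear program in Eq. (12) is easy to evaluate numerically. The following minimal Python sketch (a hypothetical toy instance with assumed parameters, not part of the original paper; it relies on scipy.optimize.linprog) computes \(J^*(\textit{\textbf{p}})\) and checks the conclusion of Lemma 3.1(iii), namely \(J^*(\textit{\textbf{p}}_k)=\gamma _k\) when \(\gamma _i = g(\textit{\textbf{p}}_i)\) for a convex g.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy parameters: n = 2, m = 4, and gamma_i = g(p_i) with the convex
# function g(p) = |p|^2 / 2, so that assumption (A2) holds for this instance.
P = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])  # rows are the p_i
gamma = 0.5 * np.sum(P ** 2, axis=1)                              # gamma_i = g(p_i)

def J_star(p):
    """J*(p) via the linear program (12): minimize sum_i alpha_i*gamma_i over
    alpha in Lambda_m subject to sum_i alpha_i*p_i = p."""
    m = P.shape[0]
    A_eq = np.vstack([P.T, np.ones((1, m))])   # sum_i alpha_i p_i = p,  sum_i alpha_i = 1
    b_eq = np.append(p, 1.0)
    res = linprog(c=gamma, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, None)] * m)
    return res.fun if res.success else np.inf  # J*(p) = +infinity outside conv{p_i}

# Lemma 3.1(iii): at each p_k the minimal value is gamma_k.
for k in range(P.shape[0]):
    assert abs(J_star(P[k]) - gamma[k]) < 1e-6
```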

A.2 Proof of Lemma 3.2

Proof of (i): Let \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). The set \(\mathcal {A}(\textit{\textbf{p}}) \subseteq \varLambda _m\) is non-empty and bounded by Lemma 3.1(i), and it is closed since \(\mathcal {A}(\textit{\textbf{p}})\) is the solution set to the linear programming problem (12). Hence, \(\mathcal {A}(\textit{\textbf{p}})\) is compact. As a result, we immediately have that \(H(\textit{\textbf{p}})<+\infty \). Moreover, for each \((\alpha _1,\dots ,\alpha _m)\in \mathcal {A}(\textit{\textbf{p}})\) there holds

$$\begin{aligned} -\infty< \min _{i\in \{1,\dots , m\}} \theta _i \leqslant \sum _{i=1}^m \alpha _i \theta _i \leqslant \max _{i\in \{1,\dots , m\}} \theta _i < +\infty \end{aligned}$$

from which we conclude that H is a bounded function on \({\mathrm {dom}~}J^*\). Since the objective function in the minimization problem (14) is continuous, the existence of a minimizer follows from the compactness of \(\mathcal {A}(\textit{\textbf{p}})\).

Proof of (ii): We have already shown in the proof of (i) that the restriction of H to \({\mathrm {dom}~}J^*\) is bounded, and so it remains to prove its continuity. For any \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\), we have that \((\alpha _1,\dots ,\alpha _m)\in \mathcal {A}(\textit{\textbf{p}})\) if and only if \((\alpha _1,\dots , \alpha _m)\in \varLambda _m\), \(\sum _{i=1}^m \alpha _i\textit{\textbf{p}}_i = \textit{\textbf{p}}\), and \(\sum _{i=1}^m \alpha _i\gamma _i = J^*(\textit{\textbf{p}})\). As a result, we have

$$\begin{aligned} H(\textit{\textbf{p}}) = \min \left\{ \sum _{i=1}^m\alpha _i \theta _i:\ (\alpha _1,\dots ,\alpha _m)\in \varLambda _m, \ \sum _{i=1}^m \alpha _i\textit{\textbf{p}}_i = \textit{\textbf{p}}, \ \sum _{i=1}^m \alpha _i\gamma _i = J^*(\textit{\textbf{p}}) \right\} . \end{aligned}$$
(32)

Define the function \(h:\ \mathbb {R}^{n+1}\rightarrow \mathbb {R}\cup \{+\infty \}\) by

$$\begin{aligned} \begin{aligned} h(\textit{\textbf{p}},r) {:}{=}\min \left\{ \sum _{i=1}^m\alpha _i \theta _i: (\alpha _1,\dots ,\alpha _m)\in \varLambda _m, \ \sum _{i=1}^m \alpha _i\textit{\textbf{p}}_i = \textit{\textbf{p}}, \ \sum _{i=1}^m \alpha _i\gamma _i = r \right\} , \end{aligned} \end{aligned}$$
(33)

for any \(\textit{\textbf{p}}\in \mathbb {R}^n\) and \(r\in \mathbb {R}\). Using the same argument as in the proof of Lemma 3.1(i), we conclude that h is a convex lower semicontinuous function, and in fact continuous over its domain \({\mathrm {dom}~}h = {\mathrm {conv}~}{\{(\textit{\textbf{p}}_i, \gamma _i)\}_{i=1}^m}\). Comparing Eq. (32) and the definition of h in (33), we deduce that \(H(\textit{\textbf{p}}) = h(\textit{\textbf{p}}, J^*(\textit{\textbf{p}}))\) for any \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). Continuity of H on \({\mathrm {dom}~}J^*\) then follows from the continuity of h and \(J^*\) on their respective domains.

Proof of (iii): Let \(k\in \{1,\dots ,m\}\). On the one hand, Lemma 3.1(iii) implies \((\delta _{1k},\dots , \delta _{mk})\in \mathcal {A}(\textit{\textbf{p}}_k)\), so that

$$\begin{aligned} H(\textit{\textbf{p}}_k)\leqslant \sum _{i=1}^m \delta _{ik}\theta _i = \theta _k. \end{aligned}$$
(34)

On the other hand, let \((\alpha _1,\dots , \alpha _m)\in \mathcal {A}(\textit{\textbf{p}}_k)\) be a vector different from \((\delta _{1k},\dots , \delta _{mk})\). Then, \((\alpha _1,\dots , \alpha _m) \in \varLambda _m\) satisfies \(\sum _{i=1}^m \alpha _i\textit{\textbf{p}}_i = \textit{\textbf{p}}_k\), \(\sum _{i=1}^m \alpha _i\gamma _i = J^*(\textit{\textbf{p}}_k)\), and \(\alpha _k<1\). Define \((\beta _1,\dots , \beta _m) \in \varLambda _m\) by \(\beta _k = 0\) and \(\beta _i = \frac{\alpha _i}{1-\alpha _k}\) for every \(i\ne k\).

A straightforward computation using the properties of \((\alpha _1,\dots , \alpha _m)\), Lemma 3.1(iii), and the definition of \((\beta _1,\dots , \beta _m)\) yields

$$\begin{aligned} {\left\{ \begin{array}{ll} \begin{aligned} &{}(\beta _1,\dots , \beta _m)\in \varLambda _m \quad \text { with }\beta _k = 0,\\ &{}\sum _{i\ne k}\beta _i \textit{\textbf{p}}_i = \sum _{i\ne k}\frac{\alpha _i \textit{\textbf{p}}_i}{1-\alpha _k} = \frac{\textit{\textbf{p}}_k - \alpha _k\textit{\textbf{p}}_k}{1-\alpha _k} = \textit{\textbf{p}}_k,\\ &{}\sum _{i\ne k}\beta _i \gamma _i = \sum _{i\ne k}\frac{\alpha _i \gamma _i}{1-\alpha _k} = \frac{J^*(\textit{\textbf{p}}_k) - \alpha _k\gamma _k}{1-\alpha _k}= \frac{\gamma _k - \alpha _k\gamma _k}{1-\alpha _k} =\gamma _k. \end{aligned} \end{array}\right. } \end{aligned}$$

In other words, Eq. (9) holds at index k, which, by assumption (A3), implies that \(\sum _{i\ne k}\beta _i \theta _i > \theta _k\). As a result, we have

$$\begin{aligned} \sum _{i=1}^m\alpha _i \theta _i = \alpha _k\theta _k + (1-\alpha _k)\sum _{i\ne k}\beta _i \theta _i > \alpha _k\theta _k + (1-\alpha _k)\theta _k = \theta _k = \sum _{i=1}^m \delta _{ik}\theta _i. \end{aligned}$$

Taken together with Eq. (34), we conclude that \((\delta _{1k},\dots , \delta _{mk})\) is the unique minimizer in (14), and hence, we obtain \(H(\textit{\textbf{p}}_k) = \theta _k\).
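
Once \(J^*(\textit{\textbf{p}})\) is available, \(H(\textit{\textbf{p}})\) in Eq. (32) is itself a linear program. Continuing the hypothetical toy instance from the previous sketch (the \(\theta _i\) below are assumed values chosen only for illustration; since each \(\textit{\textbf{p}}_k\) is a vertex of \({\mathrm {conv}~}\{\textit{\textbf{p}}_i\}\), no alternative representation as in Eq. (9) exists for this instance), the snippet below evaluates H with linprog and checks \(H(\textit{\textbf{p}}_k)=\theta _k\) from Lemma 3.2(iii).

```python
theta = np.array([0.3, 0.7, 0.5, 0.9])   # assumed theta_i for the toy instance

def H(p):
    """H(p) via the linear program (32): minimize sum_i alpha_i*theta_i over alpha in
    Lambda_m subject to sum_i alpha_i*p_i = p and sum_i alpha_i*gamma_i = J*(p)."""
    m = P.shape[0]
    Jp = J_star(p)
    if not np.isfinite(Jp):
        return np.inf                      # H is only considered on dom J*
    A_eq = np.vstack([P.T, np.ones((1, m)), gamma[None, :]])
    b_eq = np.concatenate([p, [1.0, Jp]])
    res = linprog(c=theta, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, None)] * m)
    return res.fun

# Lemma 3.2(iii): at each p_k the k-th vertex of Lambda_m is the minimizer, so H(p_k) = theta_k.
for k in range(P.shape[0]):
    assert abs(H(P[k]) - theta[k]) < 1e-6
```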

B Proof of Theorem 3.1

To prove this theorem, we will use three lemmas whose statements and proofs are given in Sect. B.1, B.2, and B.3, respectively. The proof of Theorem 3.1 is given in Sect. B.4.

B.1 Statement and proof of Lemma B.1

Lemma B.1

Suppose the parameters \(\{(\textit{\textbf{p}}_i,\theta _i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\times \mathbb {R}\) satisfy assumptions (A1)-(A3). Let J and H be the functions defined in Eqs. (10) and (14), respectively. Let \(\tilde{H}:\mathbb {R}^n\rightarrow \mathbb {R}\) be a continuous function satisfying \(\tilde{H}(\textit{\textbf{p}}_i) = H(\textit{\textbf{p}}_i)\) for each \(i\in \{1,\dots ,m\}\) and \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for all \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). Then, the neural network f defined in Eq. (8) satisfies

$$\begin{aligned} f(\textit{\textbf{x}},t) {:}{=}\max _{i\in \{1,\dots ,m\}}\{\left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}\right\rangle -t\theta _{i}-\gamma _{i}\} = \sup _{\textit{\textbf{p}}\in {\mathrm {dom}~}J^*}\left\{ \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - t\tilde{H}(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}})\right\} . \end{aligned}$$
(35)

Proof

Let \(\textit{\textbf{x}}\in \mathbb {R}^n\), \(t\geqslant 0\), and \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). Since \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\), we get

$$\begin{aligned} \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - t\tilde{H}(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}}) \leqslant \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - tH(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}}). \end{aligned}$$
(36)

Let \((\alpha _1,\dots , \alpha _m)\) be a minimizer in (14) at the point \(\textit{\textbf{p}}\). By Eqs. (12), (13), and (14), we have

$$\begin{aligned} \textit{\textbf{p}}= \sum _{i=1}^{m}\alpha _i\textit{\textbf{p}}_i,\quad H(\textit{\textbf{p}}) = \sum _{i=1}^{m}\alpha _i\theta _i, \quad \text { and } \quad J^*(\textit{\textbf{p}})=\sum _{i=1}^{m}\alpha _i\gamma _i. \end{aligned}$$
(37)

Combining Eqs. (36), (37), and (8), we get

$$\begin{aligned} \begin{aligned} \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - t\tilde{H}(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}})&\leqslant \sum _{i=1}^m \alpha _i (\langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle -t \theta _i - \gamma _i)\\&\leqslant \max _{i\in \{1,\dots ,m\}} \{\langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle -t \theta _i - \gamma _i\} = f(\textit{\textbf{x}},t), \end{aligned} \end{aligned}$$

where the second inequality follows from the constraint \((\alpha _1,\dots ,\alpha _m)\in \varLambda _m\). Since \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\) is arbitrary, we obtain

$$\begin{aligned} \sup _{\textit{\textbf{p}}\in {\mathrm {dom}~}J^*}\left\{ \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - t\tilde{H}(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}})\right\} \leqslant f(\textit{\textbf{x}},t). \end{aligned}$$
(38)

Now, by Lemmas 3.1(iii), 3.2(iii), and the assumptions on \(\tilde{H}\), we have

$$\begin{aligned} \tilde{H}(\textit{\textbf{p}}_k)= H(\textit{\textbf{p}}_k) = \theta _k \quad \text { and } \quad J^*(\textit{\textbf{p}}_k) = \gamma _k, \end{aligned}$$

for each \(k\in \{1,\dots , m\}\). A straightforward computation yields

$$\begin{aligned} \begin{aligned} f(\textit{\textbf{x}},t)&= \max _{i\in \{1,\dots ,m\}} \{\langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle -t \theta _i - \gamma _i\} \\&= \max _{i\in \{1,\dots ,m\}} \left\{ \langle \textit{\textbf{p}}_i, \textit{\textbf{x}}\rangle -t \tilde{H}(\textit{\textbf{p}}_i) - J^*(\textit{\textbf{p}}_i)\right\} \\&\leqslant \sup _{\textit{\textbf{p}}\in {\mathrm {dom}~}J^*} \left\{ \langle \textit{\textbf{p}}, \textit{\textbf{x}}\rangle -t \tilde{H}(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}})\right\} , \end{aligned} \end{aligned}$$
(39)

where the inequality holds since \(\textit{\textbf{p}}_i\in {\mathrm {dom}~}J^*\) for every \(i\in \{1,\dots ,m\}\). The conclusion then follows from Eqs. (38) and (39). \(\square \)
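
As a numerical sanity check of Eq. (35), still on the hypothetical toy instance used after the proofs of Lemmas 3.1 and 3.2 (reusing J_star and H from those sketches, with \(\tilde{H}\) taken equal to H, which is all that matters on \({\mathrm {dom}~}J^*\)), one can compare the max-of-affine network f of Eq. (8) with a brute-force approximation of the supremum on the right-hand side over sampled points of \({\mathrm {dom}~}J^* = {\mathrm {conv}~}\{\textit{\textbf{p}}_i\}\):

```python
rng = np.random.default_rng(0)

def f(x, t):
    """The shallow network of Eq. (8): f(x, t) = max_i { <p_i, x> - t*theta_i - gamma_i }."""
    return np.max(P @ x - t * theta - gamma)

def hopf_rhs(x, t, samples=500):
    """Monte-Carlo approximation of sup_{p in dom J*} { <p, x> - t*H(p) - J*(p) } in Eq. (35)."""
    best = -np.inf
    for _ in range(samples):
        alpha = rng.dirichlet(np.ones(P.shape[0]))   # random point of conv{p_i}
        p = alpha @ P
        best = max(best, p @ x - t * H(p) - J_star(p))
    return best

x, t = np.array([0.4, -1.2]), 0.7
# The sampled supremum never exceeds f(x, t); it approaches f(x, t) as samples come close
# to the maximizing p_i, since the supremum in (35) is attained at one of the p_i.
assert hopf_rhs(x, t) <= f(x, t) + 1e-6
```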

B.2 Statement and proof of Lemma B.2

Lemma B.2

Suppose the parameters \(\{(\textit{\textbf{p}}_i,\theta _i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\times \mathbb {R}\) satisfy assumptions (A1)-(A3). For every \(k\in \{1,\dots , m\}\), there exist \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\) such that \(f(\cdot , t)\) is differentiable at \(\textit{\textbf{x}}\) and \(\nabla _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t) = \textit{\textbf{p}}_k\).

Proof

Since f is the supremum of a finite number of affine functions by definition (8), it is finite-valued and convex for \(t\geqslant 0\). As a result, \(\nabla _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t) = \textit{\textbf{p}}_k\) is equivalent to \(\partial (f(\cdot , t))(\textit{\textbf{x}}) = \{\textit{\textbf{p}}_k\}\), and so it suffices to prove that \(\partial (f(\cdot , t))(\textit{\textbf{x}}) = \{\textit{\textbf{p}}_k\}\) for some \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\). To simplify the notation, we use \(\partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t)\) to denote the subdifferential of \(f(\cdot , t)\) at \(\textit{\textbf{x}}\).

By [67, Thm. VI.4.4.2], the subdifferential of \(f(\cdot , t)\) at \(\textit{\textbf{x}}\) is the convex hull of the \(\textit{\textbf{p}}_i\)’s whose indices i’s are maximizers in (8), that is,

$$\begin{aligned} \partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t) = {{\mathrm {co}}~}\{\textit{\textbf{p}}_i: i \text { is a maximizer in (8)}\}. \end{aligned}$$

It suffices then to prove the existence of \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\) such that

$$\begin{aligned} \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i \quad \text { for every } i\ne k. \end{aligned}$$
(40)

First, consider the case when there exists \(\textit{\textbf{x}}\in \mathbb {R}^n\) such that \(\langle \textit{\textbf{p}}_k, \textit{\textbf{x}}\rangle -\gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -\gamma _i\) for every \(i\ne k\). In that case, by continuity, we have \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\) for every \(i\ne k\) and all sufficiently small \(t>0\), and so (40) holds.

Now, consider the case when there does not exist \(\textit{\textbf{x}}\in \mathbb {R}^n\) such that \(\langle \textit{\textbf{p}}_k, \textit{\textbf{x}}\rangle -\gamma _k > \max _{i\ne k}\{\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -\gamma _i\}\). In other words, we assume

$$\begin{aligned} J(\textit{\textbf{x}}) = \max _{i\ne k} \{\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle - \gamma _i\}\quad \text { for every }\textit{\textbf{x}}\in \mathbb {R}^n. \end{aligned}$$
(41)

We apply Lemma 3.1(i) to the formula above and obtain

$$\begin{aligned} J^*(\textit{\textbf{p}}_k) = \min \left\{ \sum _{i=1}^{m}\alpha _{i}\gamma _{i}: (\alpha _{1},\dots ,\alpha _{m})\in \varLambda _m,\ \sum _{i=1}^{m}\alpha _{i}\textit{\textbf{p}}_{i}=\textit{\textbf{p}}_k,\ \alpha _k = 0\right\} . \end{aligned}$$
(42)

Let \(\textit{\textbf{x}}_0\in \partial J^*(\textit{\textbf{p}}_k)\). Denote by \(I_{\textit{\textbf{x}}_0}\) the set of maximizers in Eq. (41) at the point \(\textit{\textbf{x}}_0\), i.e.,

$$\begin{aligned} I_{\textit{\textbf{x}}_0}:= {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\ne k}} \{\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}_0\rangle - \gamma _i\}. \end{aligned}$$
(43)

Note that we have \(k\not \in I_{\textit{\textbf{x}}_0}\) by definition of \(I_{\textit{\textbf{x}}_0}\). Define a function \(h:\mathbb {R}^n\rightarrow \mathbb {R}\cup \{+\infty \}\) by

$$\begin{aligned} h(\textit{\textbf{p}}) {:}{=}{\left\{ \begin{array}{ll} \theta _i &{}\quad \text {if }\;\;\textit{\textbf{p}}= \textit{\textbf{p}}_i \text { and }i\in I_{\textit{\textbf{x}}_0},\\ +\infty &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(44)

Denote the convex lower semicontinuous envelope of h by \(\overline{{\mathrm {co}}}~h\). Since \(\textit{\textbf{x}}_0\in \partial J^*(\textit{\textbf{p}}_k)\), we can use [67, Thm. VI.4.4.2] and the definition of \(I_{\textit{\textbf{x}}_0}\) and h in Eqs. (43) and (44) to deduce

$$\begin{aligned} \textit{\textbf{p}}_k \in \partial J(\textit{\textbf{x}}_0) = {{\mathrm {co}}~}\{\textit{\textbf{p}}_i: i\in I_{\textit{\textbf{x}}_0}\} = {\mathrm {dom}~}\overline{{\mathrm {co}}}~h. \end{aligned}$$
(45)

Hence, the point \(\textit{\textbf{p}}_k\) is in the domain of the polytopal convex function \(\overline{{\mathrm {co}}}~h\). Then, [133, Thm. 23.10] implies \(\partial (\overline{{\mathrm {co}}}~h) (\textit{\textbf{p}}_k)\ne \emptyset \). Let \(\textit{\textbf{v}}_0 \in \partial (\overline{{\mathrm {co}}}~h)(\textit{\textbf{p}}_k)\) and \(\textit{\textbf{x}}=\textit{\textbf{x}}_0+t\textit{\textbf{v}}_0\). It remains to choose a suitable \(t>0\) such that (40) holds. Letting \(\textit{\textbf{x}}=\textit{\textbf{x}}_0+t\textit{\textbf{v}}_0\) in (40) yields

$$\begin{aligned} \begin{aligned}&\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k - \left( \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\right) \\&\quad = \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0+t\textit{\textbf{v}}_0\rangle - t\theta _k -\gamma _k - (\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}_0+t\textit{\textbf{v}}_0\rangle -t\theta _i- \gamma _i)\\&\quad = \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0\rangle -\gamma _k - (\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}_0\rangle - \gamma _i) + t(\theta _i-\theta _k - \langle \textit{\textbf{p}}_i-\textit{\textbf{p}}_k, \textit{\textbf{v}}_0\rangle ). \end{aligned} \end{aligned}$$
(46)

Now, we consider two situations: the first when \(i\not \in I_{\textit{\textbf{x}}_0} \cup \{k\}\) and the second when \(i\in I_{\textit{\textbf{x}}_0}\). It suffices to prove that (40) holds in each case for all sufficiently small \(t>0\).

If \(i\not \in I_{\textit{\textbf{x}}_0}\cup \{k\}\), then i is not a maximizer in Eq. (41) at the point \(\textit{\textbf{x}}_0\). By (45), \(\textit{\textbf{p}}_k\) is a convex combination of the points \(\{\textit{\textbf{p}}_i: i\in I_{\textit{\textbf{x}}_0}\}\). In other words, there exists \((c_1,\dots ,c_m)\in \varLambda _m\) such that \(\sum _{j=1}^mc_j\textit{\textbf{p}}_j=\textit{\textbf{p}}_k\) and \(c_j=0\) whenever \(j\not \in I_{\textit{\textbf{x}}_0}\). Taken together with assumption (A2) and Eqs. (10), (41), (43), we have

$$\begin{aligned} \begin{aligned} J(\textit{\textbf{x}}_0)&\geqslant \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0\rangle - \gamma _k = \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0\rangle - g(\textit{\textbf{p}}_k) = \left\langle \sum _{j\in I_{\textit{\textbf{x}}_0}} c_j\textit{\textbf{p}}_j,\textit{\textbf{x}}_0\right\rangle - g\left( \sum _{j\in I_{\textit{\textbf{x}}_0}} c_j\textit{\textbf{p}}_j\right) \\&\geqslant \sum _{j\in I_{\textit{\textbf{x}}_0}} c_j(\langle \textit{\textbf{p}}_j,\textit{\textbf{x}}_0\rangle - g(\textit{\textbf{p}}_j)) = \sum _{j\in I_{\textit{\textbf{x}}_0}} c_j J(\textit{\textbf{x}}_0) = J(\textit{\textbf{x}}_0). \end{aligned} \end{aligned}$$

Thus, the inequalities become equalities in the equation above. As a result, we have

$$\begin{aligned} \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0\rangle - \gamma _k = J(\textit{\textbf{x}}_0) > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}_0\rangle - \gamma _i, \end{aligned}$$

where the inequality holds because \(i\not \in I_{\textit{\textbf{x}}_0}\cup \{k\}\) by assumption. This inequality implies that the constant \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0\rangle -\gamma _k - (\langle \textit{\textbf{p}}_i,\textit{\textbf{x}}_0\rangle - \gamma _i)\) is positive, and taken together with (46), we conclude that the inequality in (40) holds for \(i\not \in I_{\textit{\textbf{x}}_0}\cup \{k\}\) when t is small enough.

If \(i\in I_{\textit{\textbf{x}}_0}\), then both i and k are maximizers in Eq. (10) at \(\textit{\textbf{x}}_0\), and hence, we have

$$\begin{aligned} \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}_0\rangle - \gamma _k = J(\textit{\textbf{x}}_0) = \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}_0\rangle - \gamma _i. \end{aligned}$$
(47)

Together with Eq. (46) and the definition of h in Eq. (44), we obtain

$$\begin{aligned} \begin{aligned} \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k - \left( \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\right)&= 0 + t(h(\textit{\textbf{p}}_i)-\theta _k - \langle \textit{\textbf{p}}_i-\textit{\textbf{p}}_k, \textit{\textbf{v}}_0\rangle )\\&\geqslant t(\overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_i)-\theta _k - \langle \textit{\textbf{p}}_i-\textit{\textbf{p}}_k, \textit{\textbf{v}}_0\rangle ). \end{aligned} \end{aligned}$$
(48)

In addition, since \(\textit{\textbf{v}}_0\in \partial (\overline{{\mathrm {co}}}~h)(\textit{\textbf{p}}_k)\), we have

$$\begin{aligned} \overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_i)\geqslant \overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_k) +\langle \textit{\textbf{p}}_i-\textit{\textbf{p}}_k,\textit{\textbf{v}}_0\rangle . \end{aligned}$$
(49)

Combining Eqs. (48) and (49), we obtain

$$\begin{aligned} \langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k - \left( \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\right) \geqslant t(\overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_k)-\theta _k). \end{aligned}$$
(50)

To prove the result, it suffices to show \(\overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_k)>\theta _k\). Since \(\textit{\textbf{p}}_k \in {\mathrm {dom}~}\overline{{\mathrm {co}}}~h\) (as shown in Eq. (45)), [68, Prop. X.1.5.4] yields

$$\begin{aligned} \overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_k) = \sum _{j\in I_{\textit{\textbf{x}}_0}} \alpha _j h(\textit{\textbf{p}}_j) = \sum _{j\in I_{\textit{\textbf{x}}_0}} \alpha _j \theta _j, \end{aligned}$$
(51)

for some \((\alpha _1,\dots ,\alpha _m)\in \varLambda _m\) satisfying \(\textit{\textbf{p}}_k = \sum _{j=1}^m \alpha _j \textit{\textbf{p}}_j\) and \(\alpha _j=0\) whenever \(j\not \in I_{\textit{\textbf{x}}_0}\). Then, by Lemma 3.1(ii) \((\alpha _1,\dots ,\alpha _m)\) is a minimizer in Eq. (42), that is,

$$\begin{aligned} \gamma _k = J^*(\textit{\textbf{p}}_k) = \sum _{j=1}^m \alpha _j \gamma _j = \sum _{j\in I_{\textit{\textbf{x}}_0}} \alpha _j \gamma _j = \sum _{j\ne k}\alpha _j \gamma _j. \end{aligned}$$

Hence, Eq. (9) holds for the index k. By assumption (A3), we have \(\theta _k < \sum _{j\ne k}\alpha _j \theta _j\). Taken together with the fact that \(\alpha _j=0\) whenever \(j\not \in I_{\textit{\textbf{x}}_0}\) and Eq. (51), we find

$$\begin{aligned} \theta _k < \sum _{j\ne k}\alpha _j \theta _j = \sum _{j\in I_{\textit{\textbf{x}}_0}} \alpha _j \theta _j= \overline{{\mathrm {co}}}~h(\textit{\textbf{p}}_k). \end{aligned}$$
(52)

Hence, the right-hand side of Eq. (50) is strictly positive, and we conclude that \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\) for \(t>0\) if \(i\in I_{\textit{\textbf{x}}_0}\).

Therefore, in this case, when \(t>0\) is small enough and \(\textit{\textbf{x}}\) is chosen as above, we have \(\langle \textit{\textbf{p}}_k,\textit{\textbf{x}}\rangle -t\theta _k - \gamma _k > \langle \textit{\textbf{p}}_i,\textit{\textbf{x}}\rangle -t\theta _i - \gamma _i\) for every \(i\ne k\), and the proof is complete. \(\square \)
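
To see the conclusion of Lemma B.2 concretely, the following minimal Python sketch (purely illustrative and not part of the original argument) uses made-up parameters with \(\gamma _i = \frac{1}{2}\Vert \textit{\textbf{p}}_i\Vert _2^2\), i.e., \(g(\textit{\textbf{p}})=\frac{1}{2}\Vert \textit{\textbf{p}}\Vert _2^2\), so that assumptions (A1)–(A3) hold, and checks by central finite differences that every \(\textit{\textbf{p}}_k\) is attained as a spatial gradient of f at some point \((\textit{\textbf{x}},t)\) with \(t>0\).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 2
P = rng.normal(size=(m, n))          # hypothetical vectors p_i (pairwise distinct)
theta = rng.normal(size=m)           # hypothetical parameters theta_i
gamma = 0.5 * np.sum(P**2, axis=1)   # gamma_i = ||p_i||^2 / 2

def f(x, t):
    """f(x,t) = max_i { <p_i, x> - t*theta_i - gamma_i }."""
    return np.max(P @ x - t * theta - gamma)

# Lemma B.2: for each k there exists (x, t) with t > 0 such that grad_x f(x,t) = p_k.
# For this quadratic choice of gamma, piece k is the unique maximizer at x = p_k
# once t is small enough, so a central difference recovers p_k exactly.
t, h = 1e-3, 1e-6
for k in range(m):
    x = P[k]
    grad = np.array([(f(x + h * e, t) - f(x - h * e, t)) / (2 * h) for e in np.eye(n)])
    print(k, np.allclose(grad, P[k]))
```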

1.3 B.3 Statement and proof of Lemma B.3

Lemma B.3

Suppose the parameters \(\{(\textit{\textbf{p}}_i,\theta _i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\times \mathbb {R}\) satisfy assumptions (A1)-(A3). Define a function \(F:\ \mathbb {R}^{n+1}\rightarrow \mathbb {R}\cup \{+\infty \}\) by

$$\begin{aligned} F(\textit{\textbf{p}},E^{-}) {:}{=}{\left\{ \begin{array}{ll} J^{*}(\textit{\textbf{p}}) &{}\quad \mathrm{if } \;\; E^{-}+H(\textit{\textbf{p}})\leqslant 0,\\ +\infty &{} \quad \mathrm{otherwise,} \end{array}\right. } \end{aligned}$$
(53)

for all \(\textit{\textbf{p}}\in \mathbb {R}^n\) and \(E^-\in \mathbb {R}\). Then, the convex envelope of F is given by

$$\begin{aligned} {{\mathrm {co}}~}F(\textit{\textbf{p}},E^-) = \inf _{(c_1,\dots , c_m)\in C(\textit{\textbf{p}},E^-) }\sum _{i=1}^{m}c_{i}\gamma _{i}, \end{aligned}$$
(54)

where the constraint set \(C(\textit{\textbf{p}},E^-)\) is defined by

$$\begin{aligned} C(\textit{\textbf{p}},E^-){:}{=}\left\{ (c_1, \dots , c_m)\in \varLambda _m:\sum _{i=1}^{m}c_{i}\textit{\textbf{p}}_{i}=\textit{\textbf{p}}, \ \sum _{i=1}^{m}c_{i}\theta _{i}\leqslant -E^- \right\} . \end{aligned}$$

Proof

First, we compute the convex hull of \({\mathrm {epi}~}F\), which we denote by \({{\mathrm {co}}~}({\mathrm {epi}~}F)\). Let \((\textit{\textbf{p}}, E^-,r)\in {{\mathrm {co}}~}({\mathrm {epi}~}F)\), where \(\textit{\textbf{p}}\in \mathbb {R}^n\) and \(E^-,r\in \mathbb {R}\). Then there exist \(k\in \mathbb {N}\), \((\beta _1, \dots , \beta _k)\in \varLambda _k\) and \((\textit{\textbf{q}}_i, E_i^-,r_i)\in {\mathrm {epi}~}F\) for each \(i\in \{1,\dots , k\}\) such that \((\textit{\textbf{p}}, E^-,r) = \sum _{i=1}^k \beta _i (\textit{\textbf{q}}_i, E_i^-,r_i)\). By definition of F in Eq. (53), \((\textit{\textbf{q}}_i, E_i^-,r_i)\in {\mathrm {epi}~}F\) holds if and only if \(\textit{\textbf{q}}_i\in {\mathrm {dom}~}J^*\), \(E_i^-+H(\textit{\textbf{q}}_i)\leqslant 0\) and \(r_i\geqslant J^*(\textit{\textbf{q}}_i)\). In conclusion, we have

$$\begin{aligned} {\left\{ \begin{array}{ll} (\beta _1, \dots , \beta _k)\in \varLambda _k,\\ (\textit{\textbf{p}}, E^-,r) = \sum _{i=1}^k \beta _i (\textit{\textbf{q}}_i, E_i^-,r_i),\\ \textit{\textbf{q}}_1,\dots ,\textit{\textbf{q}}_k\in {\mathrm {dom}~}J^*,\\ E_i^-+H(\textit{\textbf{q}}_i)\leqslant 0\quad \text { for each }i\in \{1,\dots , k\},\\ r_i\geqslant J^*(\textit{\textbf{q}}_i) \quad \text { for each }i\in \{1,\dots , k\}. \end{array}\right. } \end{aligned}$$
(55)

For each i, since we have \(\textit{\textbf{q}}_i\in {\mathrm {dom}~}J^*\), by Lemma 3.2(i) the minimization problem in (14) evaluated at \(\textit{\textbf{q}}_i\) has at least one minimizer. Let \((\alpha _{i1}, \dots , \alpha _{im})\) be such a minimizer. Using Eqs. (12), (14), and \((\alpha _{i1}, \dots , \alpha _{im})\in \varLambda _m\), we have

$$\begin{aligned} \sum _{j=1}^m \alpha _{ij}(1, \textit{\textbf{p}}_j, \theta _j, \gamma _j) = (1,\textit{\textbf{q}}_i, H(\textit{\textbf{q}}_i), J^*(\textit{\textbf{q}}_i)). \end{aligned}$$
(56)

Define the real number \(c_j{:}{=}\sum _{i=1}^k \beta _i \alpha _{ij}\) for any \(j\in \{1,\dots ,m\}\). Combining Eqs. (55) and (56), we get that \(c_j\geqslant 0\) for any j and

$$\begin{aligned} \begin{aligned}&\sum _{j=1}^m c_j(1,\textit{\textbf{p}}_j, \theta _j, \gamma _j) =\sum _{j=1}^m \sum _{i=1}^k \beta _i \alpha _{ij}(1,\textit{\textbf{p}}_j, \theta _j, \gamma _j)\\ =&\sum _{i=1}^k \beta _i \left( \sum _{j=1}^m\alpha _{ij}(1,\textit{\textbf{p}}_j, \theta _j, \gamma _j) \right) = \sum _{i=1}^k \beta _i(1,\textit{\textbf{q}}_i, H(\textit{\textbf{q}}_i), J^*(\textit{\textbf{q}}_i)). \end{aligned} \end{aligned}$$

We continue the computation using Eq. (55) and get

$$\begin{aligned} \begin{aligned}&\sum _{j=1}^m c_j(1,\textit{\textbf{p}}_j) = \sum _{i=1}^k \beta _i(1,\textit{\textbf{q}}_i) = (1,\textit{\textbf{p}});\\&\sum _{j=1}^m c_j\theta _j=\sum _{i=1}^k \beta _iH(\textit{\textbf{q}}_i)\leqslant -\sum _{i=1}^k\beta _iE_i^- = -E^-; \\&\sum _{j=1}^m c_j\gamma _j=\sum _{i=1}^k \beta _iJ^*(\textit{\textbf{q}}_i)\leqslant \sum _{i=1}^k\beta _ir_i=r.\\ \end{aligned} \end{aligned}$$

Therefore, we conclude that \((c_1,\dots , c_m)\in \varLambda _m\) and

$$\begin{aligned} {\left\{ \begin{array}{ll} \textit{\textbf{p}}= \sum _{j=1}^m c_j\textit{\textbf{p}}_j,\\ E^- \leqslant -\sum _{j=1}^m c_j\theta _j,\\ r \geqslant \sum _{j=1}^m c_j\gamma _j. \end{array}\right. } \end{aligned}$$

As a consequence, \({{\mathrm {co}}~}({\mathrm {epi}~}F)\subseteq {{\mathrm {co}}~}\left( \cup _{j=1}^m \left( \{\textit{\textbf{p}}_j\}\times (-\infty , -\theta _j]\times [\gamma _j, +\infty )\right) \right) \). Now, Lemmas 3.1(iii) and 3.2(iii) imply \(\{\textit{\textbf{p}}_j\}\times (-\infty , -\theta _j]\times [\gamma _j, +\infty ) \subseteq {\mathrm {epi}~}F\) for each \(j\in \{1,\dots ,m\}\). Therefore, we have

$$\begin{aligned} \begin{aligned} {{\mathrm {co}}~}({\mathrm {epi}~}F) = \Bigg \{(\textit{\textbf{p}}, E^-,r)\in \mathbb {R}^n\times \mathbb {R}\times \mathbb {R}: \text {there exists }(c_1,\dots , c_m)\in \varLambda _m\text { s.t. }\quad \quad \\ \textit{\textbf{p}}= \sum _{j=1}^m c_j\textit{\textbf{p}}_j,\ E^- \leqslant -\sum _{j=1}^m c_j\theta _j, \ r \geqslant \sum _{j=1}^m c_j\gamma _j.\Bigg \}. \end{aligned} \end{aligned}$$
(57)

By [68, Def. IV.2.5.3 and Prop. IV.2.5.1], we have

$$\begin{aligned} \begin{aligned} {{\mathrm {co}}~}F(\textit{\textbf{p}},E^-) = \inf \{r\in \mathbb {R}: (\textit{\textbf{p}}, E^-,r)\in {{\mathrm {co}}~}({\mathrm {epi}~}F)\}. \end{aligned} \end{aligned}$$
(58)

The conclusion then follows from Eqs. (57) and (58). \(\square \)
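
The right-hand side of Eq. (54) is a finite-dimensional linear program in the simplex variable \((c_1,\dots ,c_m)\), so it can be evaluated directly. The Python sketch below is only an illustration: the parameters are made up (with \(\gamma _i=\frac{1}{2}\Vert \textit{\textbf{p}}_i\Vert _2^2\), so that assumptions (A1)–(A3) hold), and scipy.optimize.linprog is just one convenient solver for the program.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical parameters (p_i, theta_i, gamma_i) with gamma_i = ||p_i||^2 / 2.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta = np.array([0.0, 0.5, 0.5, 1.0])
gamma = 0.5 * np.sum(P**2, axis=1)
m = len(theta)

def co_F(p, E_minus):
    """Evaluate the right-hand side of Eq. (54): minimize sum_i c_i*gamma_i over C(p, E^-)."""
    A_eq = np.vstack([np.ones(m), P.T])          # sum_i c_i = 1 and sum_i c_i p_i = p
    b_eq = np.concatenate([[1.0], p])
    A_ub = theta.reshape(1, -1)                  # sum_i c_i theta_i <= -E^-
    b_ub = np.array([-E_minus])
    res = linprog(gamma, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * m, method="highs")
    return res.fun if res.success else np.inf   # infeasible: (p, E^-) lies outside dom(co F)

print(co_F(np.array([0.5, 0.5]), E_minus=-1.0))
```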

1.4 B.4 Proof of Theorem 3.1

Proof of (i): First, the neural network f is the pointwise maximum of m affine functions in \((\textit{\textbf{x}},t)\) and therefore is jointly convex in these variables. Second, as the function H is continuous and bounded in \({\mathrm {dom}~}J^*\) by Lemma 3.2(ii), there exists a continuous and bounded function defined in \(\mathbb {R}^n\) whose restriction to \({\mathrm {dom}~}J^*\) coincides with H [57, Thm. 4.16]. Then, statement (i) follows by substituting this function for \(\tilde{H}\) in statement (ii), and so it suffices to prove the latter.

Proof of (ii) (sufficiency): Suppose \(\tilde{H}(\textit{\textbf{p}}_i) = H(\textit{\textbf{p}}_i)\) for every \(i\in \{1,\dots ,m\}\) and \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for every \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). Since \(\tilde{H}\) is continuous on \(\mathbb {R}^n\) and J is convex and Lipschitz continuous with Lipschitz constant \(L= \max _{i\in \{1,\dots ,m\}} \Vert \textit{\textbf{p}}_i\Vert \), [10, Thm. 3.1] implies that \((\textit{\textbf{x}},t) \mapsto \sup _{\textit{\textbf{p}}\in {\mathrm {dom}~}J^*}\left\{ \langle \textit{\textbf{p}},\textit{\textbf{x}}\rangle - t\tilde{H}(\textit{\textbf{p}}) - J^*(\textit{\textbf{p}})\right\} \) is the unique uniformly continuous viscosity solution to the HJ equation (16). But this function coincides with the neural network f by Lemma B.1, and therefore, both sufficiency and statement (i) follow.

Proof of (ii) (necessity): Suppose the neural network f is the unique uniformly continuous viscosity solution to (16). First, we prove that \(\tilde{H}(\textit{\textbf{p}}_k) = H(\textit{\textbf{p}}_k)\) for every \(k\in \{1,\dots , m\}\). Fix \(k\in \{1,\dots , m\}\). By Lemma B.2, there exist \(\textit{\textbf{x}}\in \mathbb {R}^n\) and \(t>0\) satisfying \(\partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t) = \{\textit{\textbf{p}}_k\}\). Use Lemmas 3.1(iii) and 3.2(iii) to write the maximization problem in Eq. (8) as

$$\begin{aligned} f(\textit{\textbf{x}},t) = \max _{\textit{\textbf{p}}\in \{\textit{\textbf{p}}_1,\dots ,\textit{\textbf{p}}_m\}} \{\langle \textit{\textbf{p}}, \textit{\textbf{x}}\rangle - tH(\textit{\textbf{p}})-J^*(\textit{\textbf{p}})\}, \end{aligned}$$
(59)

where \((\textit{\textbf{p}},t) \mapsto \langle \textit{\textbf{p}}, \textit{\textbf{x}}\rangle - tH(\textit{\textbf{p}})-J^*(\textit{\textbf{p}})\) is continuous in \((\textit{\textbf{p}},t)\) and differentiable in t. As the feasible set \(\{\textit{\textbf{p}}_1,\dots ,\textit{\textbf{p}}_m\}\) is compact, f is also differentiable with respect to t [21, Prop. 4.12], and its derivative equals

$$\begin{aligned} \frac{\partial f}{\partial t}(\textit{\textbf{x}},t) = \min \left\{ -H(\textit{\textbf{p}}):\ \textit{\textbf{p}}\text { is a maximizer in Eq. (59)}\right\} . \end{aligned}$$

Since \(\textit{\textbf{x}}\) and t satisfy \(\partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t)=\{\textit{\textbf{p}}_k\}\), [67, Thm. VI.4.4.2] implies that the only maximizer in Eq. (59) is \(\textit{\textbf{p}}_k\). As a result, there holds

$$\begin{aligned} \frac{\partial f}{\partial t}(\textit{\textbf{x}},t) = -H(\textit{\textbf{p}}_k). \end{aligned}$$
(60)

Since f is convex on \(\mathbb {R}^n\), its subdifferential \(\partial f(\textit{\textbf{x}},t)\) is non-empty and satisfies

$$\begin{aligned} \partial f(\textit{\textbf{x}},t)\subseteq \partial _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t)\times \partial _t f(\textit{\textbf{x}},t) = \{(\textit{\textbf{p}}_k, -H(\textit{\textbf{p}}_k))\}. \end{aligned}$$

In other words, the subdifferential \(\partial f(\textit{\textbf{x}},t)\) contains only one element, and therefore, f is differentiable at \((\textit{\textbf{x}},t)\) and its gradient equals \((\textit{\textbf{p}}_k, -H(\textit{\textbf{p}}_k))\) [133, Thm. 21.5]. Using (16) and (60), we obtain

$$\begin{aligned} 0 = \frac{\partial f}{\partial t}(\textit{\textbf{x}},t) + \tilde{H}(\nabla _{\textit{\textbf{x}}} f(\textit{\textbf{x}},t)) = -H(\textit{\textbf{p}}_k) + \tilde{H}(\textit{\textbf{p}}_k). \end{aligned}$$

As \(k\in \{1,\dots , m\}\) is arbitrary, we find that \(H(\textit{\textbf{p}}_k)=\tilde{H}(\textit{\textbf{p}}_k)\) for every \(k\in \{1,\dots , m\}\).

Next, we prove by contradiction that \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for every \(\textit{\textbf{p}}\in {\mathrm {dom}~}J^*\). It is enough to prove the property only for every \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\) by continuity of both \(\tilde{H}\) and H (where continuity of H is proved in Lemma 3.2(ii)). Assume \(\tilde{H}(\textit{\textbf{p}}) < H(\textit{\textbf{p}})\) for some \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\). Define two functions F and \(\tilde{F}\) from \(\mathbb {R}^n\times \mathbb {R}\) to \(\mathbb {R}\cup \{+\infty \}\) by

$$\begin{aligned} F(\textit{\textbf{q}},E^{-}){:}{=}{\left\{ \begin{array}{ll} J^{*}(\textit{\textbf{q}}) &{}\quad \text {if }\;\; {E^{-}+H(\textit{\textbf{q}})\leqslant 0},\\ +\infty &{}\quad \mathrm{otherwise.} \end{array}\right. } \quad \text { and }\quad \tilde{F}(\textit{\textbf{q}},E^{-}){:}{=}{\left\{ \begin{array}{ll} J^{*}(\textit{\textbf{q}}) &{}\quad \text {if }\;\; {E^{-}+\tilde{H}(\textit{\textbf{q}})\leqslant 0},\\ +\infty &{}\quad \mathrm{otherwise.} \end{array}\right. } \end{aligned}$$
(61)

for any \(\textit{\textbf{q}}\in \mathbb {R}^n\) and \(E^-\in \mathbb {R}\). Denoting the convex envelope of F by \({{\mathrm {co}}~}F\), Lemma B.3 implies

$$\begin{aligned} \begin{aligned}&{{\mathrm {co}}~}F(\textit{\textbf{q}},E^-) = \inf _{(c_1,\dots , c_m)\in C(\textit{\textbf{q}},E^-) }\sum _{i=1}^{m}c_{i}\gamma _{i},\text { where }C \text { is defined by}\\&C(\textit{\textbf{q}},E^-){:}{=}\left\{ (c_1, \dots , c_m)\in \varLambda _m:\ \sum _{i=1}^{m}c_{i}\textit{\textbf{p}}_{i}=\textit{\textbf{q}}, \ \sum _{i=1}^{m}c_{i}\theta _{i}\leqslant -E^- \right\} . \end{aligned} \end{aligned}$$
(62)

Let \(E_1^- \in \left( -H(\textit{\textbf{p}}), -\tilde{H}(\textit{\textbf{p}})\right) \). Now, we want to prove that \({{\mathrm {co}}~}F(\textit{\textbf{p}}, E_1^-)\leqslant J^*(\textit{\textbf{p}})\); this inequality will lead to a contradiction with the definition of H.

Using statement (i) of this theorem and the supposition that f is the unique viscosity solution to the HJ equation (16), we have that

$$\begin{aligned} f(\textit{\textbf{x}}, t) = \sup _{\textit{\textbf{q}}\in \mathbb {R}^n} \{\langle \textit{\textbf{q}},\textit{\textbf{x}}\rangle - tH(\textit{\textbf{q}}) - J^*(\textit{\textbf{q}})\} = \sup _{\textit{\textbf{q}}\in \mathbb {R}^n} \{\langle \textit{\textbf{q}},\textit{\textbf{x}}\rangle - t\tilde{H}(\textit{\textbf{q}}) - J^*(\textit{\textbf{q}})\}. \end{aligned}$$

Furthermore, a similar calculation as in the proof of [39, Prop. 3.1] yields

$$\begin{aligned} f = F^* = \tilde{F}^*, \text { which implies } f^* = \overline{{\mathrm {co}}}~F = \overline{{\mathrm {co}}}~\tilde{F}, \end{aligned}$$

where \(\overline{{\mathrm {co}}}~F\) and \(\overline{{\mathrm {co}}}~\tilde{F}\) denote the convex lower semicontinuous envelopes of F and \(\tilde{F}\), respectively. On the one hand, since \(f^* = \overline{{\mathrm {co}}}~\tilde{F}\), the definition of \(\tilde{F}\) in Eq. (61) implies

$$\begin{aligned} f^*\left( \textit{\textbf{p}}, -\tilde{H}(\textit{\textbf{p}})\right) \leqslant \tilde{F}\left( \textit{\textbf{p}}, -\tilde{H}(\textit{\textbf{p}})\right) = J^*(\textit{\textbf{p}}) \quad \text { and } \quad \{\textit{\textbf{p}}\}\times \left( -\infty , -\tilde{H}(\textit{\textbf{p}})\right] \subseteq {\mathrm {dom}~}\tilde{F} \subseteq {\mathrm {dom}~}f^*. \end{aligned}$$
(63)

Recall that \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\) and \(E_1^-<-\tilde{H}(\textit{\textbf{p}})\), so that \((\textit{\textbf{p}}, E_1^-) \in {\mathrm {ri}}~{\mathrm {dom}~}f^*\). As a result, we get

$$\begin{aligned} \left( \textit{\textbf{p}}, \alpha E_1^- +(1-\alpha )(-\tilde{H}(\textit{\textbf{p}}))\right) \in {\mathrm {ri}}~{\mathrm {dom}~}f^* \text { for all }\alpha \in (0,1). \end{aligned}$$
(64)

On the other hand, since \(f^* = {{\mathrm {co}}~}F\), we have \({\mathrm {ri}}~{\mathrm {dom}~}f^* = {\mathrm {ri}}~{\mathrm {dom}~}({{\mathrm {co}}~}F)\) and \(f^* = {{\mathrm {co}}~}F\) in \({\mathrm {ri}}~{\mathrm {dom}~}f^*\). Taken together with Eq. (64) and the continuity of \(f^*\), there holds

$$\begin{aligned} \begin{aligned} f^*\left( \textit{\textbf{p}}, -\tilde{H}(\textit{\textbf{p}})\right)&= \lim _{\begin{array}{c} \alpha \rightarrow 0\\ 0<\alpha<1 \end{array}} f^*\left( \textit{\textbf{p}}, \alpha E_1^- +(1-\alpha )(-\tilde{H}(\textit{\textbf{p}}))\right) \\&= \lim _{\begin{array}{c} \alpha \rightarrow 0\\ 0<\alpha <1 \end{array}} {{\mathrm {co}}~}F\left( \textit{\textbf{p}}, \alpha E_1^- +(1-\alpha )(-\tilde{H}(\textit{\textbf{p}}))\right) . \end{aligned} \end{aligned}$$
(65)

Note that \({{\mathrm {co}}~}F(\textit{\textbf{p}},\cdot )\) is monotone non-decreasing. Indeed, if \(E_2^-\) is a real number such that \(E_2^- > E_1^-\), by the definition of the set C in Eq. (62) there holds \(C(\textit{\textbf{p}}, E_2^-)\subseteq C(\textit{\textbf{p}}, E_1^-)\), which implies \({{\mathrm {co}}~}F(\textit{\textbf{p}},E_2^-)\geqslant {{\mathrm {co}}~}F(\textit{\textbf{p}},E_1^-)\). Recalling that \(E_1^- < -\tilde{H}(\textit{\textbf{p}})\), monotonicity of \({{\mathrm {co}}~}F(\textit{\textbf{p}},\cdot )\) and Eq. (65) imply

$$\begin{aligned} \begin{aligned} f^*\left( \textit{\textbf{p}}, -\tilde{H}(\textit{\textbf{p}})\right)&\geqslant \lim _{\begin{array}{c} \alpha \rightarrow 0\\ 0<\alpha <1 \end{array}} {{\mathrm {co}}~}F\left( \textit{\textbf{p}}, \alpha E_1^- +(1-\alpha )E_1^-\right) = {{\mathrm {co}}~}F(\textit{\textbf{p}}, E_1^-). \end{aligned} \end{aligned}$$
(66)

Combining Eqs. (63) and (66), we get

$$\begin{aligned} {{\mathrm {co}}~}F(\textit{\textbf{p}}, E_1^-)\leqslant J^*(\textit{\textbf{p}}) < +\infty . \end{aligned}$$
(67)

As a result, the set \(C(\textit{\textbf{p}}, E_1^-)\) is non-empty. Since it is also compact, there exists a minimizer in Eq. (62) evaluated at the point \((\textit{\textbf{p}}, E_1^-)\). Let \((c_1,\dots , c_m)\) be such a minimizer. By Eqs. (62) and (67) and the assumption that \(E_1^- \in \left( -H(\textit{\textbf{p}}),-\tilde{H}(\textit{\textbf{p}})\right) \), there holds

$$\begin{aligned} {\left\{ \begin{array}{ll} (c_1, \dots , c_m)\in \varLambda _m,\\ \sum _{i=1}^{m}c_{i}\textit{\textbf{p}}_{i}=\textit{\textbf{p}},\\ \sum _{i=1}^m c_i\gamma _i = {{\mathrm {co}}~}F(\textit{\textbf{p}},E_1^-) \leqslant J^*(\textit{\textbf{p}}),\\ \sum _{i=1}^{m}c_{i}\theta _{i}\leqslant -E_1^- < H(\textit{\textbf{p}}). \end{array}\right. } \end{aligned}$$
(68)

Comparing the first three statements in Eq. (68) and the formula of \(J^*\) in Eq. (12), we deduce that \((c_1,\dots , c_m)\) is a minimizer in Eq. (12), i.e., \((c_1,\dots , c_m)\in \mathcal {A}(\textit{\textbf{p}})\). By definition of H in Eq. (14), we have

$$\begin{aligned} H(\textit{\textbf{p}}) =\inf _{\varvec{\alpha }\in \mathcal {A}(\textit{\textbf{p}})} \sum _{i=1}^m \alpha _i \theta _i \leqslant \sum _{i=1}^m c_i\theta _i, \end{aligned}$$

which contradicts the last inequality in Eq. (68). Therefore, we conclude that \(\tilde{H}(\textit{\textbf{p}})\geqslant H(\textit{\textbf{p}})\) for any \(\textit{\textbf{p}}\in {\mathrm {ri}}~{\mathrm {dom}~}J^*\) and the proof is finished.
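
As a concrete sanity check of Theorem 3.1, one may verify numerically that the network satisfies \(\frac{\partial f}{\partial t}(\textit{\textbf{x}},t)+H(\nabla _{\textit{\textbf{x}}}f(\textit{\textbf{x}},t))=0\) at a point where the maximizer is unique: there \(\nabla _{\textit{\textbf{x}}}f=\textit{\textbf{p}}_{i^*}\), \(\frac{\partial f}{\partial t}=-\theta _{i^*}\), and \(H(\textit{\textbf{p}}_{i^*})=\theta _{i^*}\) by Lemma 3.2(iii). The Python sketch below is illustrative only and uses made-up parameters for which assumptions (A1)–(A3) hold.

```python
import numpy as np

# Hypothetical parameters with gamma_i = ||p_i||^2 / 2.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta = np.array([0.0, 0.5, 0.5, 1.0])
gamma = 0.5 * np.sum(P**2, axis=1)

def f(x, t):
    """Neural network f(x,t) = max_i { <p_i, x> - t*theta_i - gamma_i }."""
    return np.max(P @ x - t * theta - gamma)

# Pick a point where the maximizing index i* is unique; then grad_x f = p_{i*},
# df/dt = -theta_{i*}, and H(p_{i*}) = theta_{i*}, so df/dt + H(grad_x f) = 0.
x, t, h = np.array([1.4, 0.2]), 0.3, 1e-6
i_star = int(np.argmax(P @ x - t * theta - gamma))
dfdt = (f(x, t + h) - f(x, t - h)) / (2 * h)
grad = np.array([(f(x + h * e, t) - f(x - h * e, t)) / (2 * h) for e in np.eye(2)])
print(np.allclose(grad, P[i_star]), np.isclose(dfdt + theta[i_star], 0.0))
```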

C Connections between the neural network (17) and the viscous HJ PDE (18)

Let \(f_\epsilon \) be the neural network defined by Eq. (17) with parameters \(\{(\textit{\textbf{p}}_{i}, \theta _i, \gamma _{i})\}_{i=1}^m\) and \(\epsilon > 0\), which is illustrated in Fig. 3. We will show in this appendix that if the parameters satisfy \(\theta _{i} = -\frac{1}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}\) for every \(i\in \{1,\dots ,m\}\), then the neural network \(f_\epsilon \) corresponds to the unique, jointly convex and smooth solution to the viscous HJ PDE (18). This result will follow immediately from the following lemma.

Lemma C.1

Let \(\{(\textit{\textbf{p}}_i,\gamma _i)\}_{i=1}^{m}\subset \mathbb {R}^n \times \mathbb {R}\) and \(\epsilon > 0\). Then, the function \(w_{\epsilon } :\mathbb {R}^n\times [0,+\infty ) \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} w_{\epsilon }(\textit{\textbf{x}},t){:}{=}\sum _{i=1}^{m}e^{\left( \left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}\right\rangle +\frac{t}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}-\gamma _{i}\right) /\epsilon } \end{aligned}$$
(69)

is the unique, jointly log-convex and smooth solution to the Cauchy problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{\partial w_{\epsilon }}{\partial t}(\textit{\textbf{x}},t)=\dfrac{\epsilon }{2}\varDelta _{\textit{\textbf{x}}} w_{\epsilon }(\textit{\textbf{x}},t) &{}\quad \text {in } \mathbb {R}^n\times (0,+\infty ),\\ w_{\epsilon }(\textit{\textbf{x}},0)=\displaystyle \sum _{i=1}^{m}e^{\left( \left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}\right\rangle -\gamma _{i}\right) /\epsilon } &{}\quad \text {in } \mathbb {R}^n. \end{array}\right. } \end{aligned}$$
(70)

Proof

A short calculation shows that the function \(w_\epsilon \) defined in Eq. (69) solves the Cauchy problem (70), and uniqueness holds by strict positivity of the initial data (see [147, Chap. VIII, Thm. 2.2] and note that the uniqueness result can easily be generalized to \(n > 1\)).

Now, let \(\lambda \in [0,1]\) and \((\textit{\textbf{x}}_{1},t_{1})\) and \((\textit{\textbf{x}}_{2},t_{2})\) be such that \(\textit{\textbf{x}}=\lambda \textit{\textbf{x}}_{1}+(1-\lambda )\textit{\textbf{x}}_{2}\) and \(t=\lambda t_{1}+(1-\lambda )t_{2}\). Then, Hölder’s inequality (see, e.g., [57, Thm. 6.2]) implies

$$\begin{aligned}&\sum _{i=1}^{m}e^{\left( \left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}\right\rangle +\frac{t}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}-\gamma _{i}\right) /\epsilon } = \sum _{i=1}^{m}\left( e^{\lambda \left( \left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}_{1}\right\rangle +\frac{t_{1}}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}-\gamma _{i}\right) /\epsilon }e^{(1-\lambda )\left( \left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}_{2}\right\rangle +\frac{t_{2}}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}-\gamma _{i}\right) /\epsilon }\right) \\&\quad \leqslant \left( \sum _{i=1}^{m}e^{\left( \left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}_{1}\right\rangle +\frac{t_{1}}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}-\gamma _{i}\right) /\epsilon }\right) ^{\lambda }\left( \sum _{i=1}^{m}e^{\left( \left\langle \textit{\textbf{p}}_{i},\textit{\textbf{x}}_{2}\right\rangle +\frac{t_{2}}{2}\left\| \textit{\textbf{p}}_{i}\right\| _{2}^{2}-\gamma _{i}\right) /\epsilon }\right) ^{1-\lambda }, \end{aligned}$$

and we find \(w_{\epsilon }(\textit{\textbf{x}},t)\leqslant \left( w_{\epsilon }(\textit{\textbf{x}}_{1},t_{1})\right) ^{\lambda }\left( w_{\epsilon }(\textit{\textbf{x}}_{2},t_{2})\right) ^{1-\lambda }\), which implies that \(w_\epsilon \) is jointly log-convex in \((\textit{\textbf{x}},t)\). \(\square \)

Thanks to Lemma C.1 and the Cole–Hopf transformation \(f_\epsilon (\textit{\textbf{x}},t) = \epsilon \log \left( w_\epsilon (\textit{\textbf{x}},t)\right) \) (see, e.g., [47, Sect. 4.4.1]), a short calculation immediately implies that the neural network \(f_\epsilon \) solves the viscous HJ PDE (18), and it is also its unique solution because \(w_{\epsilon }\) is the unique solution to the Cauchy problem (70). Joint convexity in \((\textit{\textbf{x}},t)\) follows from log-convexity of \((\textit{\textbf{x}},t)\mapsto w_{\epsilon }(\textit{\textbf{x}},t)\) for every \(\epsilon >0\).
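
The Cole–Hopf computation above can also be checked numerically: with \(w_\epsilon \) as in Eq. (69) and \(f_\epsilon = \epsilon \log w_\epsilon \), the identity \(\frac{\partial f_\epsilon }{\partial t} = \frac{1}{2}\Vert \nabla _{\textit{\textbf{x}}}f_\epsilon \Vert _2^2 + \frac{\epsilon }{2}\varDelta _{\textit{\textbf{x}}}f_\epsilon \) follows from \(\frac{\partial w_\epsilon }{\partial t} = \frac{\epsilon }{2}\varDelta _{\textit{\textbf{x}}}w_\epsilon \). The Python sketch below (made-up parameters, central finite differences) is a numerical illustration of this identity, not a substitute for the argument above.

```python
import numpy as np

# Hypothetical parameters; in Appendix C the choice theta_i = -||p_i||^2/2 is built
# into the exponent of w_eps, cf. Eq. (69).
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gamma = np.array([0.0, 0.3, 0.3, 1.0])
eps = 0.1

def w(x, t):
    """w_eps(x,t) of Eq. (69)."""
    return np.sum(np.exp((P @ x + 0.5 * t * np.sum(P**2, axis=1) - gamma) / eps))

def f(x, t):
    """Cole-Hopf transform f_eps = eps * log(w_eps)."""
    return eps * np.log(w(x, t))

# Check df/dt = (1/2)*||grad_x f||^2 + (eps/2)*Laplacian_x f at a sample point.
x, t, h = np.array([0.2, -0.4]), 0.5, 1e-4
dfdt = (f(x, t + h) - f(x, t - h)) / (2 * h)
grad = np.array([(f(x + h * e, t) - f(x - h * e, t)) / (2 * h) for e in np.eye(2)])
lap = sum((f(x + h * e, t) - 2 * f(x, t) + f(x - h * e, t)) / h**2 for e in np.eye(2))
print(np.isclose(dfdt, 0.5 * np.dot(grad, grad) + 0.5 * eps * lap, atol=1e-4))
```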

D Proof of Proposition 3.1

To prove this proposition, we will use three lemmas whose statements and proofs are given in Sects. D.1, D.2, and D.3, respectively. The proof of Prop. 3.1 is given in Sect. D.4.

1.1 D.1 Statement and proof of Lemma D.1

Lemma D.1

Consider the one-dimensional case, i.e., \(n=1\). Let \(p_1,\dots ,p_m\in \mathbb {R}\) satisfy \(p_1<\dots <p_m\) and define the function J using Eq. (10). Suppose assumptions (A1)-(A2) hold. Let \(x\in \mathbb {R}\), \(p\in \partial J(x)\), and suppose \(p\ne p_i\) for any \(i\in \{1,\dots ,m\}\). Then, there exists \(k\in \{1,\dots ,m\}\) such that \(p_k< p < p_{k+1}\) and

$$\begin{aligned} k,k+1 \in {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\in \{1,\dots ,m\}}}\{xp_{i} -\gamma _{i}\}. \end{aligned}$$
(71)

Proof

Let \(I_x\) denote the set of maximizers in Eq. (11) at x. Since \(p\in \partial J(x)\), \(p \ne p_i\) for \(i \in \{1,\dots ,m\}\), and \(\partial J(x) = {{\mathrm {co}}~}\{p_i: i\in I_x\}\) by [67, Thm. VI.4.4.2], there exist \(j,l\in I_x\) such that \(p_j< p < p_l\). Moreover, there exists k with \(j\leqslant k<k+1 \leqslant l\) such that \(p_j \leqslant p_k< p <p_{k+1}\leqslant p_l\). We will show that \(k,k+1 \in I_x\). We only prove \(k\in I_x\); the case for \(k+1\) is similar.

If \(p_j=p_k\), then \(k=j\in I_x\) and the conclusion follows directly. Hence suppose \(p_j< p_k < p_l\). Then, there exists \(\alpha \in (0,1)\) such that \(p_k = \alpha p_j + (1-\alpha ) p_l\). Using that \(j,l\in I_x\), assumption (A2), and Jensen’s inequality, we get

$$\begin{aligned} \begin{aligned} xp_k - \gamma _k&= xp_k - g(p_k) = (\alpha p_j + (1-\alpha )p_l)x - g(\alpha p_j + (1-\alpha )p_l)\\&\geqslant \alpha xp_j + (1-\alpha )xp_l - \alpha g(p_j) - (1-\alpha )g(p_l)\\&= \alpha (xp_j - \gamma _j) + (1-\alpha )(xp_l - \gamma _l)\\&= \max _{i\in \{1,\dots ,m\}}\{xp_{i} -\gamma _{i}\}, \end{aligned} \end{aligned}$$

which implies that \(k \in I_x\). A similar argument shows that \(k+1\in I_x\), which completes the proof. \(\square \)
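
A one-dimensional numerical illustration of the mechanism used in this proof: at a kink of \(J(x)=\max _{i}\{xp_i-\gamma _i\}\), two consecutive pieces are simultaneously active, so that \(\partial J(x)=[p_k,p_{k+1}]\). The sketch below uses made-up parameters with \(\gamma _i=p_i^2/2\), for which assumptions (A1)–(A2) hold.

```python
import numpy as np

# Hypothetical one-dimensional parameters with gamma_i = p_i^2 / 2.
p = np.array([0.0, 1.0, 2.0, 3.0])
gamma = 0.5 * p**2

def maximizers(x, tol=1e-12):
    """Indices attaining max_i { x*p_i - gamma_i }, i.e., the set I_x."""
    vals = x * p - gamma
    return np.nonzero(vals >= vals.max() - tol)[0]

# Pieces k and k+1 tie exactly at x = (gamma_{k+1} - gamma_k) / (p_{k+1} - p_k).
k = 1
x_kink = (gamma[k + 1] - gamma[k]) / (p[k + 1] - p[k])
print(maximizers(x_kink))   # contains the consecutive indices k and k+1,
                            # so dJ(x_kink) = [p_k, p_{k+1}], as used in Lemma D.1
```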

1.2 D.2 Statement and proof of Lemma D.2

Lemma D.2

Consider the one-dimensional case, i.e., \(n=1\). Let \(p_1,\dots ,p_m\in \mathbb {R}\) satisfy \(p_1<\cdots <p_m\) and define the function H using Eq. (14). Suppose assumptions (A1)–(A3) hold. Let \(u_0\in \mathbb {R}\) and \(p_k< u_0< p_{k+1}\) for some index k. Then, there holds

$$\begin{aligned} H(u_0) = \beta _k\theta _k + \beta _{k+1} \theta _{k+1}, \end{aligned}$$
(72)

where

$$\begin{aligned} \beta _k{:}{=}\frac{p_{k+1} - u_0}{p_{k+1}-p_k} \quad \text { and } \quad \beta _{k+1} {:}{=}\frac{u_0- p_k}{p_{k+1}-p_k}. \end{aligned}$$
(73)

Proof

Let \(\varvec{\beta }{:}{=}(\beta _1,\dots ,\beta _m) \in \varLambda _{m}\) satisfy

$$\begin{aligned} \beta _k{:}{=}\frac{p_{k+1} - u_0}{p_{k+1}-p_k} \quad \text { and } \quad \beta _{k+1} {:}{=}\frac{u_0- p_k}{p_{k+1}-p_k}, \end{aligned}$$

and \(\beta _i = 0\) for every \(i\in \{1,\dots ,m\}\setminus \{k,k+1\}\). We will prove that \(\varvec{\beta }\) is a minimizer in Eq. (14) evaluated at \(u_0\), that is,

$$\begin{aligned} \varvec{\beta } \in {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{\varvec{\alpha }\in \mathcal {A}(u_0)}} \left\{ \sum _{i=1}^m \alpha _i \theta _i\right\} , \end{aligned}$$

where

$$\begin{aligned} \mathcal {A}(u_0) {:}{=}{\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{\begin{array}{c} (\alpha _{1},\dots \alpha _{m})\in \varLambda _m\\ \sum _{i=1}^{m}\alpha _{i}p_{i}=u_0 \end{array}} }\left\{ \sum _{i=1}^{m}\alpha _{i}\gamma _{i}\right\} . \end{aligned}$$

First, we show that \(\varvec{\beta } \in \mathcal {A}(u_0)\). By definition of \(\varvec{\beta }\) and Lemma 3.1(ii) with \(p = u_0\), the statement holds provided \(k,k+1\in I_x\), where the set \(I_x\) contains the maximizers in Eq. (10) evaluated at \(x \in \partial J^*(u_0)\). But if \(x\in \partial J^*(u_0)\), we have \(u_0\in \partial J(x)\), and Lemma D.1 implies \(k,k+1\in I_x\). Hence, \(\varvec{\beta } \in \mathcal {A}(u_0)\).

Now, suppose that \(\varvec{\beta }\) is not a minimizer in Eq. (14) evaluated at \(u_0\). By Lemma 3.2(i), there exists a minimizer in Eq. (14) evaluated at the point \(u_0\), which we denote by \((\alpha _1,\dots , \alpha _m)\). Then there holds

$$\begin{aligned} {\left\{ \begin{array}{ll} \sum _{i=1}^m \alpha _i = \sum _{i=1}^m \beta _i = 1,\\ \sum _{i=1}^m \alpha _i p_i = \sum _{i=1}^m \beta _i p_i = u_0,\\ \sum _{i=1}^m \alpha _i \gamma _i = \sum _{i=1}^m \beta _i \gamma _i = J^*(u_0),\\ \sum _{i=1}^m \alpha _i \theta _i < \sum _{i=1}^m \beta _i \theta _i. \end{array}\right. } \end{aligned}$$
(74)

Since \(\alpha _i \geqslant 0\) for every i and \(\beta _i = 0\) for every \(i\in \{1,\dots ,m\}\setminus \{k,k+1\}\), we have \(\alpha _k + \alpha _{k+1} \leqslant 1 = \beta _k + \beta _{k+1}\). Since \(\varvec{\alpha }\ne \varvec{\beta }\), at least one of the inequalities \(\alpha _k < \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\) holds (otherwise \(\alpha _k\geqslant \beta _k\) and \(\alpha _{k+1}\geqslant \beta _{k+1}\), which together with \(\alpha _k+\alpha _{k+1}\leqslant 1=\beta _k+\beta _{k+1}\) and \(\sum _{i=1}^m\alpha _i=1\) would force \(\varvec{\alpha }=\varvec{\beta }\)). This leaves three possible cases, and we now show that each case leads to a contradiction.

Case 1: Let \(\alpha _k < \beta _k\) and \(\alpha _{k+1} \geqslant \beta _{k+1}\). Define the coefficient \(c_i\) by

$$\begin{aligned} c_i {:}{=}{\left\{ \begin{array}{ll} \dfrac{\alpha _i}{\beta _k - \alpha _k} &{}\quad \text {if } i\notin \{k,k+1\},\\ 0 &{}\quad \text {if } i=k,\\ \dfrac{\alpha _{k+1} - \beta _{k+1}}{\beta _k - \alpha _k} &{}\quad \text {if } i=k+1. \end{array}\right. } \end{aligned}$$

The following equations then hold

$$\begin{aligned} {\left\{ \begin{array}{ll} (c_1,\dots , c_m)\in \varLambda _m \text { with } c_k = 0,\\ \sum _{i\ne k}c_i p_i = p_k,\\ \sum _{i\ne k}c_i \gamma _i = \gamma _k,\\ \sum _{i\ne k}c_i \theta _i < \theta _k. \end{array}\right. } \end{aligned}$$

These equations, however, violate assumption (A3), and so we get a contradiction.

Case 2: Let \(\alpha _k \geqslant \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\). A similar argument as in case 1 can be applied here by exchanging the indices k and \(k+1\) to derive a contradiction.

Case 3: Let \(\alpha _k < \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\). From Eq. (74), we obtain

$$\begin{aligned} {\left\{ \begin{array}{ll} \beta _k - \alpha _k + \beta _{k+1} - \alpha _{k+1} = \sum _{i\ne k,k+1} \alpha _i,\\ (\beta _k - \alpha _k) p_k + (\beta _{k+1} - \alpha _{k+1}) p_{k+1} = \sum _{i\ne k,k+1} \alpha _i p_i,\\ (\beta _k - \alpha _k) \gamma _k + (\beta _{k+1} - \alpha _{k+1}) \gamma _{k+1} = \sum _{i\ne k,k+1} \alpha _i \gamma _i,\\ (\beta _k - \alpha _k)\theta _k + (\beta _{k+1} - \alpha _{k+1})\theta _{k+1} > \sum _{i\ne k,k+1} \alpha _i \theta _i. \end{array}\right. } \end{aligned}$$
(75)

Define two numbers \(q_k\) and \(q_{k+1}\) by

$$\begin{aligned} \begin{aligned} q_k{:}{=}\frac{\sum _{i<k} \alpha _ip_i}{\sum _{i<k}\alpha _i} \quad \text { and } \quad q_{k+1}{:}{=}\frac{\sum _{i>k+1} \alpha _ip_i}{\sum _{i>k+1}\alpha _i}. \end{aligned} \end{aligned}$$
(76)

Note that from the first two equations in (74) and the assumption that \(\alpha _k < \beta _k\) and \(\alpha _{k+1} < \beta _{k+1}\), there exist \(i_1<k\) and \(i_2>k+1\) such that \(\alpha _{i_1}\ne 0\) and \(\alpha _{i_2}\ne 0\), and hence, the numbers \(q_k\) and \(q_{k+1}\) are well-defined. By definition, we have \(q_k<p_k<p_{k+1}<q_{k+1}\). Therefore, there exist \(b_k,b_{k+1}\in (0,1)\) such that

$$\begin{aligned} p_k = b_kq_k+ (1-b_k)q_{k+1}\quad \text { and } \quad p_{k+1} = b_{k+1}q_k+ (1-b_{k+1})q_{k+1}. \end{aligned}$$
(77)

A straightforward computation yields

$$\begin{aligned} b_k= \frac{q_{k+1}- p_k}{q_{k+1}- q_k} \quad \text { and }\quad b_{k+1}= \frac{q_{k+1}- p_{k+1}}{q_{k+1}- q_k}. \end{aligned}$$
(78)

Define the coefficients \(c_i^k\) and \(c_i^{k+1}\) as follows

$$\begin{aligned} c_i^k {:}{=}{\left\{ \begin{array}{ll} \dfrac{b_k\alpha _i}{\sum _{\omega<k}\alpha _\omega } &{}\quad \text {if } i<k,\\ 0 &{}\quad \text {if } i\in \{k,k+1\},\\ \dfrac{(1-b_k)\alpha _i}{\sum _{\omega>k+1}\alpha _\omega } &{}\quad \text {if } i>k+1, \end{array}\right. } \quad \text { and }\quad c_i^{k+1} {:}{=}{\left\{ \begin{array}{ll} \dfrac{b_{k+1}\alpha _i}{\sum _{\omega<k}\alpha _\omega } &{}\quad \text {if } i<k,\\ 0 &{}\quad \text {if } i\in \{k,k+1\},\\ \dfrac{(1-b_{k+1})\alpha _i}{\sum _{\omega>k+1}\alpha _\omega } &{}\quad \text {if } i>k+1. \end{array}\right. } \end{aligned}$$
(79)

These coefficients satisfy \(c_i^k, c_i^{k+1}\in [0,1]\) for any i and \(\sum _{i=1}^m c_i^k= \sum _{i=1}^m c_i^{k+1}=1\). In other words, we have

$$\begin{aligned} (c_1^k,\dots , c_m^k)\in \varLambda _m \text { with }c_k^k= 0\quad \text { and }\quad (c_1^{k+1},\dots , c_m^{k+1})\in \varLambda _m \text { with }c_{k+1}^{k+1}= 0. \end{aligned}$$
(80)

Hence, the first equality in Eq. (9) holds for the coefficients \((c_1^k,\dots , c_m^k)\) with the index k and also for the coefficients \((c_1^{k+1}, \dots , c_m^{k+1})\) with the index \(k+1\). We show next that these coefficients satisfy the second and third equalities in (9) and draw a contradiction with assumption (A3).

Using Eqs. (76), (77), and (79) to write the formulas for \(p_k\) and \(p_{k+1}\) via the coefficients \(c_i^k\) and \(c_i^{k+1}\), we find

$$\begin{aligned} \begin{aligned}&p_k = b_k\frac{\sum _{i<k} \alpha _ip_i}{\sum _{i<k}\alpha _i} + (1-b_k)\frac{\sum _{i>k+1} \alpha _ip_i}{\sum _{i>k+1}\alpha _i} = \sum _{i\ne k,k+1}c_i^kp_i= \sum _{i\ne k}c_i^kp_i,\\&p_{k+1} = b_{k+1}\frac{\sum _{i<k} \alpha _ip_i}{\sum _{i<k}\alpha _i} + (1-b_{k+1})\frac{\sum _{i>k+1} \alpha _ip_i}{\sum _{i>k+1}\alpha _i}=\sum _{i\ne k,k+1} c_i^{k+1}p_i = \sum _{i\ne k+1} c_i^{k+1}p_i, \end{aligned} \end{aligned}$$
(81)

where the last equalities in the two formulas above hold because \(c_{k+1}^k=0\) and \(c_k^{k+1} = 0\) by definition. Hence, the second equality in Eq. (9) also holds for both indices k and \(k+1\).

From the third equality in Eq. (75), assumption (A2), Eq. (81), and Jensen’s inequality, we have

$$\begin{aligned} \begin{aligned}&\sum _{i\ne k,k+1} \alpha _i\gamma _i = (\beta _k - \alpha _k)\gamma _k + (\beta _{k+1} - \alpha _{k+1})\gamma _{k+1}\\&\quad = (\beta _k - \alpha _k)g(p_k) + (\beta _{k+1} - \alpha _{k+1})g(p_{k+1})\\&\quad = (\beta _k - \alpha _k)g\left( \sum _{i\ne k,k+1}c_i^kp_i\right) + (\beta _{k+1} - \alpha _{k+1})g\left( \sum _{i\ne k,k+1}c_i^{k+1}p_i\right) \\&\quad \leqslant (\beta _k - \alpha _k)\left( \sum _{i\ne k,k+1}c_i^kg(p_i)\right) + (\beta _{k+1} - \alpha _{k+1})\left( \sum _{i\ne k,k+1}c_i^{k+1}g(p_i)\right) \\&\quad = \sum _{i\ne k,k+1} ((\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1})g(p_i)\\&\quad = \sum _{i\ne k,k+1} ((\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1})\gamma _i. \end{aligned} \end{aligned}$$
(82)

We now compute and simplify the coefficients \((\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1}\) in the formula above. First, consider the case when \(i<k\). Eqs. (78) and (79) imply

$$\begin{aligned}&(\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1}\\&\quad = (\beta _k - \alpha _k) \frac{b_k\alpha _i}{\sum _{\omega<k}\alpha _\omega } + (\beta _{k+1} - \alpha _{k+1})\frac{b_{k+1}\alpha _i}{\sum _{\omega<k}\alpha _\omega }\\&\quad =\frac{\alpha _i}{\sum _{\omega<k}\alpha _\omega } ((\beta _k-\alpha _k)b_k+ (\beta _{k+1} - \alpha _{k+1})b_{k+1})\\&\quad = \frac{\alpha _i}{\sum _{\omega<k}\alpha _\omega }\left( (\beta _k-\alpha _k)\frac{q_{k+1}- p_k}{q_{k+1}- q_k} + (\beta _{k+1} - \alpha _{k+1})\frac{q_{k+1}- p_{k+1}}{q_{k+1}- q_k}\right) \\&\quad = \frac{\alpha _i}{\sum _{\omega <k}\alpha _\omega }\cdot \frac{1}{q_{k+1}- q_k} ((\beta _k - \alpha _k + \beta _{k+1} - \alpha _{k+1})q_{k+1}\\&\quad - (\beta _k - \alpha _k) p_k - (\beta _{k+1} - \alpha _{k+1}) p_{k+1}). \end{aligned}$$

Applying the first two equalities in Eq. (75) and Eq. (76) to the last formula above, we obtain

$$\begin{aligned} \begin{aligned}&(\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1}\\&\quad = \frac{\alpha _i}{\sum _{\omega<k}\alpha _\omega }\cdot \frac{1}{q_{k+1}- q_k} \left( \left( \sum _{i\ne k,k+1}\alpha _i\right) q_{k+1}- \sum _{i\ne k,k+1}\alpha _ip_i\right) \\&\quad =\frac{\alpha _i}{\sum _{\omega<k}\alpha _\omega }\cdot \frac{1}{q_{k+1}- q_k} \left( \sum _{i\ne k,k+1}\alpha _iq_{k+1}- \sum _{i< k}\alpha _ip_i - \sum _{i> k+1}\alpha _ip_i\right) \\&\quad =\frac{\alpha _i}{\sum _{\omega<k}\alpha _\omega }\cdot \frac{1}{q_{k+1}- q_k} \left( \sum _{i\ne k,k+1}\alpha _iq_{k+1}- \left( \sum _{i< k}\alpha _i\right) q_k- \left( \sum _{i> k+1}\alpha _i\right) q_{k+1}\right) \\&\quad = \frac{\alpha _i}{\sum _{\omega<k}\alpha _\omega }\cdot \frac{1}{q_{k+1}- q_k} \left( \sum _{i<k}\alpha _i(q_{k+1}- q_k)\right) \\&\quad = \alpha _i. \end{aligned} \end{aligned}$$

The same result for the case when \(i>k+1\) also holds and the proof is similar. Therefore, we have

$$\begin{aligned} (\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1}= \alpha _i \quad \text { for each } i\ne k, k+1. \end{aligned}$$
(83)

Combining Eqs. (82) and (83), we have

$$\begin{aligned} \begin{aligned}&\sum _{i\ne k,k+1} \alpha _i\gamma _i \leqslant \sum _{i\ne k,k+1} ((\beta _k - \alpha _k)c_i^k+ (\beta _{k+1} - \alpha _{k+1})c_i^{k+1})\gamma _i = \sum _{i\ne k,k+1} \alpha _i\gamma _i. \end{aligned} \end{aligned}$$

Since the left-hand and right-hand sides coincide, the inequality above is an equality, which implies that the inequality in Eq. (82) is also an equality. In other words, we have

$$\begin{aligned} \begin{aligned} \gamma _k&= g\left( p_k\right) =\sum _{i\ne k,k+1}c_i^kg(p_i) = \sum _{i\ne k,k+1} c_i^k\gamma _i = \sum _{i\ne k} c_i^k\gamma _i,\\ \gamma _{k+1}&= g\left( p_{k+1}\right) = \sum _{i\ne k,k+1}c_i^{k+1}g(p_i) = \sum _{i\ne k,k+1} c_i^{k+1}\gamma _i = \sum _{i\ne k+1} c_i^{k+1}\gamma _i, \end{aligned} \end{aligned}$$
(84)

where the last equalities in the two formulas above hold because \(c_{k+1}^k=0\) and \(c_k^{k+1} = 0\) by definition. Hence, the third equality in (9) also holds for both indices k and \(k+1\).

In summary, Eqs. (80), (81), and (84) imply that Eq. (9) holds for the index k with coefficients \((c_1^k,\dots , c_m^k)\) and also for the index \(k+1\) with coefficients \((c_1^{k+1}, \dots , c_m^{k+1})\). Hence, by assumption (A3), we find

$$\begin{aligned} \sum _{i\ne k} c_i^k\theta _i> \theta _k \quad \text { and } \quad \sum _{i\ne k+1} c_i^{k+1}\theta _i > \theta _{k+1}. \end{aligned}$$

Using the inequalities above with Eq. (83) and the fact that \(c_{k+1}^k=0\) and \(c_k^{k+1} = 0\), we find

$$\begin{aligned} \begin{aligned}&(\beta _k - \alpha _k) \theta _k + (\beta _{k+1} - \alpha _{k+1})\theta _{k+1} < (\beta _k - \alpha _k) \sum _{i\ne k} c_i^k\theta _i + (\beta _{k+1} - \alpha _{k+1} )\sum _{i\ne k+1} c_i^{k+1}\theta _i\\&\quad = \sum _{i\ne k,k+1} ((\beta _k - \alpha _k) c_i^k+ (\beta _{k+1} - \alpha _{k+1} )c_i^{k+1})\theta _i = \sum _{i\ne k, k+1} \alpha _i\theta _i, \end{aligned} \end{aligned}$$

which contradicts the last inequality in Eq. (75).

In conclusion, we obtain contradictions in all three cases. As a consequence, \(\varvec{\beta }\) is a minimizer in Eq. (14) evaluated at \(u_0\), and Eq. (72) follows from the definition of H in (14). \(\square \)
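
In words, Lemma D.2 states that in one dimension H is the piecewise linear interpolation of the values \(\theta _i\) at the nodes \(p_i\). The Python sketch below cross-checks the interpolation formula (72)–(73) against the definition of H in Eq. (14), solved as a lexicographic pair of linear programs; the parameters are made up (with \(\gamma _i=p_i^2/2\), so that assumptions (A1)–(A3) hold), and scipy.optimize.linprog is just one possible solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical one-dimensional parameters with gamma_i = p_i^2 / 2.
p = np.array([0.0, 1.0, 2.0, 3.0])
gamma = 0.5 * p**2
theta = np.array([1.0, 0.2, 0.7, 0.1])
m = len(p)

def H(u0):
    """H(u0) from Eq. (14): minimize sum_i alpha_i*theta_i over the minimizers of
    sum_i alpha_i*gamma_i subject to alpha in the simplex and sum_i alpha_i*p_i = u0."""
    A_eq = np.vstack([np.ones(m), p])
    b_eq = np.array([1.0, u0])
    bounds = [(0, None)] * m
    lp1 = linprog(gamma, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    lp2 = linprog(theta, A_ub=gamma.reshape(1, -1), b_ub=[lp1.fun + 1e-9],
                  A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return lp2.fun

# Lemma D.2: for p_k < u0 < p_{k+1}, H(u0) interpolates theta_k and theta_{k+1} linearly.
u0, k = 1.25, 1
beta_k = (p[k + 1] - u0) / (p[k + 1] - p[k])
print(np.isclose(H(u0), beta_k * theta[k] + (1 - beta_k) * theta[k + 1], atol=1e-6))
```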

1.3 D.3 Statement and proof of Lemma D.3

Lemma D.3

Consider the one-dimensional case, i.e., \(n=1\). Let \(p_1,\dots ,p_m\in \mathbb {R}\) satisfy \(p_1<\dots <p_m\). Suppose assumptions (A1)-(A2) hold. Let \(x\in \mathbb {R}\) and \(t>0\). Assume j, k, l are three indices such that \(1\leqslant j\leqslant k<l\leqslant m\) and

$$\begin{aligned} j, l \in {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\in \{1,\dots ,m\}}}\{xp_{i} -t\theta _{i}-\gamma _{i}\}. \end{aligned}$$
(85)

Then, there holds

$$\begin{aligned} \frac{\theta _l - \theta _k}{p_l - p_k} \leqslant \frac{\theta _l - \theta _j}{p_l - p_j}. \end{aligned}$$
(86)

Proof

Note that Eq. (86) holds trivially when \(j=k\), so we only need to consider the case when \(j<k<l\). On the one hand, Eq. (85) implies

$$\begin{aligned} xp_j - t\theta _j - \gamma _j = xp_l - t\theta _l - \gamma _l \geqslant xp_k - t\theta _k - \gamma _k, \end{aligned}$$

which yields

$$\begin{aligned} \begin{aligned}&\gamma _l - \gamma _k \leqslant x(p_l - p_k) - t(\theta _l - \theta _k),\\&\gamma _l - \gamma _j = x(p_l - p_j) - t(\theta _l - \theta _j).\\ \end{aligned} \end{aligned}$$
(87)

On the other hand, for each \(i\in \{j,j+1,\dots , l-1\}\) let \(q_i\in (p_i, p_{i+1})\) and \(x_i \in \partial J^*(q_i)\). Such \(x_i\) exists because \(q_i \in {\mathrm {int}~}{{\mathrm {dom}~}J^*}\), so that the subdifferential \(\partial J^*(q_i)\) is non-empty. Then, \(q_i\in \partial J(x_i)\) and Lemma D.1 imply

$$\begin{aligned} x_ip_i - \gamma _i = x_ip_{i+1} - \gamma _{i+1} = \max _{\omega \in \{1,\dots ,m\}} \{x_i p_\omega - \gamma _\omega \}. \end{aligned}$$

A straightforward computation yields

$$\begin{aligned} \begin{aligned}&\gamma _l - \gamma _k = \sum _{i=k}^{l-1} (\gamma _{i+1} - \gamma _i) = \sum _{i=k}^{l-1} x_i(p_{i+1} - p_i),\\&\gamma _l - \gamma _j = \sum _{i=j}^{l-1} (\gamma _{i+1} - \gamma _i) = \sum _{i=j}^{l-1} x_i(p_{i+1} - p_i). \end{aligned} \end{aligned}$$

Combining the two equalities above with Eq. (87), we conclude that

$$\begin{aligned} \begin{aligned}&x(p_l - p_k) - t(\theta _l - \theta _k)\geqslant \sum _{i=k}^{l-1} x_i(p_{i+1} - p_i),\\&x(p_l - p_j) - t(\theta _l - \theta _j)= \sum _{i=j}^{l-1} x_i(p_{i+1} - p_i). \end{aligned} \end{aligned}$$

Now, divide the inequality above by \(t(p_l - p_k) > 0\) (because by assumption \(t>0\) and \(l>k\), which implies that \(p_l>p_k\)), divide the equality above by \(t(p_l - p_j) > 0\) (because \(l>j\), which implies that \(t(p_l - p_j) \ne 0\)), and rearrange the terms to obtain

$$\begin{aligned} \begin{aligned}&\frac{\theta _l - \theta _k}{p_l - p_k} \leqslant \frac{x}{t} - \frac{1}{t} \frac{\sum _{i=k}^{l-1} x_i(p_{i+1} - p_i)}{p_l - p_k},\\&\frac{\theta _l - \theta _j}{p_l - p_j} = \frac{x}{t} - \frac{1}{t} \frac{\sum _{i=j}^{l-1} x_i(p_{i+1} - p_i)}{p_l - p_j}. \end{aligned} \end{aligned}$$
(88)

Recall that \(q_j< q_{j+1}< \dots < q_{l-1}\) and \(x_i \in \partial J^*(q_i)\) for any \(j\leqslant i<l\). Since the function \(J^*\) is convex, the subdifferential operator \(\partial J^*\) is a monotone non-decreasing operator [67, Def. IV.4.1.3, and Prop. VI.6.1.1], which yields \(x_j\leqslant x_{j+1}\leqslant \dots \leqslant x_{l-1}\). Using that \(p_1< p_2< \dots < p_m\) and \(j<k<l\), we obtain

$$\begin{aligned}&\frac{\sum _{i=k}^{l-1} x_i(p_{i+1} - p_i)}{p_l - p_k} \geqslant \frac{\sum _{i=k}^{l-1} x_k(p_{i+1} - p_i)}{p_l - p_k} = x_k\nonumber \\&= \frac{\sum _{i=j}^{k-1} x_k(p_{i+1} - p_i)}{p_k - p_j} \geqslant \frac{\sum _{i=j}^{k-1} x_i(p_{i+1} - p_i)}{p_k - p_j}. \end{aligned}$$
(89)

To proceed, we now use the fact that if four real numbers \(a,c\in \mathbb {R}\) and \(b,d>0\) satisfy \(\frac{a}{b}\geqslant \frac{c}{d}\), then \(\frac{a}{b}\geqslant \frac{a+c}{b+d}\). Combining this fact with inequality (89), we find

$$\begin{aligned}&\frac{\sum _{i=k}^{l-1} x_i(p_{i+1} - p_i)}{p_l - p_k} \geqslant \frac{\sum _{i=k}^{l-1} x_i(p_{i+1} - p_i) + \sum _{i=j}^{k-1} x_i(p_{i+1} - p_i)}{p_l - p_k + p_k - p_j}\\&= \frac{\sum _{i=j}^{l-1} x_i(p_{i+1} - p_i)}{p_l - p_j}. \end{aligned}$$

We combine the inequality above with (88) to obtain

$$\begin{aligned} \frac{\theta _l - \theta _k}{p_l - p_k} \leqslant \frac{\theta _l - \theta _j}{p_l - p_j}, \end{aligned}$$

which concludes the proof. \(\square \)

1.4 D.4 Proof of Proposition 3.1

Proof of (i): First, note that u is piecewise constant. Second, recall that J is defined as the pointwise maximum of a finite number of affine functions. Therefore, the initial data \(u(\cdot ,0) = \nabla J(\cdot )\) (recall that here, the gradient \(\nabla \) is taken in the sense of distribution) are bounded and of locally bounded variation (see [48, Chap. 5, page 167] for the definition of locally bounded variation). Finally, the flux function H, defined in Eq. (14), is Lipschitz continuous in \({\mathrm {dom}~}J^*\) by Lemma D.2. It can therefore be extended to \(\mathbb {R}\) while preserving its Lipschitz property [57, Thm. 4.16]. Therefore, we can invoke [36, Prop. 2.1] to conclude that u is the entropy solution to the conservation law (21) provided it satisfies the two following conditions. Let \(\bar{x}(t)\) be any smooth line of discontinuity of u. Fix \(t>0\) and define \(u^-\) and \(u^+\) as

$$\begin{aligned} u^-{:}{=}\lim _{x\rightarrow \bar{x}(t)^-} u(x,t)\quad \text { and } \quad u^+{:}{=}\lim _{x\rightarrow \bar{x}(t)^+} u(x,t). \end{aligned}$$
(90)

Then, the two conditions are:

  1. The curve \(\bar{x}(t)\) is a straight line with the slope

    $$\begin{aligned} \frac{d\bar{x}}{dt} = \frac{H(u^+)-H(u^-)}{u^+- u^-}. \end{aligned}$$
    (91)

  2. For any \(u_0\) between \(u^+\) and \(u^-\), we have

    $$\begin{aligned} \frac{H(u^+) - H(u_0)}{u^+- u_0} \leqslant \frac{H(u^+) - H(u^-)}{u^+- u^-}. \end{aligned}$$
    (92)

First, we prove the first condition and Eq. (91). According to the definition of u in Eq. (20), the range of u is the compact set \(\{p_1,\dots , p_m\}\). As a result, \(u^-\) and \(u^+\) are in the range of u, i.e., there exist indices j and l such that

$$\begin{aligned} u^-= p_j\quad \text { and }\quad u^+= p_l. \end{aligned}$$
(93)

Let \((\bar{x}(s),s)\) be a point on the curve \(\bar{x}\) which is not one of the endpoints. Since u is piecewise constant, there exists a neighborhood \(\mathcal {N}\) of \((\bar{x}(s),s)\) such that for any \((x^-,t), (x^+,t)\in \mathcal {N}\) satisfying \(x^-< \bar{x}(t) < x^+\), we have \(u(x^-,t) = u^-= p_j\) and \(u(x^+,t) = u^+= p_l\). In other words, if \(x^-, x^+, t\) are chosen as above, according to the definition of u in Eq. (20), we have

$$\begin{aligned} j\in {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\in \{1,\dots ,m\}}}\{x^-p_{i} -t\theta _{i}-\gamma _{i}\} \quad \text { and }\quad l\in {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\in \{1,\dots ,m\}}}\{x^+p_{i} -t\theta _{i}-\gamma _{i}\}. \end{aligned}$$
(94)

Define a sequence \(\{x^-_k\}_{k=1}^{+\infty } \subset (-\infty ,\bar{x}(s))\) such that \((x^-_k, s)\in \mathcal {N}\) for any \(k\in \mathbb {N}\) and \(\lim _{k\rightarrow +\infty }x^-_k = \bar{x}(s)\). By Eq. (94), we have

$$\begin{aligned} x^-_k p_{j} -s\theta _{j}-\gamma _{j} \geqslant x^-_k p_{i} -s\theta _{i}-\gamma _{i} \text { for any }i\in \{1,\dots ,m\}. \end{aligned}$$

When k approaches infinity, the above inequality implies

$$\begin{aligned} \bar{x}(s) p_{j} -s\theta _{j}-\gamma _{j} \geqslant \bar{x}(s) p_{i} -s\theta _{i}-\gamma _{i}\quad \text { for any }i\in \{1,\dots ,m\}. \end{aligned}$$

In other words, we have

$$\begin{aligned} j\in {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\in \{1,\dots ,m\}}}\{\bar{x}(s)p_{i} -s\theta _{i}-\gamma _{i}\}. \end{aligned}$$
(95)

Similarly, define a sequence \(\{x^+_k\}_{k=1}^{+\infty } \subset (\bar{x}(s), +\infty )\) such that \((x^+_k, s)\in \mathcal {N}\) for any \(k\in \mathbb {N}\) and \(\lim _{k\rightarrow +\infty }x^+_k = \bar{x}(s)\). Using a similar argument as above, we can conclude that

$$\begin{aligned} l\in {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\in \{1,\dots ,m\}}}\{\bar{x}(s)p_{i} -s\theta _{i}-\gamma _{i}\}. \end{aligned}$$
(96)

By a continuity argument, Eqs. (95) and (96) also hold for the end points of \(\bar{x}\). In conclusion, for any \((\bar{x}(t), t)\) on the curve \(\bar{x}\), we have

$$\begin{aligned} j, l\in {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{i\in \{1,\dots ,m\}}}\{\bar{x}(t)p_{i} -t\theta _{i}-\gamma _{i}\}, \end{aligned}$$
(97)

which implies that

$$\begin{aligned} \bar{x}(t)p_{l} -t\theta _{l}-\gamma _{l} = \bar{x}(t)p_{j} -t\theta _{j}-\gamma _{j}. \end{aligned}$$

Therefore, the curve \(\bar{x}(t)\) lies on the straight line

$$\begin{aligned} x(p_l - p_j) - t(\theta _l - \theta _j) -(\gamma _l - \gamma _j) = 0 \end{aligned}$$

and Eq. (93) and Lemma 3.2(iii) imply that its slope equals

$$\begin{aligned} \frac{d\bar{x}}{dt} = \frac{\theta _l - \theta _j}{p_l - p_j} = \frac{H(u^+) - H(u^-)}{u^+- u^-}. \end{aligned}$$

This proves Eq. (91) and the first condition holds.

It remains to show the second condition. Since u equals \(\nabla _x f\) and f is convex by Theorem 3.1, the map \(x\mapsto u(x,t)\) is monotone non-decreasing [67, Def. IV.4.1.3 and Prop. VI.6.1.1]. As a result, \(u^-< u^+\) and \(u_0\in (u^-, u^+)\), where we still adopt the notation \(u^-= p_j\) and \(u^+= p_l\). Recall that Lemma 3.2(iii) implies \(H(p_i) = \theta _i\) for any i. Then, Eq. (92) in the second condition becomes

$$\begin{aligned} \frac{\theta _l - H(u_0)}{p_l - u_0} \leqslant \frac{\theta _l - \theta _j}{p_l - p_j}. \end{aligned}$$
(98)

Without loss of generality, we may assume that \(p_1< p_2< \dots < p_m\). Then, the fact \(p_j=u^-< u^+= p_l\) implies \(j<l\). We consider the following two cases.

First, if there exists some k such that \(u_0= p_k\), then \(H(u_0) = \theta _k\) by Lemma 3.2(iii). Since \(u^-< u_0< u^+\), we have \(j< k < l\). Recall that Eq. (97) holds. Therefore, the assumptions of Lemma D.3 are satisfied, which implies Eq. (98) holds.

Second, suppose \(u_0\ne p_i\) for every \(i\in \{1,\dots ,m\}\). Then there exists some \(k\in \{j, j+1,\dots , l-1\}\) such that \(p_k< u_0< p_{k+1}\). Lemma D.2 then implies that Eqs. (72) and (73) hold, that is,

$$\begin{aligned} H(u_0) = \beta _k \theta _k + \beta _{k+1}\theta _{k+1}, \quad u_0= \beta _k p_k + \beta _{k+1} p_{k+1}, \quad \text { and }\quad \beta _k + \beta _{k+1} = 1. \end{aligned}$$

Using these three equations, we can write the left-hand side of Eq. (98) as

$$\begin{aligned} \frac{\theta _l - H(u_0)}{p_l - u_0} = \frac{\theta _l - \beta _k \theta _k - \beta _{k+1} \theta _{k+1}}{p_l - \beta _kp_k - \beta _{k+1}p_{k+1}} = \frac{\beta _k (\theta _l - \theta _k) + \beta _{k+1} (\theta _l-\theta _{k+1})}{\beta _k(p_l - p_k) + \beta _{k+1}(p_l - p_{k+1})}. \end{aligned}$$
(99)

If \(k+1 = l\), then this equation becomes

$$\begin{aligned} \frac{\theta _l - H(u_0)}{p_l - u_0} = \frac{\theta _l - \theta _k}{p_l - p_k}. \end{aligned}$$

Since \(j\leqslant k<l\) and Eq. (97) holds, the assumptions of Lemma D.3 are satisfied. This allows us to conclude that Eq. (98) holds.

If \(k+1\ne l\), then using Eq. (97), the inequalities \(j\leqslant k<k+1< l\), and Lemma D.3, we obtain

$$\begin{aligned} \frac{\beta _k(\theta _l - \theta _k)}{\beta _k(p_l - p_k)} = \frac{\theta _l - \theta _k}{p_l - p_k} \leqslant \frac{\theta _l - \theta _j}{p_l - p_j} \quad \text { and }\quad \frac{\beta _{k+1}(\theta _l - \theta _{k+1})}{\beta _{k+1}(p_l - p_{k+1})} = \frac{\theta _l - \theta _{k+1}}{p_l - p_{k+1}} \leqslant \frac{\theta _l - \theta _j}{p_l - p_j}. \end{aligned}$$

Note that if \(a_i \in \mathbb {R}\) and \(b_i \in (0,+\infty )\) for \(i\in \{1,2,3\}\) satisfy \(\frac{a_1}{b_1} \leqslant \frac{a_3}{b_3}\) and \(\frac{a_2}{b_2} \leqslant \frac{a_3}{b_3}\), then \(\frac{a_1+a_2}{b_1+b_2}\leqslant \frac{a_3}{b_3}\). Then, since \(\beta _k(p_l - p_k)\), \(\beta _{k+1}(p_l - p_{k+1})\) and \(p_l-p_j\) are positive, we have

$$\begin{aligned} \frac{\beta _k (\theta _l - \theta _k) + \beta _{k+1} (\theta _l-\theta _{k+1})}{\beta _k(p_l - p_k) + \beta _{k+1}(p_l - p_{k+1})}\leqslant \frac{\theta _l - \theta _j}{p_l - p_j}. \end{aligned}$$

Hence, Eq. (98) follows directly from the inequality above and Eq. (99).

Therefore, the two conditions, including Eqs. (91) and (92), are satisfied and we apply [36, Prop 2.1] to conclude that the function u is the entropy solution to the conservation law (21).

Proof of (ii) (sufficiency): Without loss of generality, assume \(p_1<p_2<\dots <p_m\). Let \(C\in \mathbb {R}\). Suppose \(\tilde{H}\) satisfies \(\tilde{H}(p_i)=H(p_i)+C\) for each \(i\in \{1,\dots ,m\}\) and \(\tilde{H}(p)\geqslant H(p)+C\) for any \(p\in [p_1,p_m]\). We want to prove that u is the entropy solution to the conservation law (22).

As in the proof of (i), we apply [36, Prop 2.1] and verify that the two conditions hold through Eqs. (91) and (92). Let \(\bar{x}(t)\) be any smooth line of discontinuity of u, define \(u^-\) and \(u^+\) by Eq. (90) (and recall that \(u^-= p_j\) and \(u^+= p_l\)), and let \(u_0\in (u^-,u^+)\). We proved in the proof of (i) that \(\bar{x}(t)\) is a straight line, and so it suffices to prove that

$$\begin{aligned} \frac{d\bar{x}}{dt} = \frac{\tilde{H}(u^+)-\tilde{H}(u^-)}{u^+- u^-}, \quad \text { and }\quad \frac{\tilde{H}(u^+) - \tilde{H}(u_0)}{u^+- u_0} \leqslant \frac{\tilde{H}(u^+) - \tilde{H}(u^-)}{u^+- u^-}. \end{aligned}$$
(100)

We start with proving the equality in Eq. (100). By assumption, there holds

$$\begin{aligned}&\tilde{H}(u^-) = \tilde{H}(p_j) = H(p_j)+C=H(u^-)+C\quad \text { and }\nonumber \\&\quad \tilde{H}(u^+) = \tilde{H}(p_l) = H(p_l)+C=H(u^+)+C. \end{aligned}$$
(101)

Combining Eq. (101) with Eq. (91) (which we proved in the proof of (i)), we obtain

$$\begin{aligned} \frac{d\bar{x}}{dt} = \frac{H(u^+)-H(u^-)}{u^+- u^-} = \frac{H(u^+)+C-(H(u^-)+C)}{u^+- u^-} = \frac{\tilde{H}(u^+)-\tilde{H}(u^-)}{u^+- u^-}. \end{aligned}$$

Therefore, the equality in (100) holds.

Next, we prove the inequality in Eq. (100). Since \(u_0\in (u^-,u^+)\subseteq [p_1,p_m]\), by assumption there holds \(\tilde{H}(u_0) \geqslant H(u_0) +C\). Taken together with Eqs. (92) and (101), we get

$$\begin{aligned}&\frac{\tilde{H}(u^+) - \tilde{H}(u_0)}{u^+- u_0} \leqslant \frac{H(u^+)+C - (H(u_0)+C)}{u^+- u_0}\\&\leqslant \frac{H(u^+) - H(u^-)}{u^+- u^-} = \frac{\tilde{H}(u^+) - \tilde{H}(u^-)}{u^+- u^-}, \end{aligned}$$

which shows that the inequality in Eq. (100) holds.

Hence, we can invoke [36, Prop 2.1] to conclude that u is the entropy solution to the conservation law (22).

Proof of (ii) (necessity): Suppose that u is the entropy solution to the conservation law (22). We prove that there exists \(C\in \mathbb {R}\) such that \(\tilde{H}(p_i)=H(p_i)+C\) for any i and \(\tilde{H}(p)\geqslant H(p)+C\) for any \(p\in [p_1,p_m]\).

By Lemma B.2, for each \(i\in \{1,\dots ,m\}\) there exist \(x\in \mathbb {R}\) and \(t>0\) such that

$$\begin{aligned} f(\cdot ,t) \text { is differentiable at }x, \text { and }\nabla _x f(x,t)=p_i. \end{aligned}$$
(102)

Moreover, the proof of Lemma B.2 implies there exists \(T>0\) such that for any \(0<t<T\), there exists \(x\in \mathbb {R}\) such that Eq. (102) holds. As a result, there exists \(t>0\) such that for each \(i\in \{1,\dots ,m\}\), there exists \(x_i\in \mathbb {R}\) satisfying Eq. (102) at the point \((x_i,t)\), which implies \(u(x_i,t) = p_i\). Note that \(p_i\ne p_j\) implies that \(x_i \ne x_j\). (Indeed, if \(x_i = x_j\), then \(p_i = \nabla _x f(x_i,t) = \nabla _x f(x_j,t) = p_j\) which gives a contradiction since \(p_i \ne p_j\) by assumption (A1).) As mentioned before, the function \(u(\cdot ,t) \equiv \nabla _{x}f\) is a monotone non-decreasing operator and \(p_i\) is increasing with respect to i, and therefore \(x_1<x_2<\dots <x_m\). Since u is piecewise constant, for each \(k\in \{1,\dots ,m-1\}\) there exists a curve of discontinuity of u with \(u = p_k\) on the left-hand side of the curve and \(u=p_{k+1}\) on the right-hand side of the curve. Let \(\bar{x}(s)\) be such a curve and let \(u^-\) and \(u^+\) be the corresponding numbers defined in Eq. (90). The argument above proves that we have \(u^-= p_k\) and \(u^+= p_{k+1}\).

Since u is the piecewise constant entropy solution, we invoke [36, Prop 2.1] to conclude that the two aforementioned conditions hold for the curve \(\bar{x}(s)\), i.e., (100) holds with \(u^-= p_k\) and \(u^+= p_{k+1}\). From the equality in (100) and Eq. (91) proved in (i), we deduce

$$\begin{aligned} \frac{\tilde{H}(p_{k+1})-\tilde{H}(p_k)}{p_{k+1} - p_k} = \frac{\tilde{H}(u^+)-\tilde{H}(u^-)}{u^+- u^-} = \frac{d\bar{x}}{dt} = \frac{H(u^+)-H(u^-)}{u^+- u^-} = \frac{H(p_{k+1})-H(p_k)}{p_{k+1} - p_k}. \end{aligned}$$

Since k is an arbitrary index, the equality above implies that \(\tilde{H}(p_{k+1})-\tilde{H}(p_k) = H(p_{k+1})-H(p_k)\) holds for any \(k\in \{1,\dots ,m-1\}\). Therefore, there exists \(C\in \mathbb {R}\) such that

$$\begin{aligned} \tilde{H}(p_k) = H(p_k)+C \quad \text { for any }k\in \{1,\dots ,m\}. \end{aligned}$$
(103)

It remains to prove \(\tilde{H}(u_0)\geqslant H(u_0)+C\) for all \(u_0\in [p_k,p_{k+1}]\). If this inequality holds, then the statement follows because k is an arbitrary index. We already proved that \(\tilde{H}(u_0)\geqslant H(u_0)+C\) for \(u_0=p_k\) with \(k\in \{1,\dots ,m\}\). Therefore, we need to prove that \(\tilde{H}(u_0)\geqslant H(u_0)+C\) for all \(u_0\in (p_k,p_{k+1})\). Let \(u_0\in (p_k,p_{k+1})\). By Eq. (103) and the inequality in (100), we have

$$\begin{aligned} \frac{H(p_{k+1})+C - \tilde{H}(u_0)}{p_{k+1} - u_0} = \frac{\tilde{H}(u^+) - \tilde{H}(u_0)}{u^+- u_0} \leqslant \frac{\tilde{H}(u^+) - \tilde{H}(u^-)}{u^+- u^-} = \frac{H(p_{k+1}) - H(p_k)}{p_{k+1} - p_k}. \end{aligned}$$
(104)

By Lemma D.2 and a straightforward computation, we also have

$$\begin{aligned} \frac{H(p_{k+1}) - H(u_0)}{p_{k+1} - u_0} = \frac{H(p_{k+1}) - H(p_k)}{p_{k+1} - p_k}. \end{aligned}$$
(105)

Comparing Eqs. (104) and (105), we obtain \(\tilde{H}(u_0)\geqslant H(u_0)+C\). Since k is arbitrary, we conclude that \(\tilde{H}(u_0)\geqslant H(u_0)+C\) holds for all \(u_0\in [p_1,p_m]\) and the proof is complete.
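
As a final illustration of Proposition 3.1, the function u of Eq. (20) can be evaluated directly from the network parameters: it is piecewise constant, takes values in \(\{p_1,\dots ,p_m\}\), is non-decreasing in x (being the spatial gradient of the convex function f), and its jumps travel with the Rankine–Hugoniot speed of Eq. (91). The Python sketch below is illustrative only and uses made-up one-dimensional parameters with \(\gamma _i=p_i^2/2\).

```python
import numpy as np

# Hypothetical one-dimensional parameters with gamma_i = p_i^2 / 2.
p = np.array([-1.0, 0.0, 1.0, 2.0])
gamma = 0.5 * p**2
theta = np.array([0.5, 0.0, 0.5, 2.0])

def u(x, t):
    """u(x,t) of Eq. (20): the value p_{i*}, where i* maximizes x*p_i - t*theta_i - gamma_i."""
    return p[int(np.argmax(x * p - t * theta - gamma))]

t = 1.0
xs = np.linspace(-4.0, 6.0, 401)
vals = np.array([u(x, t) for x in xs])
# u(.,t) takes values in {p_1,...,p_m} and is non-decreasing in x:
print(np.all(np.isin(vals, p)), np.all(np.diff(vals) >= 0))
# Across each observed jump from u^- = p_j to u^+ = p_l, the shock speed predicted
# by the Rankine-Hugoniot condition of Eq. (91) is (theta_l - theta_j)/(p_l - p_j):
for i in np.nonzero(np.diff(vals))[0]:
    j = int(np.where(p == vals[i])[0][0])
    l = int(np.where(p == vals[i + 1])[0][0])
    print("jump %g -> %g near x = %.2f, speed = %.3f"
          % (vals[i], vals[i + 1], xs[i], (theta[l] - theta[j]) / (p[l] - p[j])))
```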

Cite this article

Darbon, J., Langlois, G.P. & Meng, T. Overcoming the curse of dimensionality for some Hamilton–Jacobi partial differential equations via neural network architectures. Res Math Sci 7, 20 (2020). https://doi.org/10.1007/s40687-020-00215-6
