Abstract
The subdifferential calculus for the expectation of nonsmooth random integrands involves many fundamental and challenging problems in stochastic optimization. It is known that for Clarke regular integrands, the Clarke subdifferential of the expectation equals the expectation of their Clarke subdifferential. In particular, this holds for convex integrands. However, little is known about the calculation of Clarke subgradients for the expectation of non-regular integrands. The focus of this contribution is to approximate Clarke subgradients for the expectation of random integrands by smoothing methods applied to the integrand. A framework for how to proceed along this path is developed and then applied to a class of measurable composite max integrands. This class contains non-regular integrands from stochastic complementarity problems as well as stochastic optimization problems arising in statistical learning.
Similar content being viewed by others
References
Ahn, M., Pang, J.S., Xin, J.: Difference-of-convex learning I: directional stationarity, optimality, and sparsity. SIAM J. Optim. 27, 1637–1665 (2017)
Aumann, R.J.: Integrals of set-valued functions. J. Math. Anal. Appl. 12, 1–12 (1965)
Blundell, R., Powell, J.L.: Censored regression quantiles with endogenous regression. J. Econom. 141, 65–83 (2007)
Burke, J.V.: Second order necessary and sufficient conditions for convex composite NDO. Math. Program. 38, 287–302 (1987)
Burke, J.V., Hoheisel, T.: Epi-convergent smoothing with applications to convex composite functions. SIAM J. Optim. 23, 1457–1479 (2013)
Burke, J.V., Hoheisel, T., Kanzow, C.: Gradient consistency for integral-convolution smoothing functions. Set-Valued Var. Anal. 21, 359–376 (2013)
Burke, J.V., Luke, D.R.: Variational analysis applied to the problem of optimal phase retieval. SIAM J. Control Optim. 42, 576–595 (2003)
Chen, C., Mangasarian, O.L.: A class of smoothing functions for nonlinear and mixed complementarity problems. Comput. Optim. Appl. 5, 97–138 (1996)
Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134, 71–99 (2012)
Chen, X., Fukushima, M.: A smoothing method for a mathematical program with P-matrix linear complementarity constraints. Comput. Optim. Appl. 27, 223–246 (2004)
Chen, X., Fukushima, M.: Expected residual minimization method for stochastic linear complementarity problems. Math. Oper. Res. 30, 1022–1038 (2005)
Chen, X., Qi, L., Sun, D.: Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities. Math. Comput. 67, 519–540 (1998)
Chen, X., Wets, R., Zhang, Y.: Stochastic variational inequalities: residual minimization smoothing sample average approximations. SIAM J. Optim. 22, 649–673 (2012)
Clarke, F.H.: Optimization and Nonsmooth Analysis, Volume 5 of Classics in Applied Mathematics. SIAM, Philadelphia (1990)
Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Springer, New York (1998)
Dunford, N., Schwartz, J.T.: Linear Operators, Part1: General Theory. Wiley, Hoboken (1988)
Folland, G.B.: Real Analysis, 2nd edn. Wiley, New York (1999)
Hildenbrand, W.: Core and Equilibria of a Large Economy. Princeton University Press, Princeton (1974)
Lyapunov, A.: Sur les fonctions-vecteur complétement additives. Bull. Acad. Sci. USSR Ser. Math. 4, 465–478 (1940)
Mordukhvich, B.: Variational Analysis and Generalized Differentiation II. Springer, Berlin (2006)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 1574–1906 (2009)
Ralph, D., Xu, H.: Convergence of stationary points of sample average two stage stochastic programs: a generalized equation approach. Math. Oper. Res. 36, 568–592 (2011)
Richter, H.: Verallgemeinerung eines in der statistik benötigten satzes der masstheorie. Math. Ann. 150, 85–90 (1963)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–41 (2000)
Rockafellar, R.T.: Integral functionals, normal integrands and measurable selections. In: Nonlinear Operators in the Calculus of Variations, Volume 543 in Lecture Notes in Mathematics, pp. 157–207. Springer, New York (1976)
Rockafellar, R.T., Wets, R.: On the interchange of subdifferentiation and conditional expectation for convex functions. Stochastics 7, 173–182 (1982)
Rockafellar, R.T., Wets, R.: Variational Analysis. Springer, New York (1998)
Rudin, W.: Principles of Mathematical Analysis, 3rd edn. McGraw-Hill, San Francisco (1976)
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)
Tardella, F.: A new proof of the Lyapunov convexity theorem. SIAM J. Control Optim. 28, 478–481 (1990)
Wets, R.: Stochastic programming. In: Nemhauser, G.L., et al. (eds.) Handbooks in OR & MS, vol. 1, pp. 573–629 (1989)
Xu, H.: An implicit programming approach for a class of stochastic mathematical programs with complementarity constraints. SIAM J. Optim. 16, 670–696 (2006)
Xu, H., Ye, J.J.: Necessary optimality conditions for two-stage stochastic programs with equilibaium constraints. SIAM J. Optim. 20, 1685–1715 (2010)
Xu, H., Zhang, D.: Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Program. 119, 371–401 (2009)
Acknowledgements
We would like to thank Associate Editor and two referees for their helpful comments.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work is support by NSF Grant No. DMS-1908890, Hong Kong Research Grant Council Grant 153000/17P, and NSFC Grant Nos. 11871276 and 11571178.
Appendix: Background
Appendix: Background
1.1 Finite-dimensional variational analysis
Since we allow mappings to have infinite values, it is convenient to define the extended reals \({{\overline{{\mathbb {R}}}}}:={\mathbb {R}}\cup {\{+\infty \}}\). The effective domain of \(f:\,{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\,\), denoted \({\text{ dom }\,}f\subseteq {\mathbb {R}}^n\), is the set on which f is finite. To avoid certain pathological mappings the discussion is restricted to proper not everywhere infinite) lower semi-continuous (lsc) functions. Of particular importance is the epigraph of such functions: \({\text{ epi }\, f}:=\left\{ (x,\mu )\,\left| \,f(x)\le \mu \right. \right\} \). We have that f is lsc if and only if \({\text{ epi }\, f}\) is closed, and f is convex if and only if \({\text{ epi }\, f}\) is convex.
Definition 8
(Subderivatives) [27, Exercise 9.15] For a locally Lipschitz function \(f:\,{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\,\) near a point \({{u}_*}\in {\mathbb {R}}^n\) with \(f({{u}_*})\) finite,
- (i)
the subderivative\(df({{u}_*}):{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\) is defined by
$$\begin{aligned} df({{u}_*})(w):= \underset{\tau \downarrow 0}{\lim \inf ~} \frac{f({{u}_*}+\tau w) - f({{u}_*})}{\tau }; \end{aligned}$$ - (ii)
the regular subderivative (or the Clarke generalized directional derivative when f is locally Lipschitz) \(~{\widehat{d}}f({{u}_*}):{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}~\) is defined by
$$\begin{aligned} {{{\widehat{d}}}}f({{u}_*})(w):= \underset{u\rightarrow {{u}_*},\,\tau \downarrow 0}{\lim \sup }\frac{f(u+\tau w)-f(u)}{\tau }. \end{aligned}$$
Definition 9
(Subgradients, subdifferentials and subdifferential regularity) Consider a locally Lipschitz function \(f:{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\), a point \(v\in {\mathbb {R}}^n\), and a point \({{u}_*}\in {\mathbb {R}}^n\) with \(f({{u}_*})\) finite.
- (i)
[27, Theorem 8.49] The vector v is a Clarke subgradient of f at \({{u}_*}\) if v satisfies
$$\begin{aligned} {{{\widehat{d}}}}f({{u}_*})(w)\ge \left\langle v\, ,\, w\right\rangle ~\forall ~w\in {\mathbb {R}}^n. \end{aligned}$$We call the set of Clarke subgradients v the Clarke subdifferential of f at \({{u}_*}\) and denote this set by \(\partial f({{u}_*})\).
- (ii)
[27, Corollary 8.19] \(f:\,{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\,\) is said to be subdifferentially regular (or Clarke regular) at \({{u}_*}\in {\text{ dom }\,}f\) with \(\partial f({{u}_*})\ne \emptyset \) if
$$\begin{aligned} df({{u}_*})(w) = {{{\widehat{d}}}}f({{u}_*})(w)\quad \forall \, w\in {\mathbb {R}}^n. \end{aligned}$$ - (iii)
[14, Definition 2.6.1] [12] The vector v is a B-subgradient of f at \({{u}_*}\) if
$$\begin{aligned} v=\lim _{u^k\rightarrow {{u}_*}} \nabla f(u^k), \quad \mathrm{where }\,\, f \,\, \text{ is } \text{ differentiable } \text{ at }\,\, u^k. \end{aligned}$$We call the set of B-subgradients v of f at \({{u}_*}\) the B-subdifferential of f at \({{u}_*}\) and denote this set by \(\partial ^B f({{u}_*})\).
- (iv)
[27, Definition 8.3] The vector v is an M-subgradient of f at \({{u}_*}\) if there are sequences \( u^k\rightarrow {{u}_*}\) and \(v^k \rightarrow v\) with
$$\begin{aligned} \underset{u\rightarrow u^k}{\lim \inf ~}\frac{f(u)-f(u^k) - \left\langle v^k\, ,\, u-u^k\right\rangle }{\Vert u-u^k\Vert }\ge 0. \end{aligned}$$We call the set of M-subgradients v of f at \({{u}_*}\) the M-subdifferential of f at \({{u}_*}\) and denote this set by \(\partial ^M f({{u}_*})\).
Remark 6
In [27], the notion of subdifferential regularity is defined in [27, Definition 7.25]. In the definition given above we employ characterizations of this notion given by the cited results. Note that subdifferential mappings are multi-functions.
Definition 10
(Strict continuity and strict differentiability) Let \(H:\,D\rightarrow {\mathbb {R}}^m\,\), \(D\subseteq {\mathbb {R}}^n\), and \(h:\,{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\,\).
- (i)
[Strict Continuity [27, Definition 9.1]] We say that H is strictly continuous at \({\bar{x}}\in {\text{ int }\left( D\right) }\) if
$$\begin{aligned} \mathrm {lip}\, H({\bar{x}}):=\limsup _{{\mathop {x\ne x'}\limits ^{x,x'\rightarrow {\bar{x}}}}} \frac{\left\| H(x')-H(x)\right\| }{\left\| x'-x\right\| }<\infty \, . \end{aligned}$$ - (ii)
[Strict Differentiability [27, Definition 9.17]] We say that h is strictly differentiable at a point \({\bar{x}}\in {\text{ dom }\,}{h}\) if h is differentiable at \({\bar{x}}\) and
$$\begin{aligned} \lim _{{\mathop {x\ne x'}\limits ^{x,x'\rightarrow {\bar{x}}}}} \frac{h(x')-h(x)-\left\langle \nabla h({\bar{x}})\, ,\, x'-x\right\rangle }{\left\| x'-x\right\| }=0\, . \end{aligned}$$
It is easily seen that if h is continuously differentiable on an open set U, then h is strictly differentiable and subdifferentially regular on U with \(\partial h(x)=\{\nabla h(x)\}\) for all \(x\in U\) ([27, Theorem 9.18 and Exercise 9.64]).
The notion of strict continuity of f at a point \({\bar{x}}\) implies the existence of a neighborhood of \({\bar{x}}\) on which f is Lipschitz continuous, that is, f is locally Lipschitz continuous at \({\bar{x}}\) where the local Lipschitz modulus is lower bounded by \(\mathrm {lip}\, H({\bar{x}})\). In this light, Definition 8 and Definition 9(ii) combine to tell us that
wherever f is strictly continuous and subdifferentially regular at \({{u}_*}\). Moreover, in this case, [27, Theorem 8.30] tells us that
Remark 7
(Subdifferentials of Compositions) If \(g:\,X\subset {\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\,\) is given as the composition of two functions \(f:\,Y\subset {\mathbb {R}}^m\rightarrow {{\overline{{\mathbb {R}}}}}\,\) and \(h:\,X\rightarrow Y\,\), i.e. \(g(x)=(f\circ h)(x)=f(h(x))\), then we write \(\partial g(x)=\partial (f\circ h)(x).\) On the other hand, we write \(\partial f(h(x))\) to denote the subdifferential of f evaluated at h(x).
Theorem 9
(Strict differentiability and the subdifferential) [27, Theorem 9.18] [14, Proposition 2.2.4] Let \(h:\,{\mathbb {R}}^n\rightarrow {{\overline{{\mathbb {R}}}}}\,\) with \({\bar{x}}\in {\text{ dom }\,}{h}\). Then h is strictly differentiable at \({\bar{x}}\) if and only if h is strictly continuous at \({\bar{x}}\) and \(\partial h({\bar{x}})=\{\nabla h({\bar{x}})\}\).
1.2 Measurable multi-functions
We now review some of the properties of measurable multi-functions used in this paper [2, 15, 18, 27]. For more information on this topic, we refer the interested reader to [27, Chapter 14] and [25].
A multi-function, or multi-valued mapping, S from \({\mathbb {R}}^k\) to \({\mathbb {R}}^s\) is a mapping that takes points in \({\mathbb {R}}^k\) to sets in \({\mathbb {R}}^s\), and is denoted by \(S: {\mathbb {R}}^k \rightrightarrows {\mathbb {R}}^s\). The outer limit of S at \({{\bar{x}}}\in {\mathbb {R}}^k\)relative to\(X\subseteq {\mathbb {R}}^k\) is
and the inner limit of S at \({{\bar{x}}}\)relative toX is
Here the notation \(\{x^k\}\rightarrow _X {{\bar{x}}}\) means that \(\{x^k\}\subseteq X\) with \(x^k\rightarrow {{\bar{x}}}\). If \(X={\mathbb {R}}^k\), we write \(x\rightarrow {{\bar{x}}}\) instead of \(x\rightarrow _{{\mathbb {R}}^k}{{\bar{x}}}\). We say that S is outer semicontinuous (osc) at \({{\bar{x}}}\)relative toX if
When the outer and inner limits coincide, we write
and say that S is contiuous at \({{\bar{x}}}\)relative toX.
Let \(\varXi \) be a nonempty subset of \({\mathbb {R}}^\ell \) and let \({{\mathcal {A}}}\) be a \(\sigma \)-field of subsets of \(\varXi \), called the measurable subsets of \(\varXi \) or the \({{\mathcal {A}}}\)-measurable subsets. Let \(\rho :\,{{\mathcal {A}}}\rightarrow [0,1]\,\) be a \(\sigma \)-finite Borel regular, complete, non-atomic, probability measure on \({{\mathcal {A}}}\). The corresponding measure space is denoted \((\varXi ,{{\mathcal {A}}}, \rho )\). A multi-function \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\) is said to be \({{\mathcal {A}}}\)-measurable, or simply measurable, if for all open sets the set is in \({{\mathcal {A}}}.\) The multi-function \(\varPsi \) is said to be \({{\mathcal {A}}}\otimes {{\mathcal {B}}}^n\)-measurable if \({{\,\mathrm{gph}\,}}(\varPsi ) = \left\{ (\xi ,v)\,\left| \,v\in \varPsi (\xi )\right. \right\} \in {{\mathcal {A}}}\otimes {{\mathcal {B}}}^n\), where \({{\mathcal {B}}}^n\) denotes the Borel \(\sigma \)-field on \({\mathbb {R}}^n\) and \({{\mathcal {A}}}\otimes {{\mathcal {B}}}^n\) is the \(\sigma \)-field on \(\varXi \times {\mathbb {R}}^n\) generated by all sets \(A\times D\) with \(A\in {{\mathcal {A}}}\) and \(D\in {{\mathcal {B}}}^n\). If \(\varPsi (\xi )\) is closed for each \(\xi \) then \(\varPsi \) is closed-valued. Similarly, \(\varPsi \) is said to be convex-valued if \(\varPsi (\xi )\) is convex for each \(\xi \). Finally, we note that the completeness of the measure space guarantees the measurability of subsets of \(\varXi \) obtained as the projections of measurable subsets of \(\varXi \times {\mathbb {R}}^n\):
In particular, this implies that the multi-function \(\varPsi \) is \({{\mathcal {A}}}\)-measurable if and only if \({{\,\mathrm{gph}\,}}(\varPsi )\) is \({{\mathcal {A}}}\otimes {{{\mathcal {B}}}}^n\)-measurable [27, Theorem 14.8].
Let \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\), and denote by \(~{{\mathcal {S}}}(\varPsi )~\) the set of \(\rho \)-measurable functions \(f:\,\varXi \rightarrow {\mathbb {R}}^n\,\) that satisfy \(f(\xi )\in \varPsi (\xi )\) for a.e. \(\xi \in \varXi \). We call \(~{{\mathcal {S}}}(\varPsi )~\) the set of measurable selections of \(\varPsi \).
Theorem 10
(Measurable selections) [27, Corollary 14.6] A closed-valued measurable map \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\) always admits a measurable selection.
We say that the measurable multi-function \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\) is integrably bounded, or for emphasis \(\rho \)-integrably bounded, if there is a \(\rho \)-integrable function \(a:\,\varXi \rightarrow {\mathbb {R}}^n_+\,\) such that
for all pairs \((\xi ,v)\in \varXi \times {\mathbb {R}}^n\) satisfying \(v\in \varPsi (\xi ).\) Here and elsewhere we interpret vector inequalities as element-wise inequalities. Let \(1\le p\le \infty \). When \(\varXi = {\mathbb {R}}^{\ell }\), we let \(L^p_m({\mathbb {R}}^{\ell },{{\mathcal {A}}},\rho )\) denote the Banach space of functions mapping \({\mathbb {R}}^{\ell }\) to \({\mathbb {R}}^m\). When \(p=2\), \(L^2_m({\mathbb {R}}^{\ell },{{\mathcal {A}}},\rho )\) is a Hilbert space with the inner product on the measure space \(({\mathbb {R}}^{\ell },{{\mathcal {A}}},\rho )\) given by
where \(\left\langle \cdot \, ,\, \cdot \right\rangle \) denotes the Euclidean inner product. If \(\rho ({\mathbb {R}}^{\ell })<\infty \), then
If the function a in (58) is such that \(\left\| a(\xi )\right\| _{\scriptscriptstyle p}\) is integrable with respect to the measure \(\rho \) on the measure space \((\varXi ,{{\mathcal {A}}},\rho )\), then the multi-function \(\varPsi \) is said to be \(L^p\)-bounded, where \(\left\| \cdot \right\| _{\scriptscriptstyle p}\) denotes the p-norm of vectors.
Proposition 2
[7, Proposition 2.2] and [16, Corollary IV.8.4](Weak compactness of measurable selections) Let the multi-function \(\varPsi :{\mathbb {R}}^{\ell }\rightrightarrows {\mathbb {R}}^m\) be closed- and convex-valued, and \(L^2\)-bounded on \(L^2_m({\mathbb {R}}^{\ell },{{\mathcal {M}}}^n,\lambda _n)\), where \({{\mathcal {M}}}^n\) is the Lebesgue field on \({\mathbb {R}}^n\) and \(\lambda _n\) is n-dimensional Lebesgue measure. Then the set of measurable selections \({{\mathcal {S}}}(\varPsi )\) is a weakly compact, convex set in \(L^2_m({\mathbb {R}}^{\ell },{{\mathcal {M}}}^n,\lambda _n)\).
We now develop some properties of integrals of multi-valued mappings. Given a measurable multi-function \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\), we define the integral of \(\varPsi \) over \(\varXi \) with respect to the measure \(\rho \) by
The next theorem, due to Hildenbrand [18], is a restatement of Theorems 3 and 4 of Aumann [2] for multi-functions on the non-atomic measure space \((\varXi ,{{\mathcal {A}}},\rho )\). These results are central to the theory of integrals of multi-valued functions.
Theorem 11
(Integrals of multi-functions) [18, Theorem 4 and Proposition 7] The following properties hold for integrably bounded multi-functions \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\) on non-atomic measure spaces \((\varXi ,{{\mathcal {A}}},\rho )\).
- (a)
If \(\varPsi \) is \({{\mathcal {A}}}\otimes {{\mathcal {B}}}^n\)-measurable, then \(\int \varPsi d\rho = \int {{\,\mathrm{conv\,}\,}}\varPsi d\rho \).
- (b)
If \(\varPsi \) is closed valued (not necessarily \({{\mathcal {A}}}\otimes {{\mathcal {B}}}^n\)-measurable), then \(\int \varPsi d\rho \) is compact.
We conclude this section with a very elementary, but useful lemma on measurable tubes, i.e. multi-valued mappings \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\) of the form
where \({\mathbb {B}}:=\left\{ x\,\left| \,\left\| x\right\| _2\le 1\right. \right\} \) is the closed unit ball in \({\mathbb {R}}^n\) and \(\kappa :\,\varXi \rightarrow {\mathbb {R}}_+\,\) is measurable.
Lemma 11
(Tubes) Let \(\varPsi : \varXi \rightrightarrows {\mathbb {R}}^n\) be a measurable tube as in (59) with \(\kappa \in L^2_1(\varXi ,{{\mathcal {A}}},\rho )\) non-negative a.e. on \(\varXi \). Then, for every \(E\in {{\mathcal {A}}}\), \(\int _E\varPsi (\xi ) d\rho \subseteq \left[ \int _E\kappa (\xi ) d\rho \right] {\mathbb {B}}\subseteq \left\| {{\kappa }}\right\| _2\rho (E){\mathbb {B}}\).
Proof
The mapping \(\varPsi \) in (59) is obviously closed valued and measurable. Therefore, Theorem 10 tells us that \({{\mathcal {S}}}(\varPsi )\) is non-empty. Let \(E\in {{\mathcal {A}}}\) and \(s\in {{\mathcal {S}}}(\varPsi )\). Then
so that \(\int _Es(\xi ) d\rho \in \left[ \int _E\kappa (\xi ) d\rho \right] {\mathbb {B}}\). This proves the lemma since \( \int _E \kappa (\xi ) d\rho =\left\langle {{\kappa }}\, ,\, {{\mathcal {X}}}_E\right\rangle \le \left\| {{\kappa }}\right\| _2\rho (E).\)
Rights and permissions
About this article
Cite this article
Burke, J.V., Chen, X. & Sun, H. The subdifferential of measurable composite max integrands and smoothing approximation. Math. Program. 181, 229–264 (2020). https://doi.org/10.1007/s10107-019-01441-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10107-019-01441-9