
The Modified HZ Conjugate Gradient Algorithm for Large-Scale Nonsmooth Optimization

  • Gonglin Yuan,

    Affiliations Guangxi Colleges and Universities Key Laboratory of Mathematics and Its Applications, College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi 530004, China, College of Mathematics and Information Science, Guangzhou University, Guangzhou, Guangdong 510006, China

  • Zhou Sheng ,

    szhou03@live.com

    Affiliation Guangxi Colleges and Universities Key Laboratory of Mathematics and Its Applications, College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi 530004, China

  • Wenjie Liu

    Affiliations School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China, Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China

Abstract

In this paper, the Hager and Zhang (HZ) conjugate gradient (CG) method and the modified HZ (MHZ) CG method are presented for large-scale nonsmooth convex minimization. Under some mild conditions, convergence results for the proposed methods are established. Numerical results show that the presented methods are efficient for large-scale nonsmooth problems; several test problems are solved, with dimensions of up to 100,000 variables.

Introduction

Consider the following optimization problem: (1) If the constraint set satisfies Φ = ℜn, then Eq (1) is called an unconstrained optimization problem; if Φ = {x ∈ ℜn ∣ l ≤ x ≤ u}, where n is the number of variables and the vectors l and u are the lower and upper bounds on the variables, then Eq (1) is called a box-constrained optimization problem; and if Φ = {x ∈ ℜn ∣ hi(x) = 0, gj(x) ≤ 0, i = 1, ⋯, r, j = 1, ⋯, k}, where r and k are positive integers, then Eq (1) is called a generally constrained optimization problem. If the objective function f: ℜn → ℜ is continuously differentiable, then Eq (1) is called a smooth optimization problem; if f is nondifferentiable, then Eq (1) is called a nonsmooth optimization problem. For a given objective function f and constraint set Φ, Eq (1) will be called the corresponding optimization problem.
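In standard notation, Eq (1) and the constraint sets described above can be written as

\min_{x \in \Phi} f(x),

where \Phi = \Re^{n} (unconstrained), \Phi = \{x \in \Re^{n} : l \le x \le u\} (box-constrained), or \Phi = \{x \in \Re^{n} : h_i(x) = 0, \; g_j(x) \le 0, \; i = 1, \dots, r, \; j = 1, \dots, k\} (generally constrained).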

Optimization problems are encountered in many fields, including engineering, management, finance, medicine, and biology, and optimization models are likewise used in many of these fields (see [1–13]). At present, there are many efficient methods available for solving optimization problems [14–24]. However, many challenging optimization problems remain, for example, large-scale problems and nonsmooth problems. The workload grows greatly as the dimension of the problem increases, so the required CPU time can become very long, and a computer may even fail to solve certain large-scale problems. To address this issue, more efficient algorithms should be designed. It is well known that spectral gradient approaches, conjugate gradient (CG) techniques and limited-memory quasi-Newton methods can cope with large-scale optimization problems. CG methods in particular have been widely used in many practical large-scale optimization problems because of their simplicity and low memory requirements.

Nonsmooth problems are believed to be very difficult to solve, even when they are unconstrained. The direct application of smooth gradient-based methods to nonsmooth problems may lead to a failure in optimality conditions, convergence, or gradient approximation [25]. Haarala et al. (see, e.g., [26, 27]) presented limited-memory bundle methods for large-scale nonsmooth unconstrained and constrained minimization and demonstrated their application to test problems of up to one thousand variables in dimension. Karmitsa et al. [28] tested and compared various methods of both types as well as several methods that may be regarded as hybrids of these two approaches and/or others; the dimensions of the tested nonsmooth problems ranged from 20 to 4000, and the most effective method for large- and extra-large-scale problems was found to be that of [27]. Therefore, special tools for solving nonsmooth optimization problems are needed.

This paper is organized as follows. In the next section, the Hager and Zhang (HZ) CG method is presented, and a modified HZ (MHZ) CG formula is proposed. In Section 3, the application of the HZ and MHZ methods to large-scale nonsmooth problems is discussed, global convergence is established, and numerical experiments on nonsmooth problems are reported. In the last section, conclusions are presented. Throughout the paper, ‖⋅‖ denotes the Euclidean norm.

The HZ CG formula [29, 30] and a modification thereof

For convenience, we rewrite Eq (1) as the following special case: (2) where f: ℜn → ℜ is continuously differentiable; Eq (2) is called an unconstrained optimization problem. CG methods are a class of effective line search methods for solving Eq (2), especially when the dimension n is large. The iterative formula for CG methods is defined as follows: (3) where xk is the current iterate, αk > 0 is the step length, and dk is the search direction; the latter is determined as (4) where gk+1 = ∇f(xk+1) is the gradient of f(x) at the point xk+1 and βk ∈ ℜ is a scalar. The parameter βk is chosen such that, when the method is applied to minimize a strongly convex quadratic function, the directions dk and dk−1 are conjugate with respect to the Hessian of the quadratic function. Let , where, from [29], (5) with yk = gk+1 − gk; we call this formula from [29] the HZ formula. If , then Eq (4) with satisfies (6) This method exhibits global convergence for a strongly convex function f. To obtain a similar result for a general nonlinear function, Hager and Zhang [30] proposed the formula , where η > 0 is a constant; in their experiments, they set η = 0.01. This new parameter also has the property expressed in Eq (6).
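For reference, the quantities referred to above take the standard forms given in [29, 30] (stated here in the present notation, so that they correspond to Eqs (3)-(6)):

x_{k+1} = x_k + \alpha_k d_k, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \quad d_0 = -g_0,

\beta_k^{HZ} = \frac{1}{d_k^{T} y_k} \left( y_k - 2 d_k \frac{\|y_k\|^{2}}{d_k^{T} y_k} \right)^{T} g_{k+1}, \qquad y_k = g_{k+1} - g_k,

g_{k+1}^{T} d_{k+1} \le -\frac{7}{8} \|g_{k+1}\|^{2},

and the truncated parameter of [30] is

\bar{\beta}_k^{HZ} = \max\{\beta_k^{HZ}, \eta_k\}, \qquad \eta_k = \frac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}}.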

Based on the formula for , if we let , we can obtain the more general formula (7) In this paper, we set where c ∈ (0, 1) is a scalar. It is easy to deduce that ; moreover, if , then Eq (7) is identical to the HZ formula. The modified formula has the following features:

(i) The new formula can overcome the shortcomings of the CG parameter βk. If , it is not difficult to find that (8)

(ii) Another property of this formula is that it can ensure that the new direction dk+1 in Eq (4) with belongs to a trust region without the need for any line search technique. By Eq (4), for , we obtain (9) By combining this result with Step 1 of Algorithm 2.2, we find that for all k.

(iii) If Tk ≠ 0, then the new direction dk+1 in Eq (4) when possesses the following sufficient descent property: (10) which holds for all k. Now let us analyze this result. If k = 0, we have d1 = −g1 and satisfying Eq (10). For k ≥ 1, because Tk ≠ 0, multiplying Eq (4) by yields (11) Let and ; then, by the inequality , we have

Nonsmooth Convex Problems and Their Results

Consider the unconstrained convex optimization problem (12) where f: ℜn → ℜ is a possibly nonsmooth convex function. For the special case in which f is continuously differentiable, this optimization problem has been well studied for several decades. However, nonsmooth optimization problems of the form of Eq (12) also arise in many applications, such as image restoration [31] and optimal control [32]. The Moreau-Yosida regularization of f generates (13) where λ is a positive parameter; then, Eq (12) is equivalent to the following problem: (14) Let p(x) = argmin θ(z) and define a function . Thus, p(x) is well defined and unique because the function θ(z) is strongly convex. Therefore, F(x) can be expressed as follows: F(x) possesses many known features (see, e.g., [33–35]). The generalized Jacobian of F(x) and its property of BD-regularity are demonstrated in [36, 37]. Here, we list some additional findings regarding the function F(x), as follows (the regularization and its gradient are also written out explicitly after this list):

(i) x is an optimal solution to Eq (12) ⇔ ∇F(x) = 0, i.e., p(x) = x.

(ii) (15) where (16) is the gradient of F.

(iii) The set of generalized Jacobian matrices ∂Bg(x) = {V ∈ ℜn×n: V = lim ∇g(xk) as xk → x, xk ∈ Dg} is nonempty and compact, where Dg denotes the set of points at which g is differentiable.

Furthermore, every V ∈ ∂Bg(x) is a symmetric positive semidefinite matrix for each x ∈ ℜn because g is a gradient mapping of the convex function F.

(iv) There exist two constants, μ1 > 0 and μ2 > 0, and a neighborhood Ω of x that satisfy
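For reference, the Moreau-Yosida regularization in Eq (13), the minimizer p(x), and the gradient g(x) in Eq (16) take the following standard forms (see, e.g., [18, 33-35]):

\theta(z) = f(z) + \frac{1}{2\lambda} \|z - x\|^{2}, \qquad F(x) = \min_{z \in \Re^{n}} \theta(z),

p(x) = \arg\min_{z \in \Re^{n}} \theta(z), \qquad g(x) = \nabla F(x) = \frac{x - p(x)}{\lambda}.

In particular, F is finite-valued, convex, and continuously differentiable with a Lipschitz continuous gradient even when f itself is nondifferentiable, which is what makes the gradient-type Algorithms 4.1 and 4.2 below applicable to Eq (12).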

Two algorithms for nonsmooth problems

Based on the above discussion, we present two algorithms for application to nonsmooth problems of the form Eq (14); afterward, we analyze the solution of Eq (12). In the following, unless otherwise noted, gk = g(xk) is defined as in Eq (16).

Algorithm 4.1.

Require:

 An initial point x0 ∈ ℜn, λ > 0, σ, η ∈ (0, 1), ρ ∈ (0, 1/2], ϵ ∈ [0, 1).

 Specify g0 by solving the subproblem Eq (13);

d0 ← −g0, k ← 0.

while ‖gk‖ > ϵ do

  Determine the step size αk = max{ρj|j = 0, 1, 2, ⋯} satisfying the following Armijo line search condition (17)

  xk+1 = xk + αk dk;

  Compute gk+1 by solving the subproblem Eq (13);

  if ‖gk+1‖ ≤ ϵ then

   break.

  else

   Compute dk+1 as (18)

  end if

  xkxk+1, dkdk+1, kk + 1.

end while

Algorithm 4.2. Algorithm 4.2 is identical to Algorithm 4.1, except that Eq (18) is replaced with the following: compute dk+1 as (19), let k := k + 1, and return to the line search step.
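To make the structure of Algorithm 4.1 concrete, a minimal Python sketch of its outer loop is given below. The helper names are illustrative only: my_gradient stands for an oracle that returns F(x) and g(x) by approximately solving the subproblem Eq (13), armijo_ok stands for the Armijo condition Eq (17), and beta_rule stands for the direction rule Eq (18) (or Eq (19) for Algorithm 4.2); none of these names appear in the algorithm statements above.

import numpy as np

def algorithm_41(x0, my_gradient, beta_rule, armijo_ok,
                 rho=0.5, eps=1e-5, max_iter=10_000):
    """Sketch of the outer loop of Algorithm 4.1 (illustrative helper names).

    my_gradient(x) -> (F(x), g(x)), obtained by approximately solving Eq (13);
    beta_rule(g_new, g_old, d) -> scalar, a stand-in for Eq (18)/(19);
    armijo_ok(F_trial, F, alpha, g, d) -> bool, a stand-in for Eq (17).
    """
    x = np.asarray(x0, dtype=float)
    F, g = my_gradient(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        # Backtracking line search: alpha_k = max{rho^j, j = 0, 1, 2, ...}
        # satisfying the Armijo-type condition Eq (17).
        alpha = 1.0
        while True:
            x_trial = x + alpha * d
            F_trial, g_trial = my_gradient(x_trial)
            if armijo_ok(F_trial, F, alpha, g, d) or alpha < 1e-12:
                break
            alpha *= rho
        if np.linalg.norm(g_trial) <= eps:
            x, F, g = x_trial, F_trial, g_trial
            break
        # CG-type direction update; beta_rule stands in for the formula in Eq (18).
        d = -g_trial + beta_rule(g_trial, g, d) * d
        x, F, g = x_trial, F_trial, g_trial
    return x, F, g

Algorithm 4.2 is obtained from the same skeleton by changing only beta_rule so that it corresponds to Eq (19).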

The following assumptions are needed to ensure the global convergence of Algorithms 4.1 and 4.2.

Assumption 4.1. (i) F is bounded from below and the sequence {Vk} is bounded, namely, there exists a constant M > 0 such that (20)

(ii) g is BD-regular at x, namely, item (iv) in Section 3 above holds.

By Assumption 4.1, it is not difficult to deduce that there exists a constant M* > 1 such that (21)

Lemma 1 Let the sequence {xk} be generated by Algorithm 4.1 (or Algorithm 4.2), and let Assumption 4.1 hold. Then, for sufficiently large k, there exists a constant α* > 0 that satisfies (22)

Proof. Suppose that αk satisfies the Armijo line search condition Eq (17). The proof is complete if αk = 1 holds. Otherwise, let ; then, we have By performing a Taylor expansion, we obtain (23) where and the last inequality follows from Eq (20). By combining Eqs (21) and (23), we obtain (24) Thus, we find that Let , and the proof is complete.

Now, let us prove the global convergence of Algorithm 4.1.

Theorem 1 Suppose that the conditions of Lemma 1 hold. Then, we have (25) moreover, any accumulation point of {xk} is an optimal solution of Eq (12).

Proof. Suppose that Eq (25) is not true. Then, there must exist two constants, ϵ0 > 0 and k* > 0, that satisfy (26) By combining , Eqs (17), (22) and (26), we obtain (27) Because F(x) is bounded from below for all k, it follows from Eq (27) that This contradicts Therefore, Eq (25) holds. Let x* be an accumulation point of {xk} and, without loss of generality, suppose that there exists a subsequence {xk}K that satisfies (28) From Eq (28), we find that ‖g(x*)‖ = ‖∇F(x*)‖ = 0. Then, from property (i) of F(x) as given in Section 3, x* is an optimal solution of Eq (12). The proof is complete.

In a manner similar to Theorem 4.1 in [38], we can establish the linear convergence rate of Algorithm 4.1 (or Algorithm 4.2). Here, we simply state this property, as follows, but omit the proof.

Theorem 2 Let Assumptions 4.1 (i) and (ii) hold, and let x* be the unique solution of Eq (14). Then, there exist two constants b > 0 and r ∈ (0, 1) that satisfy (29) namely, the sequence {xk} generated by Algorithm 4.1 (or Algorithm 4.2) linearly converges to x*.

Numerical results for nonsmooth problems

In this section, we present the results of several numerical experiments using Algorithms 4.1 and 4.2 and a modified Polak-Ribière-Polyak conjugate gradient algorithm (denoted MPRP) [18]. It is well known that CG methods are very effective for large-scale smooth problems; we will show that these algorithms are also applicable to large-scale nonsmooth problems. The nonsmooth academic test problems, listed in Table 1 along with their initial points, are described in [27]; in the table, “Problem” is the name of the test problem, “x0” is the initial point, “fops” is the optimal value of the test problem, and the corresponding numbers of variables are also given. Problems 1–5 are convex functions, and the others are nonconvex functions. The detailed characteristics of these problems can be found in [27]. Because we wished to test the three considered methods on large-scale nonsmooth problems, problem dimensions of 5000, 10000, 50000, and 100000 were chosen. In our experiments, we found that problem 2 required a considerable amount of time to solve; therefore, we set its largest dimension to 50000.

Table 1. Test problems, their initial points, and optimal values.

https://doi.org/10.1371/journal.pone.0164289.t001

All of the algorithms were implemented in Fortran PowerStation 4.0 with double-precision arithmetic, and all experiments were run on a PC with a Core 2 Duo E7500 CPU @ 2.93 GHz, 2.00 GB of memory and the Windows XP operating system. The following parameters were chosen for Algorithms 4.1 and 4.2 and MPRP: σ = 0.8, ρ = 0.5, c = 0.01, ϵk = 1E − 15, and η = 0.01. We stopped the algorithms when the condition ‖g(x)‖ ≤ 1E − 5 or ∣F(xk+1) − F(xk)∣ ≤ 1E − 8 or |f(xk) − fops| ≤ 1E − 4 was satisfied. The program also terminated if the number of iterations exceeded ten thousand. Because a line search cannot always ensure the descent condition , an uphill search direction may arise in real numerical experiments, which may cause the line search rule to fail. To avoid this situation, the step size αk was accepted whenever the number of line search trials exceeded five.
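The combined stopping rule described above can be summarized in a small helper function; the name should_stop and its argument names below are illustrative only.

def should_stop(g_norm, F_new, F_old, f_val, f_opt, n_iter,
                tol_g=1e-5, tol_F=1e-8, tol_f=1e-4, max_iter=10_000):
    """Illustrative test combining the three stopping criteria and the iteration cap."""
    return (g_norm <= tol_g
            or abs(F_new - F_old) <= tol_F
            or abs(f_val - f_opt) <= tol_f
            or n_iter >= max_iter)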

In the experiments, the subproblem Eq (13) was solved using the PRP CG method (called the sub-algorithm), and its numbers of iterations and function evaluations were added to those of Algorithm 4.1, Algorithm 4.2 or MPRP (called the main algorithm); a sketch of this sub-solver is given after the column list below. In the sub-algorithm, if ‖∂f(xk)‖ ≤ 1E − 4 or f(xk+1) − f(xk) + ‖∂f(xk+1)‖2 − ‖∂f(xk)‖2 ≤ 1E − 3 (see [39]) holds, where ∂f(xk) is a subgradient of f(x) at the point xk, then the sub-algorithm terminates. The sub-algorithm also terminates when its iteration number exceeds ten. For the line search, the Armijo technique was used, and the step length was likewise accepted whenever the number of line search trials exceeded five. The columns in Tables 2, 3 and 4 have the following meanings:

  1. Dim: the dimension of the problem.
  2. NI: the total number of iterations.
  3. NF: the number of function evaluations.
  4. ‖g(x)‖: the norm of g(x) at the final iteration.
  5. Time: the CPU time in seconds.
  6. ffinal: the value of f(x) at the final iteration.
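As noted before the column list, the subproblem Eq (13) was handled by a PRP CG sub-algorithm. The sketch below illustrates the structure of such a sub-solver under simplifying assumptions: it takes a user-supplied subgradient oracle subgrad for f, uses a plain Armijo backtracking step, and applies the two termination tests and the ten-iteration cap described above; the exact implementation used in the experiments may differ in these details.

import numpy as np

def prp_subsolver(f, subgrad, x, lam, max_iter=10, tol=1e-4, tol_desc=1e-3,
                  rho=0.5, c1=1e-4, max_backtracks=20):
    """Illustrative PRP CG sketch for the subproblem Eq (13):
    minimize theta(z) = f(z) + ||z - x||^2 / (2*lam) over z.

    f(z) returns the objective value; subgrad(z) returns a subgradient of f at z.
    Returns an approximate minimizer p(x) and the gradient g(x) = (x - p(x)) / lam.
    """
    theta = lambda z: f(z) + np.dot(z - x, z - x) / (2.0 * lam)
    grad_theta = lambda z: subgrad(z) + (z - x) / lam

    z = np.asarray(x, dtype=float).copy()
    g = grad_theta(z)
    d = -g
    f_old, s_old = f(z), np.linalg.norm(subgrad(z))
    for _ in range(max_iter):
        if s_old <= tol:                       # subgradient norm small enough
            break
        # Simple Armijo backtracking on theta (a stand-in for the line search used).
        alpha, th = 1.0, theta(z)
        for _ in range(max_backtracks):
            if theta(z + alpha * d) <= th + c1 * alpha * np.dot(g, d):
                break
            alpha *= rho
        z_new = z + alpha * d
        g_new = grad_theta(z_new)
        f_new, s_new = f(z_new), np.linalg.norm(subgrad(z_new))
        # Termination test from the text: objective decrease plus change in
        # squared subgradient norms below the tolerance.
        if f_new - f_old + s_new ** 2 - s_old ** 2 <= tol_desc:
            z = z_new
            break
        beta = np.dot(g_new, g_new - g) / np.dot(g, g)   # PRP parameter
        d = -g_new + beta * d
        z, g, f_old, s_old = z_new, g_new, f_new, s_new
    p = z
    return p, (x - p) / lam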

From the above three tables, it is not difficult to see that Algorithm 4.1 is superior to Algorithm 4.2 and MPRP in terms of NI, NF, and ‖g(x)‖: Algorithm 4.1 yields smaller values of NI and NF when the program terminates, and its final value of ‖g(x)‖ is smaller than that of Algorithm 4.2 in most cases. However, we also note that for problems 3, 4, 5 and 8, our algorithms can efficiently solve cases only up to a dimension of 3000 (the maximum dimension for these problems); if the dimension is increased further, these algorithms fail to converge to good minima and become stuck at local ones. To directly illustrate the performance of these methods, we used the tool developed by Dolan and Moré [40] to analyze their efficiency in terms of the number of iterations, the number of function evaluations, and the CPU time. In the following, Figs 1, 2 and 3 present the results of Tables 2, 3 and 4 in terms of NI, NF, and Time, respectively.
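The performance profiles shown in Figs 1, 2 and 3 follow the construction of Dolan and Moré [40]. The short sketch below illustrates that computation, assuming a cost array with one row per problem and one column per solver (with np.inf marking a failed run); for Figs 1, 2 and 3 the costs would be the NI, NF and Time columns of Tables 2, 3 and 4, respectively.

import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs: array of shape (n_problems, n_solvers) holding a metric such as
           NI, NF or Time, with np.inf marking a failure.
    taus:  1-D array of ratio thresholds.
    Returns rho of shape (len(taus), n_solvers): rho[i, s] is the fraction of
    problems that solver s solves within a factor taus[i] of the best solver.
    """
    costs = np.asarray(costs, dtype=float)
    best = np.min(costs, axis=1, keepdims=True)   # best cost for each problem
    ratios = costs / best                          # r_{p,s} = t_{p,s} / min_s t_{p,s}
    n_problems = costs.shape[0]
    return np.array([(ratios <= tau).sum(axis=0) / n_problems for tau in taus])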

From Figs 1 and 2, we can conclude that Algorithm 4.1 performs better than Algorithm 4.2 and MPRP in terms of the numbers of iterations and function evaluations. Moreover, the better performance of Algorithm 4.1 can be attributed to the superior theoretical properties of the MHZ method compared with the standard HZ method. However, Fig 3 indicates that Algorithm 4.2 is superior to Algorithm 4.1 and MPRP in terms of CPU time. Overall, all three methods are effective for large-scale nonsmooth optimization problems.

Conclusion

(i) In this paper, we focus on the HZ CG method and study its application to nonsmooth optimization problems. Several results are presented that demonstrate the efficiency of this method for large-scale nonsmooth unconstrained optimization.

(ii) Motivated by the HZ formula, we also present a modified HZ CG formula. The modified HZ formula not only possesses the sufficient descent property of the HZ formula but also belongs to a trust region and has a non-negative scale parameter.

(iii) We report the results of applying three methods to large-scale nonsmooth convex minimization problems. Global convergence is established, and numerical experiments verify that the proposed methods can be successfully used to solve large-scale nonsmooth problems.

(iv) Although the HZ and MHZ methods offer several key achievements for large-scale nonsmooth optimization, we believe that there are at least five issues that could be addressed to gain further improvements. The first is the scale c in the modified HZ CG algorithm, which could be tuned. The second is the application of other CG methods to this type of optimization problem; perhaps there exists a CG method that is even better suited to this purpose. Regarding the third issue, it is well known that limited-memory quasi-Newton methods are effective for certain classes of large-scale optimization problems because they require minimal storage; this inspires us to combine limited-memory quasi-Newton methods with the HZ CG technique to solve large-scale nonsmooth optimization problems. The fourth issue is that, in the future, we will also use the HZ CG method to investigate large-scale nonsmooth optimization with constraints. The last issue is the most important one, namely, that attention should be paid to other optimality and convergence conditions for nonsmooth problems. All of these issues will be addressed in our future work.

Acknowledgments

The authors would like to thank the referees for their valuable comments, which greatly improved this paper. This work was supported by the Guangxi Science Fund for Distinguished Young Scholars (Grant No. 2015GXNSFGA139001), the National Natural Science Foundation of China (Grant Nos. 11261006, 11661009, 61232016 and U1405254), and the Priority Academic Program Development (PAPD) fund of Jiangsu Higher Education Institutions.

Author Contributions

  1. Formal analysis: GY.
  2. Funding acquisition: GY WL.
  3. Methodology: GY.
  4. Software: ZS WL.
  5. Writing – original draft: GY ZS.
  6. Writing – review & editing: ZS.

References

  1. Fu Z, Ren K, Shu J, et al. Enabling personalized search over encrypted outsourced data with efficiency improvement. 2015.
  2. Fu Z, Wu X, Guan C, et al. Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Transactions on Information Forensics and Security, 2016, 11(12): 2706–2716.
  3. Gu B, Sheng VS. A robust regularization path algorithm for ν-support vector classification. 2016.
  4. Gu B, Sheng VS, Tay KY, et al. Incremental support vector learning for ordinal regression. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(7): 1403–1416. pmid:25134094
  5. Gu B, Sun X, Sheng VS. Structural minimax probability machine. 2016.
  6. Li J, Li X, Yang B, et al. Segmentation-based image copy-move forgery detection scheme. IEEE Transactions on Information Forensics and Security, 2015, 10(3): 507–518.
  7. Pan Z, Zhang Y, Kwong S. Efficient motion and disparity estimation optimization for low complexity multiview video coding. IEEE Transactions on Broadcasting, 2015, 61(2): 166–176.
  8. Pan Z, Lei J, Zhang Y, et al. Fast motion estimation based on content property for low-complexity H.265/HEVC encoder. IEEE Transactions on Broadcasting, 2016, 62(3): 675–684.
  9. Xia Z, Wang X, Sun X, et al. A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 340–352.
  10. Xia Z, Wang X, Sun X, et al. Steganalysis of LSB matching using differences between nonadjacent pixels. Multimedia Tools and Applications, 2016, 75(4): 1947–1962.
  11. Xia Z, Wang X, Zhang L, et al. A privacy-preserving and copy-deterrence content-based image retrieval scheme in cloud computing. IEEE Transactions on Information Forensics and Security, 2016, 11(11): 2594–2608.
  12. Yuan C, Sun X, Lv R. Fingerprint liveness detection based on multi-scale LPQ and PCA. China Communications, 2016, 13(7): 60–65.
  13. Zhou Z, Wang Y, Wu QM, et al. Effective and efficient global context verification for image copy detection. IEEE Transactions on Information Forensics and Security, 2016.
  14. Yuan G. Modified nonlinear conjugate gradient methods with sufficient descent property for large-scale optimization problems. Optimization Letters, 2009, 3(1): 11–21.
  15. Yuan G, Lu X. A modified PRP conjugate gradient method. Annals of Operations Research, 2009, 166(1): 73–90.
  16. Yuan G, Meng Z, Li Y. A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. Journal of Optimization Theory and Applications, 2016, 168(1): 129–152.
  17. Yuan G, Zhang M. A three-terms Polak-Ribière-Polyak conjugate gradient algorithm for large-scale nonlinear equations. Journal of Computational and Applied Mathematics, 2015, 286: 186–195.
  18. Yuan G, Wei Z, Li G. A modified Polak-Ribière-Polyak conjugate gradient algorithm for nonsmooth convex programs. Journal of Computational and Applied Mathematics, 2014, 255: 86–96.
  19. Yuan G, Lu X, Wei Z. A conjugate gradient method with descent direction for unconstrained optimization. Journal of Computational and Applied Mathematics, 2009, 233(2): 519–530.
  20. Yuan G, Lu X, Wei Z. BFGS trust-region method for symmetric nonlinear equations. Journal of Computational and Applied Mathematics, 2009, 230(1): 44–58.
  21. Yuan G, Wei Z. New line search methods for unconstrained optimization. Journal of the Korean Statistical Society, 2009, 38(1): 29–39.
  22. Yuan GL, Wei ZX. The superlinear convergence analysis of a nonmonotone BFGS algorithm on convex objective functions. Acta Mathematica Sinica, 2008, 24(1): 35–42.
  23. Yuan G, Wei Z. Convergence analysis of a modified BFGS method on convex minimizations. Computational Optimization and Applications, 2010, 47(2): 237–255.
  24. Yuan G, Duan X, Liu W, et al. Two new PRP conjugate gradient algorithms for minimization optimization models. PLoS ONE, 2015, 10(10): e0140071. pmid:26502409
  25. Lemaréchal C. Nondifferentiable optimization. In: Nemhauser GL, Rinnooy Kan AHG, Todd MJ, editors. Optimization. New York: Elsevier North-Holland; 1989: 529–572.
  26. Haarala M, Mäkelä MM. Limited memory bundle algorithm for large bound constrained nonsmooth minimization problems. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing, No. B. 1/2006, University of Jyväskylä, Finland, 2006.
  27. Haarala M, Miettinen K, Mäkelä MM. New limited memory bundle method for large-scale nonsmooth optimization. Optimization Methods and Software, 2004, 19(6): 673–692.
  28. Karmitsa N, Bagirov A, Mäkelä MM. Comparing different nonsmooth minimization methods and software. Optimization Methods and Software, 2012, 27(1): 131–153.
  29. Hager WW, Zhang H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization, 2005, 16(1): 170–192.
  30. Hager WW, Zhang H. Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Transactions on Mathematical Software (TOMS), 2006, 32(1): 113–137.
  31. Mäkelä MM. Multiobjective proximal bundle method for nonconvex nonsmooth optimization: Fortran subroutine MPBNGC 2.0. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing, No. B. 13/2003, University of Jyväskylä, Finland, 2003.
  32. Mäkelä MM, Neittaanmäki P. Nonsmooth optimization: analysis and algorithms with applications to optimal control. Singapore: World Scientific, 1992.
  33. Bonnans JF, Gilbert JC, Lemaréchal C, et al. A family of variable metric proximal methods. Mathematical Programming, 1995, 68(1–3): 15–47.
  34. Correa R, Lemaréchal C. Convergence of some algorithms for convex minimization. Mathematical Programming, 1993, 62(1–3): 261–275.
  35. Hiriart-Urruty JB, Lemaréchal C. Convex analysis and minimization algorithms II. Berlin, Heidelberg: Springer-Verlag, 1993. https://doi.org/10.1007/978-3-662-06409-2
  36. Calamai PH, Moré JJ. Projected gradient methods for linearly constrained problems. Mathematical Programming, 1987, 39(1): 93–116.
  37. Qi L. Convergence analysis of some algorithms for solving nonsmooth equations. Mathematics of Operations Research, 1993, 18(1): 227–244.
  38. Shi ZJ. Convergence of line search methods for unconstrained optimization. Applied Mathematics and Computation, 2004, 157(2): 393–405.
  39. Fukushima M. A descent algorithm for nonsmooth convex optimization. Mathematical Programming, 1984, 30(2): 163–175.
  40. Dolan ED, Moré JJ. Benchmarking optimization software with performance profiles. Mathematical Programming, 2002, 91(2): 201–213.