1 Introduction

A typical system of nonlinear equations has the general form

$$\begin{aligned} F\left( x \right) =0, \end{aligned}$$
(1)

where \(F:R^{n}\rightarrow R^{n}\) is a nonlinear mapping assumed to be continuously differentiable on \(R^{n}\). Systems of nonlinear equations play an important role in science and engineering; therefore, solving (1) has become a subject of interest to researchers in these areas. Numerous algorithms have been developed for solving such systems. Notable among them are the Newton and quasi-Newton schemes [14, 22, 34, 52], which converge rapidly from a sufficiently good starting point. However, the requirement to compute and store the Jacobian matrix, or an approximation of it, at each iteration makes these two methods unattractive for large-scale nonlinear systems [51].

The conjugate gradient (CG) method is well suited to large-scale systems and forms an important class of algorithms for solving large-scale unconstrained optimization problems. It is popular with mathematicians and engineers engaged in large-scale problems because of its low memory requirement and strong global convergence properties [19]. Generally, the nonlinear conjugate gradient method is used to solve large-scale problems of the form

$$\begin{aligned} \hbox {min}f\left( x \right) ,\quad x\in R^{n}, \end{aligned}$$
(2)

where \(f:R^{n}\rightarrow R\) is a continuously differentiable function that is bounded from below and whose gradient is available. The method generates a sequence of iterates \(x_k\) from an initial point \(x_0 \in R^{n}\) using the iterative formula

$$\begin{aligned} x_{k+1} =x_k +s_k ,\quad s_k =\alpha _k d_k ,\quad k=0,1,\ldots , \end{aligned}$$
(3)

where \(x_k\) is the current iterate, \(\alpha _k >0\) is a step length computed using a suitable line search technique, and \(d_k\) is the CG search direction defined by

$$\begin{aligned} d_k =\left\{ \begin{array}{ll} -F_k ,&{}\quad \text {if }\quad k=0, \\ -F_k +\beta _k d_{k-1} ,&{}\quad \text {if }\quad k\ge 1, \\ \end{array}\right. \end{aligned}$$
(4)

where \(\beta _k\) is a scalar known as the CG update parameter, and \(F_k =\nabla f\left( {x_k } \right) \). A crucial element of any CG algorithm is the definition of the update parameter \(\beta _k\) [4], which is why different CG algorithms, corresponding to different choices of \(\beta _k\) in (4), have been proposed (see [8, 10,11,12,13,14, 17, 33, 50, 51, 53, 65]).
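In code, one CG iteration of (3)–(4) has the following generic shape (a Python/NumPy sketch; the helper names `step_length` and `beta_fn` are ours, and concrete choices of \(\beta _k\) are discussed below):

```python
import numpy as np

def cg_step(grad, x, d_prev, g_prev, step_length, beta_fn):
    """One iteration of Eqs. (3)-(4) for min f(x); grad returns F_k = grad f(x_k)."""
    g = grad(x)
    if d_prev is None:                  # k = 0 in Eq. (4)
        d = -g
    else:
        # e.g. PRP: beta_fn = lambda g, gp, dp: g @ (g - gp) / (gp @ gp)
        d = -g + beta_fn(g, g_prev, d_prev) * d_prev
    alpha = step_length(x, d)           # any suitable line search
    return x + alpha * d, d, g          # Eq. (3)
```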

Moreover, some CG methods for unconstrained optimization are not globally convergent, so researchers have sought to develop CG methods that are not only globally convergent but also numerically efficient. Many of these new methods are based on secant equations. For nonlinear conjugate gradient methods, the conjugacy condition is given by

$$\begin{aligned} d_k^T y_{k-1} =0. \end{aligned}$$
(5)

Perry [44] extended (5) by exploiting the following secant condition of quasi-Newton schemes:

$$\begin{aligned} B_k s_{k-1} =y_{k-1} , \end{aligned}$$
(6)

and quasi-Newton search direction \(d_k\) given by

$$\begin{aligned} B_k d_k =-F_k , \end{aligned}$$
(7)

where \(B_k\) is a square matrix approximating the Hessian \(\nabla ^{2}f\left( x \right) \). Using (6) and (7), Perry extended (5) as

$$\begin{aligned} d_k^T y_{k-1} =-F_k^T s_{k-1} , \end{aligned}$$
(8)

and using (4), the Perry search direction is given as

$$\begin{aligned} d_k =\left\{ \begin{array}{ll} -F_k ,&{} \quad \text {if }\quad k=0, \\ -P_k F_k =-F_k +\beta _k^P d_{k-1} ,&{} \quad \text {if }\quad k\ge 1, \\ \end{array}\right. \end{aligned}$$
(9)

where

$$\begin{aligned} \beta _k^P =\frac{(y_{k-1} -s_{k-1} )^{T}F_k }{s_{k-1}^T y_{k-1} }, \end{aligned}$$
(10)

and

$$\begin{aligned} P_k =I-\frac{s_{k-1} (y_{k-1} -s_{k-1} )^{T}}{s_{k-1}^T y_{k-1} }. \end{aligned}$$
(11)

Following Perry’s approach, Dai and Liao [18] incorporated a nonnegative parameter t to propose the following extension of (8):

$$\begin{aligned} d_k^T y_{k-1} =-tF_k^T s_{k-1} . \end{aligned}$$
(12)

It is noted that for \(t=0\), (12) reduces to (5), and if \(t=1\), we obtain Perry’s condition (8). Consequently, by substituting (4) into (12), Dai and Liao [18] proposed the following CG update parameter:

$$\begin{aligned} \beta _k^{DL} =\frac{(y_{k-1} -ts_{k-1} )^{T}F_k }{d_{k-1}^T y_{k-1} },\quad t\ge 0. \end{aligned}$$
(13)

Numerical results have shown that the DL method is effective; however, it depends strongly on the nonnegative parameter t, for which there is no optimal value [4], and it may not necessarily generate descent directions [8]. That is, the method may not satisfy the descent condition

$$\begin{aligned} F_k^T d_k <0,\quad \forall k, \end{aligned}$$
(14)

or the sufficient descent condition, namely there exists a constant \(\lambda >0\) such that

$$\begin{aligned} F_k^T d_k \le -\lambda \parallel F_k \parallel ^{2},\quad \forall k. \end{aligned}$$
(15)

Based on the DL conjugacy condition (12), conjugate gradient methods using modified secant equations have been proposed over the years. For example, Babaie-Kafaki et al. [13] and Yabe and Takano [55] proposed CG methods by applying a revised form of the modified secant equations proposed by Zhang and Xu [63] and Zhang et al. [64], and the modified secant equation proposed by Li and Fukushima [36]. Li et al. [37] applied the modified secant equation proposed by Wei et al. [54], while Ford et al. [26] employed the multi-step quasi-Newton conditions proposed by Ford and Moghrabi [27, 28]. CG methods based on modified secant equations have also been studied by Narushima and Yabe [57] and Arazm et al. [7]. These methods have been found to be numerically efficient and globally convergent under suitable conditions; however, like the DL method, they fail to ensure sufficient descent.

Recently, by employing Perry's idea [44], efficient CG methods with descent directions have been proposed. Liu and Shang [39] proposed a Perry conjugate gradient method, which provides prototypes for developing other special forms of the Perry method, such as the HS method and the DL method [18]. Liu and Xu [40] presented a new Perry CG method with sufficient descent properties, independent of any line search. Also, based on the self-scaling memoryless BFGS update, Andrei [6] proposed an accelerated adaptive class of Perry conjugate gradient algorithms, whose search direction is determined by symmetrization of the scaled Perry CG direction [44].

CG methods for systems of nonlinear equations are relatively rare, as most existing methods target unconstrained optimization. Over the years, however, the method has been extended to large-scale nonlinear systems of equations. Combining the Polak–Ribière–Polyak (PRP) conjugate gradient method for unconstrained optimization [45, 47] with the hyperplane projection method of Solodov and Svaiter [48], Cheng [16] proposed a PRP-type method for systems of monotone equations. Yu [58, 59] extended the PRP method [45] to solve large-scale nonlinear systems with monotone line search strategies, which are modifications of the Grippo–Lampariello–Lucidi [29] and Li–Fukushima [35] schemes. As further research on Perry's conjugate gradient method, Dai et al. [21] combined the modified Perry conjugate gradient method [41] and the hyperplane projection technique of Solodov and Svaiter [48] to propose a derivative-free method for solving large-scale nonlinear monotone equations. By combining the descent Dai–Liao CG method of Babaie-Kafaki and Ghanbari [54] and the projection method in [48], Abubakar and Kumam [2] proposed a descent Dai–Liao CG method for nonlinear equations; numerical results show the method to be efficient. Based on the projection strategy [48], Liu and Feng [38] proposed a derivative-free iterative method for large-scale nonlinear monotone equations, which can be used to solve large-scale non-smooth problems owing to its low storage requirement and derivative-free character. Abubakar and Kumam [1] proposed an improved three-term derivative-free method for solving large-scale nonlinear equations, based on a modified HS method combined with the projection technique of Solodov and Svaiter [48]. Abubakar et al. [3] proposed a descent Dai–Liao CG method for solving nonlinear monotone equations with convex constraints, which extends the method in [2]. Using a convex combination of two different positive spectral coefficients, Mohammed and Abubakar [42] proposed a positive spectral gradient-like method combined with the projection method for solving nonlinear monotone equations. Awwal et al. [43] proposed a hybrid spectral gradient algorithm for systems of nonlinear monotone equations with convex constraints; the scheme combines a convex combination of two different positive spectral parameters with the projection technique.

Here, based on the work of Babaie-Kafaki and Ghanbari [9] and the Dai–Liao (DL) approach [18], we propose a Dai–Liao conjugate gradient method for systems of nonlinear equations by incorporating an extended secant equation into the classical DL update.

Throughout this work, we use \(\parallel \cdot \parallel \) to denote the Euclidean norm of vectors, \(y_{k-1} =F_k -F_{k-1}\), \(s_{k-1} =x_k -x_{k-1}\) and \(F_k =F\left( {x_k } \right) \). We also assume that \(F\) in (1) is Lipschitz continuous and that \(f\) in (2) is specified by

$$\begin{aligned} f\left( x \right) :=\frac{1}{2}\parallel F\left( x \right) \parallel ^{2}. \end{aligned}$$
(16)

The paper is organized as follows: in Sect. 2, we present details of the method. Convergence analysis is presented in Sect. 3. Numerical results of the method are presented in Sect. 4. Finally, conclusions are made in Sect. 5.

2 Proposed method and its algorithm

Following the Dai–Liao approach, Babaie-Kafaki and Ghanbari [9] proposed the following extension of the PRP update parameter

$$\begin{aligned} \beta _k^{\mathrm{EPRP}} =\beta _k^{\mathrm{PRP}} -t\frac{F_k^T d_{k-1} }{\parallel F_{k-1} \parallel ^{2}}, \end{aligned}$$
(17)

where \(\beta _k^{PRP}\) is the classical PRP parameter and \(t\) is a nonnegative parameter whose values were determined through an eigenvalue analysis. Motivated by this, and employing a similar approach, we propose a modification of the classical DL update parameter. In what follows, we first suggest an extension of some previously proposed modified secant equations.

By extending (6), Zhang et al. [64] proposed the following modified secant equation:

$$\begin{aligned} B_k s_{k-1} =\hat{y}_{k-1} ,\quad \hat{y}_{k-1} =y_{k-1} +\left( {\frac{\theta _{k-1} }{s_{k-1}^T \mu _{k-1} }} \right) \mu _{k-1} , \end{aligned}$$
(18)

where

$$\begin{aligned} \theta _{k-1} =6\left( {f_{k-1} -f_k } \right) +3s_{k-1}^T \left( {F_{k-1} +F_k } \right) , \end{aligned}$$
(19)

where \(\mu _{k-1} \in R^{n}\) is a vector parameter such that \(s_{k-1}^T \mu _{k-1} \ne 0\) (see [64]).

Similarly, Wei et al. [54] gave the following modified secant equation

$$\begin{aligned} B_k s_{k-1} =\bar{y}_{k-1} ,\quad \bar{y}_{k-1} =y_{k-1} +\left( {\frac{\vartheta _{k-1} }{s_{k-1}^T \mu _{k-1} }} \right) \mu _{k-1} , \end{aligned}$$
(20)

with

$$\begin{aligned} \vartheta _{k-1} =2\left( {f_{k-1} -f_k } \right) +s_{k-1}^T \left( {F_{k-1} +F_k } \right) , \end{aligned}$$
(21)

where \(\mu _{k-1} \in R^{n}\) is a vector parameter such that \(s_{k-1}^T \mu _{k-1} \ne 0\) (see [60]). In both (18) and (20), the vector parameter may be chosen as \(\mu _{k-1} =s_{k-1}\) [55].

Here, we propose the following secant equation as an extension of (6), (18), and (20):

$$\begin{aligned} B_k s_{k-1} =u_{k-1} =y_{k-1} +2\phi \frac{\vartheta _{k-1} }{s_{k-1}^T \mu _{k-1} }\mu _{k-1} , \end{aligned}$$
(22)

where \(\phi \) is a nonnegative parameter, \(\vartheta _{k-1}\) is defined by (21) and \(s_{k-1}^T \mu _{k-1} \ne 0\). We observe that for \(\phi =0\), (22) becomes the standard secant equation (6); for \(\phi =\frac{3}{2}\), since \(3\vartheta _{k-1} =\theta _{k-1}\), (22) reduces to the modified secant equation (18) of Zhang et al. [64]; and for \(\phi =\frac{1}{2}\), (22) reduces to the modified secant equation (20) of Wei et al. [54]. Substituting \(u_{k-1}\) in (22) for \(y_{k-1}\) in (13), we obtain the following version of the DL update parameter:

$$\begin{aligned} \bar{\beta }_k^{\mathrm{ADL}} =\frac{(u_{k-1} -ts_{k-1} )^{T}F_k }{d_{k-1}^T u_{k-1} },\quad t\ge 0. \end{aligned}$$
(23)

Observe that, in general, the denominator \(d_{k-1}^T u_{k-1}\) may vanish, since \(\vartheta _{k-1}\) as defined in (21) may be negative. Therefore, we redefine \(u_{k-1}\) and obtain its revised form as

$$\begin{aligned} z_{k-1} =y_{k-1} +2\phi \frac{\hbox {max}\left\{ {\vartheta _{k-1} ,0} \right\} }{s_{k-1}^T \mu _{k-1} }\mu _{k-1} . \end{aligned}$$
(24)

Consequently, we get the revised form of (23) as

$$\begin{aligned} \hat{\beta }_k^{\mathrm{ADL}} =\frac{z_{k-1}^T F_k }{d_{k-1}^T z_{k-1} }-t\frac{s_{k-1}^T F_k }{d_{k-1}^T z_{k-1} }. \end{aligned}$$
(25)
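To make the construction concrete, the following sketch (in Python/NumPy; the function name and interface are ours, with \(\mu _{k-1} =s_{k-1}\) as in Sect. 4) computes \(\vartheta _{k-1}\) from (21) and the revised vector \(z_{k-1}\) from (24):

```python
import numpy as np

def z_vector(s, y, f_prev, f_curr, F_prev, F_curr, phi=0.5):
    """Revised difference vector z_{k-1} of Eq. (24), with mu_{k-1} = s_{k-1}.

    s, y           : s_{k-1} = x_k - x_{k-1} and y_{k-1} = F_k - F_{k-1}
    f_prev, f_curr : merit-function values f_{k-1}, f_k from Eq. (16)
    F_prev, F_curr : residual vectors F_{k-1}, F_k
    phi            : nonnegative parameter of Eq. (22)
    """
    vartheta = 2.0 * (f_prev - f_curr) + s @ (F_prev + F_curr)  # Eq. (21)
    # the max{vartheta, 0} of Eq. (24) keeps s^T z >= s^T y
    return y + 2.0 * phi * max(vartheta, 0.0) / (s @ s) * s
```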

Andrei [4] noted that the parameter \(t\) has no optimal choice; thus, to obtain descent directions for our proposed method, we proceed to derive appropriate values for \(t\). From (4), and after some algebra, our search direction becomes:

$$\begin{aligned} d_k =-F_k +\left( {\frac{s_{k-1} z_{k-1}^T -ts_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} }} \right) F_k . \end{aligned}$$
(26)

Following Perry’s approach [44], search direction of our proposed method can be written as

$$\begin{aligned} d_k =-H_k F_k ,\quad k\ge 1, \end{aligned}$$
(27)

where \(H_k \), called the search direction matrix, is given by

$$\begin{aligned} H_k =I-\frac{s_{k-1} z_{k-1}^T }{s_{k-1}^T z_{k-1} }+t\frac{s_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} }, \end{aligned}$$
(28)

and \(z_{k-1}\) is as defined by (24). From (27), we can write

$$\begin{aligned} d_k^T F_k =-F_k^T H_k F_k =-F_k^T \frac{H_k^T +H_k }{2}F_k , \end{aligned}$$
(29)

where

$$\begin{aligned} \bar{H}_k= & {} \frac{H_k^T +H_k }{2} \nonumber \\= & {} I-\frac{1}{2}\frac{s_{k-1} z_{k-1}^T +z_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} }+t\frac{s_{k-1} s_{k-1}^T }{s_{k-1}^T z_{k-1} } \end{aligned}$$
(30)

Proposition 2.1

The matrix \(\bar{H}_k\) defined by (30) is a symmetric matrix.

Proof

By direct computation, we see that \(\bar{H}_k =\bar{H}_k^T \). Hence, \(\bar{H}_k\) is symmetric. \(\square \)

To analyze the descent property of our method, we need to determine the eigenvalues of \(\bar{H}_k\) and their structure.

Theorem 2.2

Let the matrix \(\bar{H}_k\) be defined by (30). Then the eigenvalues of \(\bar{H}_k\) consist of 1, with multiplicity \(n-2\), together with \(\lambda _k^+\) and \(\lambda _k^-\), where

$$\begin{aligned} \lambda _k^+= & {} \frac{1}{2}\left[ {\left( {1+a_k } \right) +\sqrt{(a_k -1)^{2}+b_k -1}} \right] \end{aligned}$$
(31)
$$\begin{aligned} \lambda _k^-= & {} \frac{1}{2}\left[ {\left( {1+a_k } \right) -\sqrt{(a_k -1)^{2}+b_k -1}} \right] \end{aligned}$$
(32)

and \(a_k =t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1}}\), \(\quad b_k =\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}\).

Furthermore, all eigenvalues of \(\bar{H}_k\) are positive real numbers.

Proof

Since \(d_{k-1}^T z_{k-1} \ne 0\), we have \(s_{k-1}^T z_{k-1} \ne 0\), and hence \(s_{k-1}\) and \(z_{k-1}\) are nonzero vectors. Let V be the vector space spanned by \(\left\{ {s_{k-1} ,z_{k-1} } \right\} \). Then \(dim\left( V \right) \le 2\) and \(dim\left( {V^{\bot }} \right) \ge n-2\), where \(V^{\bot }\) is the orthogonal complement of V. Therefore, there exists a set of mutually orthogonal vectors \(\{\tau _{k-1}^i \}_{i=1}^{n-2} \subset V^{\bot }\) satisfying

$$\begin{aligned} s_{k-1}^T \tau _{k-1}^i =z_{k-1}^T \tau _{k-1}^i =0. \end{aligned}$$
(33)

Multiplying (30) on the right by \(\tau _{k-1}^i\) and using (33), we obtain

$$\begin{aligned} \bar{H}_k \tau _{k-1}^i =\tau _{k-1}^i ,\quad i=1,\ldots ,n-2, \end{aligned}$$
(34)

which can be viewed as an eigenvector equation. So \(\tau _{k-1}^i \), \(i=1,\ldots ,n-2\), are eigenvectors of \(\bar{H}_k\), each with eigenvalue 1. Let \(\lambda _k^+\) and \(\lambda _k^-\) denote the remaining two eigenvalues. Observe that (30) can be written as

$$\begin{aligned} \bar{H}_k =I-\frac{s_{k-1} (z_{k-1} -2ts_{k-1} )^{T}}{2s_{k-1}^T z_{k-1} }-\frac{z_{k-1} s_{k-1}^T }{2s_{k-1}^T z_{k-1} }. \end{aligned}$$
(35)

Clearly, \(\bar{H}_k\) is a rank-two update of the identity matrix, so we may apply the determinant formula (see formula \(\left( {1.2.70} \right) \) of [49])

$$\begin{aligned} det( {I+u_1 u_2^T +u_3 u_4^T } )= ( {1+u_1^T u_2 } )( {1+u_3^T u_4 } )- ( {u_1^T u_4 } )( {u_2^T u_3 } ), \end{aligned}$$
(36)

where

$$\begin{aligned} u_1 =-\frac{s_{k-1} }{2s_{k-1}^T z_{k-1} },\quad u_2 =z_{k-1} -2ts_{k-1} ,\quad u_3 =-\frac{z_{k-1} }{2s_{k-1}^T z_{k-1} },\quad u_4 =s_{k-1} , \end{aligned}$$

we obtain

$$\begin{aligned} det\left( {\bar{H}_k } \right) =\frac{1}{4}+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\frac{1}{4}\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}. \end{aligned}$$
(37)

Since the sum of the eigenvalues of a square matrix equals its trace, from (30) we have

$$\begin{aligned} \text {trace}\left( {\bar{H}_k } \right)= & {} n-1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} } \nonumber \\= & {} \mathop {\underbrace{1+\cdots +1}}\limits _{\left( {{n}-2} \right) \mathrm{times}} +\lambda _k^+ +\lambda _k^- , \end{aligned}$$
(38)

for which we obtain

$$\begin{aligned} \lambda _k^+ +\lambda _k^- =1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }. \end{aligned}$$
(39)

Using the relationship between trace and determinant of a matrix and its eigenvalues, we can obtain \(\lambda _k^+\) and \(\lambda _k^-\) as roots of the following quadratic polynomial:

$$\begin{aligned} \lambda ^{2}-\left( {1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }} \right) \lambda +\frac{1}{4}+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\frac{1}{4}\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}=0. \end{aligned}$$
(40)

So the remaining two eigenvalues are obtained from (40). Applying the quadratic formula and rearranging, we obtain

$$\begin{aligned} \lambda _k^\pm =\frac{1}{2}\left[ {1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }\pm \sqrt{\left( {t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-1} \right) ^{2}+\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}-1}} \right] \end{aligned}$$
(41)

We can write (41) as

$$\begin{aligned} \lambda _k^\pm =\frac{1}{2}\left[ {\left( {1+a_k } \right) \pm \sqrt{(a_k -1)^{2}+b_k -1}} \right] , \end{aligned}$$
(42)

which proves (31) and (32).

For \(\lambda _k^+\) and \(\lambda _k^-\) to be real numbers, we must have \(\Delta =(a_k -1)^{2}+b_k -1\ge 0\).

From the Cauchy–Schwarz inequality, \(b_k =\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}\ge 1\), so \(\Delta \ge 0\). Consequently, both eigenvalues are real numbers, and \(\lambda _k^+ >0\) since \(1+a_k\) is positive. For \(\lambda _k^- >0\), the following must be satisfied:

$$\begin{aligned} \frac{1}{2}\left[ {1+t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\sqrt{\left( {t\frac{\parallel s_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-1} \right) ^{2}+\frac{\parallel s_{k-1} \parallel ^{2}\parallel z_{k-1} \parallel ^{2}}{(s_{k-1}^T z_{k-1} )^{2}}-1}} \right] >0. \end{aligned}$$
(43)

After some algebra, we obtain the following condition on the parameter t, which guarantees (43):

$$\begin{aligned} t>\frac{1}{4}\left( {\frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}}} \right) . \end{aligned}$$
(44)

So, \(\lambda _k^- >0\) if (44) is satisfied. In addition, for t satisfying (44), \(\bar{H}_k\) is nonsingular.

Therefore, all the eigenvalues of the symmetric matrix \(\bar{H}_k\) are positive real numbers, so \(\bar{H}_k\) is positive definite. Moreover, since \(\lambda _k^+ \ge \frac{1}{2}\left( {1+a_k } \right) \) and (44) yields \(a_k >\frac{1}{4}\left( {b_k -1} \right) \), we obtain the following estimates for \(\lambda _k^+\) and \(\lambda _k^- \):

$$\begin{aligned} \lambda _k^+ >\frac{3(s_{k-1}^T z_{k-1} )^{2}+\parallel z_{k-1} \parallel ^{2}\parallel s_{k-1} \parallel ^{2}}{8(s_{k-1}^T z_{k-1} )^{2}},\quad \lambda _k^- >0. \end{aligned}$$
(45)

This establishes (45). Hence, from (29), we have

$$\begin{aligned} d_k^T F_k =-F_k^T \bar{H}_k F_k \le -\lambda _k^- \parallel F_k \parallel ^{2}<0, \end{aligned}$$
(46)

which shows that the descent condition is satisfied. We therefore propose the following formula for the parameter t in the modified DL method:

$$\begin{aligned} t^{\mathrm{ADL}}=\xi \frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\gamma \frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}}, \end{aligned}$$
(47)

where \(\xi >\frac{1}{4}\) and \(\gamma <\frac{1}{4}\). \(\square \)
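The eigenvalue structure just established is easy to check numerically. The following sketch (illustrative only, on random data, with \(\xi =0.5\) and \(\gamma =-0.5\) as in Sect. 4) assembles \(\bar{H}_k\) from (30) with \(t\) chosen by (47) and compares its spectrum with the values predicted by (31) and (32):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
s, z = rng.standard_normal(n), rng.standard_normal(n)
if s @ z < 0:                    # enforce s^T z > 0, cf. Eq. (66)
    z = -z

xi, gamma = 0.5, -0.5
t = xi * (z @ z) / (s @ z) - gamma * (s @ z) / (s @ s)           # Eq. (47)

# symmetrized search direction matrix, Eq. (30)
H_bar = (np.eye(n)
         - 0.5 * (np.outer(s, z) + np.outer(z, s)) / (s @ z)
         + t * np.outer(s, s) / (s @ z))

a = t * (s @ s) / (s @ z)                                        # a_k
b = (s @ s) * (z @ z) / (s @ z) ** 2                             # b_k
disc = np.sqrt((a - 1.0) ** 2 + b - 1.0)
lam = [0.5 * (1 + a - disc), 0.5 * (1 + a + disc)]               # (32), (31)

predicted = np.sort(np.array([1.0] * (n - 2) + lam))
computed = np.linalg.eigvalsh(H_bar)                 # ascending order
print(np.allclose(predicted, computed), computed.min() > 0.0)    # True True
```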

Remark 2.3

Since the DL parameter \(t\) is nonnegative, we restrict the parameter \(\gamma \) in (47) to negative values so as to avoid a numerically unreasonable approximation [32]. Based on this, we can write the modified DL update parameter as

$$\begin{aligned} \beta _k^{\mathrm{ADL}} =\frac{F_k^T z_{k-1} }{d_{k-1}^T z_{k-1} }-t^{\mathrm{ADL}}\frac{F_k^T s_{k-1} }{d_{k-1}^T z_{k-1} }, \end{aligned}$$
(48)

with \(\xi \ge \frac{1}{4}\) and \(\gamma <0\) in (47), which guarantees the descent condition. We also write the search direction of the proposed method as

$$\begin{aligned} d_k^{\mathrm{ADL}} =-F_k +\left( {\frac{(z_{k-1} -t^{\mathrm{ADL}} s_{k-1} )^{T}F_k }{d_{k-1}^T z_{k-1} }} \right) d_{k-1}. \end{aligned}$$
(49)

We use the derivative-free line search proposed by Li and Fukushima [34] to compute our step length \(\alpha _k \).

Let \(\sigma _1 >0\), \(\sigma _2 >0\) and \(r\in \left( {0,1} \right) \) be constants and let \(\left\{ {\eta _k } \right\} \) be a given positive sequence such that

$$\begin{aligned} \mathop {\sum }\limits _{k=0}^\infty \eta _k<\eta <\infty , \end{aligned}$$
(50)

and

$$\begin{aligned} \parallel F(x_k +\alpha d_k )\parallel ^{2}-\parallel F_k \parallel ^{2}\le -\sigma _1 \parallel \alpha F_k \parallel ^{2}-\sigma _2 \parallel \alpha d_k \parallel ^{2}+\eta _k \parallel F_k \parallel ^{2}. \end{aligned}$$
(51)

Let \(i_k\) be the smallest non-negative integer \(i\) such that (51) holds for \(\alpha =r^{i}\), and set \(\alpha _k =r^{i_k }\).
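A direct transcription of this line search reads as follows (a sketch; the function name and the backtracking cap `max_back` are ours, and \(\eta _k =\frac{1}{(k+1)^{2}}\), the choice used in Sect. 4, clearly satisfies (50)):

```python
import numpy as np

def li_fukushima_step(F, x, d, k, sigma1=1e-4, sigma2=1e-4, r=0.2,
                      max_back=60):
    """Derivative-free line search (51): returns alpha_k = r^{i_k}.

    F : callable returning the residual vector F(x)
    d : current search direction d_k
    k : iteration counter, used in eta_k = 1/(k+1)^2
    """
    Fx = F(x)
    nFx2 = Fx @ Fx
    eta_k = 1.0 / (k + 1) ** 2
    alpha = 1.0                                   # trial step r^0
    for _ in range(max_back):
        Fnew = F(x + alpha * d)
        # acceptance test, Eq. (51)
        if (Fnew @ Fnew - nFx2 <= -sigma1 * alpha ** 2 * nFx2
                - sigma2 * alpha ** 2 * (d @ d) + eta_k * nFx2):
            return alpha
        alpha *= r                                # try the next power of r
    return alpha
```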

Now, we describe the algorithm of the proposed method as follows:

Algorithm 2.4

A Dai–Liao CG method (ADLCG)

Step 1 :

Given \(\varepsilon >0\), choose an initial point \(x_0 \in R^{n}\), a positive sequence \(\left\{ {\eta _k } \right\} \) satisfying (50), and constants \(r\in \left( {0,1} \right) ,\sigma _1 ,\sigma _2 >0\), \(\xi \ge \frac{1}{4}\), \(\gamma <0\). Compute \(d_0 =-F_0\) and set \(k=0\).

Step 2 :

Compute \(F\left( {x_k } \right) \). If \(\parallel F\left( {x_k } \right) \parallel \le \varepsilon \), stop. Otherwise, compute the search direction \(d_k\) by (49).

Step 3 :

Compute \(\alpha _k\) via the line search in (51).

Step 4 :

Set \(x_{k+1} =x_k +\alpha _k d_k \).

Step 5 :

Set \(k:=k+1\) and go to Step 2.
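For illustration, a compact reference implementation of Algorithm 2.4 might read as follows (a sketch under the parameter choices of Sect. 4, namely \(\xi =0.5\), \(\gamma =-0.5\), \(\phi =0.5\) and \(\mu _{k-1} =s_{k-1}\); it reuses the hypothetical helpers `z_vector` and `li_fukushima_step` sketched earlier):

```python
import numpy as np

def adlcg(F, x0, eps=1e-10, max_iter=2000, xi=0.5, gamma=-0.5, phi=0.5):
    """Sketch of Algorithm 2.4 (ADLCG) for solving F(x) = 0."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx                                       # Step 1: d_0 = -F_0
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= eps:             # Step 2: stopping test
            return x, k
        alpha = li_fukushima_step(F, x, d, k)     # Step 3: line search (51)
        x_new = x + alpha * d                     # Step 4
        F_new = F(x_new)
        s, y = x_new - x, F_new - Fx
        f_old, f_new = 0.5 * (Fx @ Fx), 0.5 * (F_new @ F_new)    # Eq. (16)
        z = z_vector(s, y, f_old, f_new, Fx, F_new, phi)         # Eq. (24)
        t = xi * (z @ z) / (s @ z) - gamma * (s @ z) / (s @ s)   # Eq. (47)
        beta = ((z - t * s) @ F_new) / (d @ z)                   # Eq. (48)
        d = -F_new + beta * d                                    # Eq. (49)
        x, Fx = x_new, F_new
    return x, max_iter

# usage example on Problem 4.8, where F_i(x) = exp(x_i) - 1:
# x_star, iters = adlcg(lambda x: np.exp(x) - 1.0, np.full(1000, 0.5))
```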

3 Convergence analysis

The following assumptions are required to analyze the convergence of the ADLCG algorithm.

Assumption 3.1

The level set

$$\begin{aligned} \Omega =\{x\mid \parallel F\left( x \right) \parallel \le \parallel F\left( {x_0 } \right) \parallel \} \end{aligned}$$
(52)

is bounded.

Assumption 3.2

  1. (1)

    The solution set of problem (1) is not empty.

  2. (2)

    F is continuously differentiable on an open convex set \(\Omega _1\) containing \(\Omega \).

  3. (3)

    F is Lipschitz continuous in some neighborhood N of \(\Omega \); namely, there exists a positive constant \(L>0\) such that

    $$\begin{aligned} \parallel F\left( x \right) -F\left( y \right) \parallel \le L\parallel x-y\parallel , \end{aligned}$$
    (53)

    for all \(x,y\in N\).

    Assumption 3.1 and condition (3) imply that there exists a positive constant \(\omega \) such that

    $$\begin{aligned} \parallel F\left( x \right) \parallel \le \omega , \end{aligned}$$
    (54)

    for all \(x\in \Omega \) (see Proposition 1.3 of [13]).

  4. (4)

    The Jacobian of F is bounded, symmetric and positive-definite on \(\Omega _1 \), which implies that there exist constants \(m_2 \ge m_1 >0\) such that

    $$\begin{aligned} \parallel F^{\prime }\left( x \right) \parallel \le m_2 ,\quad \forall x\in \Omega _1 , \end{aligned}$$
    (55)

    and

    $$\begin{aligned} m_1 \parallel d\parallel ^{2}\le d^{T}F^{\prime }\left( x \right) d,\quad \forall x\in \Omega _1 ,d\in R^{n}. \end{aligned}$$
    (56)

Lemma 3.3

Let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then \(d_k\) is a descent direction for \(f\) at \(x_k\); i.e.,

$$\begin{aligned} F(x_k )^{T}d_k <0. \end{aligned}$$
(57)

Proof

By (46), the lemma holds. We can further deduce that the norm function decreases along the direction \(d_k \); i.e., \(\parallel F\left( {x_{k+1} } \right) \parallel \le \parallel F\left( {x_k } \right) \parallel \) for all \(k\). \(\square \)

Lemma 3.4

Suppose Assumptions 3.1 and 3.2 hold. Let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then \(\left\{ {x_k } \right\} \subset \Omega \). Moreover, \(\left\{ {\parallel F_k \parallel } \right\} \) converges.

Proof

By Lemma 3.3, we have \(\parallel F\left( {x_{k+1} } \right) \parallel \le \parallel F\left( {x_k } \right) \parallel \). So, by Lemma 3.3 in [20], we conclude that \(\left\{ {\parallel F_k \parallel } \right\} \) converges. Moreover, for all k, we have

$$\begin{aligned} \parallel F\left( {x_{k+1} } \right) \parallel \le \parallel F\left( {x_k } \right) \parallel \le \parallel F\left( {x_{k-1} } \right) \parallel \cdots \le \parallel F\left( {x_0 } \right) \parallel . \end{aligned}$$
(58)

This implies that \(\left\{ {x_k } \right\} \subset \Omega \). \(\square \)

Lemma 3.5

Suppose Assumptions 3.1 and 3.2 hold. Let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel \alpha _k d_k \parallel =\mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel s_k \parallel =0, \end{aligned}$$
(59)

and

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel \alpha _k F\left( {x_k } \right) \parallel =0. \end{aligned}$$
(60)

Proof

From the line search (51) and for all \(k>0\), we obtain

$$\begin{aligned} \sigma _2 \parallel \alpha _k d_k \parallel ^{2}\le & {} \sigma _1 \parallel \alpha _k F_k \parallel ^{2}+\sigma _2 \parallel \alpha _k d_k \parallel ^{2} \nonumber \\\le & {} \parallel F_k \parallel ^{2}-\parallel F_{k+1} \parallel ^{2}+\eta _k \parallel F_k \parallel ^{2}. \end{aligned}$$
(61)

Summing this inequality for \(i=0,\ldots ,k\), we obtain

$$\begin{aligned} \sigma _2 \mathop {\sum }\limits _{i=0}^k \parallel \alpha _i d_i \parallel ^{2}\le & {} \mathop {\sum }\limits _{i=0}^k \left( {\parallel F\left( {x_i } \right) \parallel ^{2}-\parallel F\left( {x_{i+1} } \right) \parallel ^{2}} \right) +\mathop {\sum }\limits _{i=0}^k \eta _i \parallel F\left( {x_i } \right) \parallel ^{2} \nonumber \\= & {} \parallel F\left( {x_0 } \right) \parallel ^{2}-\parallel F\left( {x_{k+1} } \right) \parallel ^{2}+\mathop {\sum }\limits _{i=0}^k \eta _i \parallel F\left( {x_i } \right) \parallel ^{2}\nonumber \\\le & {} \parallel F\left( {x_0 } \right) \parallel ^{2}+\parallel F\left( {x_0 } \right) \parallel ^{2}\mathop {\sum }\limits _{i=0}^k \eta _i \nonumber \\\le & {} \parallel F\left( {x_0 } \right) \parallel ^{2}+\parallel F\left( {x_0 } \right) \parallel ^{2}\mathop {\sum }\limits _{i=0}^\infty \eta _i . \end{aligned}$$
(62)

Therefore, since \(\left\{ {\eta _i } \right\} \) satisfies (50), the series \(\mathop {\sum }\nolimits _{i=0}^\infty \parallel \alpha _i d_i \parallel ^{2}\) is convergent, which implies that (59) holds. Using the same argument as above, with \(\sigma _1 \parallel \alpha _k F\left( {x_k } \right) \parallel ^{2}\) on the left-hand side, we obtain (60). \(\square \)

Lemma 3.6

[62] Suppose Assumptions 3.1 and 3.2 hold and let \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4. Then there exists a constant \(m>0\) such that

$$\begin{aligned} y_k^T s_k \ge m\parallel s_k \parallel ^{2}>0,\quad \forall k\ge 1. \end{aligned}$$
(63)

Proof

By the mean-value theorem, we have

$$\begin{aligned} y_k^T s_k =s_k^T \left( {F\left( {x_{k+1} } \right) -F\left( {x_k } \right) } \right) =s_k^T {F}'\left( \varphi \right) s_k \ge m_1 \parallel s_k \parallel ^{2}, \end{aligned}$$
(64)

where \(\varphi =\lambda x_k +\left( {1-\lambda } \right) x_{k+1}\), for some \(\lambda \in \left( {0,1} \right) \). The last inequality follows from (56). Setting \(m=m_1 \), the proof is established. \(\square \)

Lemma 3.7

Suppose Assumptions 3.1 and 3.2 hold. Let the sequence \(\left\{ {x_k } \right\} \) be generated by Algorithm 2.4 with update parameter \(\beta _k^{\mathrm{ADL}} \). Then, there exists \(M>0\) such that

$$\begin{aligned} \parallel d_k^{\mathrm{ADL}} \parallel \le M,\quad \forall k. \end{aligned}$$
(65)

Proof

Using (24) and (63), we get

$$\begin{aligned} s_{k-1}^T z_{k-1} =s_{k-1}^T y_{k-1} +2\phi \frac{\hbox {max}\left\{ {\vartheta _{k-1} ,0} \right\} }{s_{k-1}^T \mu _{k-1} }s_{k-1}^T \mu _{k-1} \ge s_{k-1}^T y_{k-1} \ge m\parallel s_{k-1} \parallel ^{2}. \end{aligned}$$
(66)

Applying the mean-value theorem, we have

$$\begin{aligned} \left| {\vartheta _{k-1} } \right|= & {} |2\left( {f_{k-1} -f_k } \right) +(F_{k-1} +F_k )^{T}s_{k-1} | \nonumber \\= & {} |(-2\nabla f\left( \varphi \right) +\nabla f\left( {x_{k-1} } \right) +\nabla f\left( {x_k } \right) )^{T}s_{k-1} |, \end{aligned}$$
(67)

where \(\varphi =\lambda x_{k-1} +\left( {1-\lambda } \right) x_k \), for some \(\lambda \in \left( {0,1} \right) \).

Hence from (53), we have

$$\begin{aligned} \left| {\vartheta _{k-1} } \right|\le & {} \left( {\parallel \nabla f\left( {x_{k-1} } \right) -\nabla f\left( \varphi \right) \parallel +\parallel \nabla f\left( {x_k } \right) -\nabla f\left( \varphi \right) \parallel } \right) \parallel s_{k-1} \parallel \nonumber \\\le & {} \left( {L\left( {1-\lambda } \right) \parallel s_{k-1} \parallel +L\lambda \parallel s_{k-1} \parallel } \right) \parallel s_{k-1} \parallel \nonumber \\= & {} L\parallel s_{k-1} \parallel ^{2}. \end{aligned}$$
(68)

Utilizing (24), (53), (68), and setting \(\mu _{k-1} =s_{k-1} \), we obtain

$$\begin{aligned} \parallel z_{k-1} \parallel\le & {} \parallel y_{k-1} \parallel +2\phi \frac{\left| {\vartheta _{k-1} } \right| }{\left| {s_{k-1}^T s_{k-1} } \right| }\parallel s_{k-1} \parallel \nonumber \\\le & {} L\parallel s_{k-1} \parallel +2\phi L\frac{\parallel s_{k-1} \parallel ^{2}}{\parallel s_{k-1} \parallel ^{2}}\parallel s_{k-1} \parallel \nonumber \\= & {} \left( {L+2\phi L} \right) \parallel s_{k-1} \parallel . \end{aligned}$$
(69)

Using (47), (66), (69) and the Cauchy–Schwarz inequality (which, with (69), gives \(s_{k-1}^T z_{k-1} \le \left( {L+2\phi L} \right) \parallel s_{k-1} \parallel ^{2})\), we get

$$\begin{aligned} \left| {t^{\mathrm{ADL}}} \right|= & {} \left| {\xi \frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }-\gamma \frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}}} \right| \nonumber \\\le & {} \xi \frac{\parallel z_{k-1} \parallel ^{2}}{s_{k-1}^T z_{k-1} }+\left| \gamma \right| \frac{s_{k-1}^T z_{k-1} }{\parallel s_{k-1} \parallel ^{2}} \nonumber \\\le & {} \xi \frac{\left( \left( {L+2\phi L} \right) \parallel s_{k-1} \parallel \right) ^{2}}{m\parallel s_{k-1} \parallel ^{2}}+\left| \gamma \right| \frac{\left( {L+2\phi L} \right) \parallel s_{k-1} \parallel ^{2}}{\parallel s_{k-1} \parallel ^{2}} \nonumber \\= & {} \xi \frac{(L+2\phi L)^{2}}{m}+\left( {L+2\phi L} \right) \left| \gamma \right| . \end{aligned}$$
(70)

By utilizing (4), (48), (66), (69) and (70), we obtain

$$\begin{aligned} \Vert d_{k}^{\mathrm{ADL}}\Vert= & {} \Vert -F(x_{k})+\beta _{k}^{\mathrm{ADL}}d_{k-1}\Vert \nonumber \\\le & {} \Vert F(x_{k})\Vert +|\beta _{k}^{\mathrm{ADL}}|\Vert d_{k-1}\Vert \nonumber \\\le & {} \Vert F(x_{k})\Vert +\frac{\Vert F(x_k)\Vert \Vert z_{k-1}\Vert }{s_{k-1}^Tz_{k-1}}\Vert s_{k-1}\Vert +|t^{\mathrm{ADL}}|\frac{\Vert F(x_k)\Vert \Vert s_{k-1}\Vert }{s_{k-1}^Tz_{k-1}}\Vert s_{k-1}\Vert \nonumber \\\le & {} \Vert F(x_{k})\Vert +\frac{(L+2\phi L)\Vert F(x_k)\Vert }{m} +\left( \xi \frac{(L+2\phi L)^2}{m} +(L+2\phi L)|\gamma |\right) \frac{\Vert F(x_k)\Vert }{m} \nonumber \\= & {} \left( 1+\frac{L+2\phi L}{m}+\xi \frac{(L+2\phi L)^2}{m^2}+\frac{(L+2\phi L)|\gamma |}{m} \right) \Vert F(x_k)\Vert \nonumber \\= & {} \frac{c_1\Vert F(x_k)\Vert }{m^2}, \end{aligned}$$
(71)

where \(c_1 =m^{2}+m\left( {L+2\phi L} \right) +\xi (L+2\phi L)^{2}+m\left( {L+2\phi L} \right) \left| \gamma \right| \).

By (54), \(\parallel F\left( {x_k } \right) \parallel \le \omega \). Setting \(M:=\frac{c_1 \omega }{m^{2}}\), we obtain the required result. \(\square \)

Next, we prove the global convergence of the ADLCG method.

Theorem 3.8

Suppose Assumptions 3.1 and 3.2 hold and that the sequence \(\left\{ {x_k } \right\} \) is generated by Algorithm 2.4. Assume also that for all \(k>0\)

$$\begin{aligned} \alpha _k \ge c\frac{|F(x_k )^{T}d_k |}{\parallel d_k \parallel ^{2}}, \end{aligned}$$
(72)

where c is some positive constant. Then \(\left\{ {x_k } \right\} \) converges globally to a solution of problem (1); i.e.,

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel F\left( {x_k } \right) \parallel =0. \end{aligned}$$
(73)

Proof

By (59) and the boundedness of \(\left\{ {\parallel d_k \parallel } \right\} \), we have

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \alpha _k \parallel d_k \parallel ^{2}=0. \end{aligned}$$
(74)

From (72) and (74), we have

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } |F(x_k )^{T}d_k |=0. \end{aligned}$$
(75)

On the other hand, from (46), we have

$$\begin{aligned} F(x_k )^{T}d_k\le & {} -\lambda _k^- \parallel F\left( {x_k } \right) \parallel ^{2},\nonumber \\ \parallel F\left( {x_k } \right) \parallel ^{2}\le & {} -\frac{1}{\lambda _k^- }F(x_k )^{T}d_k \nonumber \\\le & {} |F(x_k )^{T}d_k |\left| {\frac{1}{\lambda _k^- }} \right| . \end{aligned}$$
(76)

But from (45), we have

$$\begin{aligned} \lambda _k^+>\lambda _k^- >0,\quad \forall k. \end{aligned}$$
(77)

Thus, from (76) and applying the sandwich theorem, we obtain

$$\begin{aligned} 0\le \parallel F\left( {x_k } \right) \parallel ^{2}\le |F(x_k )^{T}d_k |\left( {\frac{1}{\lambda _k^- }} \right) \rightarrow 0. \end{aligned}$$
(78)

Therefore,

$$\begin{aligned} \mathop {\hbox {lim}}\limits _{k\rightarrow \infty } \parallel F\left( {x_k } \right) \parallel =0. \end{aligned}$$
(79)

This completes the proof. \(\square \)

4 Numerical result

In this section, we test the efficiency and robustness of our proposed approach by comparing it with the following method from the literature:

a new derivative-free conjugate gradient method for solving large-scale nonlinear systems of equations (NDFCG) [24]. All codes were written in the MATLAB R2014a environment and run on a personal computer (2.20 GHz CPU, 8 GB RAM). The two algorithms were implemented with the same line search procedure, with parameters set to \(\sigma _1 =\sigma _2 =10^{-4}\), \(\alpha _0 =0.1\), \(r=0.2\) and \(\eta _k =\frac{1}{(k+1)^{2}}\). In addition, we set \(\xi =0.5\), \(\gamma =-0.5\) and \(\mu _{k-1} =s_{k-1}\) for the ADLCG method. The iterations were terminated when their number exceeded 2000 or when the inequality \(\parallel F_k \parallel \le 10^{-10}\) was satisfied (Table 1).

The two algorithms were tested using the following test problems with various sizes:

Problem 4.1

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =2x_i -sin\left| {x_i } \right| , \quad \quad i=1,\ldots ,n. \end{aligned}$$

Problem 4.2

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =\hbox {log}\left( {x_i +1} \right) -\frac{x_i }{n}, \quad \quad i=1,\ldots ,n. \end{aligned}$$

Problem 4.3

[67] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_1 \left( x \right)= & {} 2x_1 +\hbox {sin}\left( {x_1 } \right) -1, \\ F_i \left( x \right)= & {} -2x_{i-1} +2x_i +\hbox {sin}\left( {x_i } \right) -1, \quad \quad i=2,\ldots ,n-1, \\ F_n \left( x \right)= & {} 2x_n +\hbox {sin}\left( {x_n } \right) -1. \end{aligned}$$

Problem 4.4

[56] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =x_i -\frac{1}{n}x_i^2 +\frac{1}{n}\mathop {\sum }\limits _{j=1}^n x_j +i, \quad \quad i=1,2,\ldots ,n. \end{aligned}$$

Problem 4.5

[38] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =2x_i -\hbox {sin}\left( {x_i } \right) ,\quad i=1,2,\ldots ,n. \end{aligned}$$

Problem 4.6

[51] The function \(F\left( x \right) \) is given by

$$\begin{aligned} F\left( x \right) =Ax+b_1 , \end{aligned}$$

where \(b_1 =(e^{x_1 }-1,\ldots ,e^{x_n }-1)^{T}\), and

$$\begin{aligned} A=\left( {{\begin{array}{ccccc} 2 &{}\quad {-1} &{} &{} &{} \\ {-1} &{}\quad 2 &{}\quad {-1} &{} &{} \\ &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{} \\ &{} &{}\quad \ddots &{}\quad \ddots &{}\quad {-1} \\ &{} &{} &{}\quad {-1} &{}\quad 2 \\ \end{array} }} \right) \end{aligned}$$

Problem 4.7

[61] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right)= & {} \sqrt{10^{-5}}\left( {x_i -1} \right) , \quad i=1,2,\ldots ,n-1, \\ F_n \left( x \right)= & {} \frac{1}{4n}\mathop {\sum }\limits _{j=1}^n x_j^2 -\frac{1}{4}. \end{aligned}$$

Problem 4.8

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_i \left( x \right) =e^{x_i }-1,\quad i=1,2,\ldots ,n. \end{aligned}$$

Problem 4.9

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_1 \left( x \right)= & {} x_1 \left( {x_1^2 +x_2^2 } \right) -1, \\ F_i \left( x \right)= & {} x_i \left( {x_{i-1}^2 +2x_i^2 +x_{i+1}^2 } \right) -1,\quad i=2,3,\ldots ,n-1, \\ F_n \left( x \right)= & {} x_n \left( {x_{n-1}^2 +x_n^2 } \right) . \end{aligned}$$

Problem 4.10

[2] The elements of the function \(F\left( x \right) \) are given by:

$$\begin{aligned} F_1 \left( x \right)= & {} x_1 -e^{\cos \left( {\frac{x_1 +x_2 }{n+1}} \right) },\\ F_i \left( x \right)= & {} x_i -e^{\cos \left( {\frac{x_{i-1} +x_i +x_{i+1} }{n+1}} \right) },\quad i=2,3,\ldots ,n-1, \\ F_n \left( x \right)= & {} x_n -e^{\cos \left( {\frac{x_{n-1} +x_n }{n+1}} \right) }. \end{aligned}$$
Table 1 Initial starting points used for the test problems
Table 2 Number of problems and percentage for which each method is a winner with respect to iterations and CPU time

Using the performance profiles of Dolan and Moré [23], we generate Figs. 1 and 2 to show the performance and efficiency of the two methods. To better illustrate the performance of the two methods, a summary of the results is presented in Table 2. The summarized data show the number of problems for which each method wins in terms of number of iterations and CPU time, respectively. The corresponding percentages of the number of problems solved are also indicated.
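For reference, the Dolan–Moré profile plots, for each solver \(s\), the fraction \(\rho _s (\tau )\) of problems on which the cost of \(s\) is within a factor \(\tau \) of the best cost recorded for that problem. A minimal sketch (hypothetical data layout: one row per problem, one column per solver, with `np.inf` marking failures) is:

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T, labels, tau_max=10.0):
    """T[p, s]: cost (iterations or CPU time) of solver s on problem p.

    Assumes each problem is solved by at least one solver.
    """
    ratios = T / T.min(axis=1, keepdims=True)     # performance ratios r_{p,s}
    taus = np.linspace(1.0, tau_max, 500)
    for s, lab in enumerate(labels):
        rho = [(ratios[:, s] <= tau).mean() for tau in taus]
        plt.step(taus, rho, where="post", label=lab)
    plt.xlabel(r"$\tau$")
    plt.ylabel(r"$\rho_s(\tau)$")
    plt.legend()
    plt.show()

# e.g. performance_profile(np.array(costs), ["ADLCG", "NDFCG"])
```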

In Figs. 1 and 2, we observe that the curve representing the ADLCG method lies above the curve representing the NDFCG method. This is a measure of the efficiency of the ADLCG method compared with the NDFCG scheme.

Fig. 1 Performance profile for number of iterations

Fig. 2 Performance profile for the CPU time

Similarly, the summary reported in Table 2 indicates that the ADLCG method wins with respect to both number of iterations and CPU time. The table shows that the ADLCG method solves 95% (76 out of 80) of the problems with fewer iterations than the NDFCG method, which wins on only 3.75% (3 out of 80). Both methods solve 1 problem (1.25%) with the same number of iterations, which is reported as undecided. The summary also indicates that the ADLCG method outperforms the NDFCG scheme in CPU time, winning on 72.5% (58 out of 80) of the problems compared with 27.5% (22 out of 80) for NDFCG. Therefore, it is clear from Figs. 1 and 2 and the summary in Table 2 that our method is more efficient than the NDFCG method and better suited to large-scale nonlinear systems.

5 Conclusion

In this work, we proposed a Dai–Liao conjugate gradient method based on a modified secant equation for systems of nonlinear equations. This was achieved by finding appropriate values for the nonnegative parameter in the DL method by means of an extended secant equation developed from the work of Zhang et al. [64] and Wei et al. [54]. Global convergence was established under suitable assumptions, and numerical comparisons with an existing method show that the proposed method is efficient.