A New Family of Hybrid Conjugate Gradient Methods for Unconstrained Optimization

The conjugate gradient method is a very efficient iterative technique for solving large-scale unconstrained optimization problems. Motivated by recent modifications of some variants of the method and by the construction of hybrid methods, this study proposes four hybrid methods that are globally convergent as well as computationally efficient. The approach adopted for constructing the hybrid methods entails projecting ten recently modified conjugate gradient methods. Each of the hybrid methods is shown to satisfy the descent property independently of any line search technique and to be globally convergent under the strong Wolfe line search. Results obtained from the numerical implementation of these methods and from performance profiling show that they are very competitive with well-known traditional methods.


Introduction
From the industrial perspective and the associated processes such as facility location, production, distribution, logistics, etc., the applications of optimization techniques cannot be overemphasized. For instance, they take a central place in the well-studied vehicle routing problem, a problem frequently encountered by the distribution units of manufacturing industries (see [2]). Their importance has necessitated the design of many techniques that can be deployed for solving both constrained and unconstrained optimization problems (see [11,12]). In this paper, however, we consider an optimization technique known as the conjugate gradient (CG) method for solving unconstrained optimization problems of the form

min_{x ∈ R^n} f(x), (1)

where f : R^n → R is a continuously differentiable objective function whose gradient can be evaluated. The nonlinear CG methods constitute a class of iterative schemes for solving (1) effectively, especially for large values of n. The CG scheme for (1) can be written as follows:

x_{k+1} = x_k + α_k d_k, k = 0, 1, 2, . . . , (2)

where α_k is the step-length evaluated by an appropriate line search process, and d_k is the search direction given by the following rules:

d_0 = −g_0; d_k = −g_k + β_k d_{k−1}, k ≥ 1, (3)

where g_k = g(x_k) is the gradient of f at x_k and β_k is the CG update parameter. Usually, β_k is chosen such that the iterative scheme (2)-(3) reduces to the linear CG method when f is a strictly convex quadratic function and α_k is determined by a one-dimensional exact line search technique.
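For illustration, the iterative scheme (2)-(3) can be sketched in code. The sketch below is not the paper's Algorithm 1; it is a minimal reference implementation assuming numpy arrays, the classical FR update as an example β rule, and a simple Armijo backtracking search as a stand-in for the strong Wolfe line search.

```python
import numpy as np

def backtracking_armijo(f, grad, x, d, delta=1e-4, shrink=0.5, alpha=1.0):
    """Simple Armijo backtracking line search (illustrative stand-in for SWLS)."""
    g0d = grad(x).dot(d)
    while f(x + alpha * d) > f(x) + delta * alpha * g0d:
        alpha *= shrink
    return alpha

def beta_fr(g, g_prev, d_prev):
    # Fletcher-Reeves update parameter: ||g_k||^2 / ||g_{k-1}||^2.
    return g.dot(g) / g_prev.dot(g_prev)

def cg_minimize(f, grad, x0, beta_rule, line_search, tol=1e-6, max_iter=500):
    """Generic nonlinear CG scheme (2)-(3) with a pluggable beta_rule."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                       # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = line_search(f, grad, x, d)
        x = x + alpha * d                        # x_{k+1} = x_k + alpha_k d_k
        g_prev, g = g, grad(x)
        d = -g + beta_rule(g, g_prev, d) * d     # d_k = -g_k + beta_k d_{k-1}
        if g.dot(d) >= 0:                        # safeguard: restart on non-descent
            d = -g
    return x
```

Any of the β formulas discussed below can be dropped in for `beta_rule` without changing the driver loop.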
The past few decades have seen many variants of the nonlinear CG method developed, some of which include Hestenes and Stiefel (HS) [16], Fletcher and Reeves (FR) [13], Polak, Ribière and Polyak (PRP) [27,28], Liu and Storey (LS) [24], Dai and Yuan (DY) [7] and Hager and Zhang (CG DESCENT (N)) [14]. The basic difference among these methods lies in the value of their update parameters, given respectively as follows:

β_k^{HS} = g_k^T y_k / (d_{k−1}^T y_k), β_k^{FR} = ∥g_k∥² / ∥g_{k−1}∥², β_k^{PRP} = g_k^T y_k / ∥g_{k−1}∥²,

β_k^{LS} = −g_k^T y_k / (d_{k−1}^T g_{k−1}), β_k^{DY} = ∥g_k∥² / (d_{k−1}^T y_k), β_k^{N} = (y_k − 2 d_{k−1} ∥y_k∥² / (d_{k−1}^T y_k))^T g_k / (d_{k−1}^T y_k),

where ∥ · ∥ denotes the Euclidean norm and y_k = g_k − g_{k−1}.
Experiments have shown that for a strictly convex quadratic function f, all the above methods are equivalent under exact line search. However, for non-quadratic objective functions, the β_k of these methods perform differently [1,15]. Establishing the global convergence results of these methods usually requires that the step-length α_k satisfy some inexact line search criteria, because most exact line search approaches are computationally expensive. The most commonly used among the approximate line search criteria is the strong Wolfe line search (SWLS) criterion, given as follows:

f(x_k + α_k d_k) ≤ f(x_k) + δ α_k g_k^T d_k (4)

and

|g(x_k + α_k d_k)^T d_k| ≤ σ |g_k^T d_k|, (5)

with 0 < δ < σ < 1. The weaker version of this condition combines (4) with

g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k. (6)

Dai and Yuan [6] gave a generalization (GWLS) of the above conditions as a combination of (4) and

σ_1 g_k^T d_k ≤ g(x_k + α_k d_k)^T d_k ≤ −σ_2 g_k^T d_k, (7)

where 0 < δ < σ_1 < 1 and σ_2 ≥ 0. Note that when σ_1 = σ_2 = σ, (4) and (7) reduce to (4)-(5). Similarly, when σ_1 = σ and σ_2 = ∞, the combination of (4) and (7) reduces to (4) and (6). In [14], Hager and Zhang introduced an approximation of the Wolfe conditions by replacing (7) with

σ g_k^T d_k ≤ g(x_k + α_k d_k)^T d_k ≤ (2δ − 1) g_k^T d_k, (8)

such that 0 < δ < 1/2 and δ < σ < 1. This line search procedure, implemented in the CG DESCENT method (see [14]), is among the fastest known CG algorithms.
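The strong Wolfe conditions (4)-(5) are straightforward to check numerically for a trial step. The helper below is an illustrative sketch assuming numpy vectors and callables `f` and `grad`; it is not taken from the paper.

```python
import numpy as np

def satisfies_strong_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.9):
    """Check the strong Wolfe conditions (4)-(5) at a trial step alpha."""
    g0d = grad(x).dot(d)                                             # g_k^T d_k
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * g0d          # (4)
    curvature = abs(grad(x + alpha * d).dot(d)) <= sigma * abs(g0d)  # (5)
    return armijo and curvature
```

For a quadratic with steepest-descent direction, the exact minimizing step satisfies both conditions, while a very small step typically fails the curvature condition (5).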
However, an important requirement for deploying the above CG variants, and others not mentioned here, is that the descent property holds during the implementation of any of the line search techniques described above. A CG algorithm is said to satisfy the descent property if

g_k^T d_k < 0 for all k. (9)

A more natural way of guaranteeing descent for CG algorithms, apart from (9), is by using the sufficient descent property, given by

g_k^T d_k ≤ −c ∥g_k∥², (10)
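Conditions (9) and (10) translate directly into numeric checks; the snippet below is a minimal sketch assuming numpy vectors, and illustrates that sufficient descent (10) is strictly stronger than plain descent (9).

```python
import numpy as np

def is_descent(g, d):
    # Descent property (9): g_k^T d_k < 0.
    return g.dot(d) < 0

def is_sufficient_descent(g, d, c=0.1):
    # Sufficient descent property (10): g_k^T d_k <= -c * ||g_k||^2.
    return g.dot(d) <= -c * g.dot(g)
```

A direction that is barely downhill satisfies (9) but can fail (10) for a given c, which is why (10) is the sharper requirement in convergence analyses.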

where c is a positive constant. For more information on the theories and applications of CG methods, we refer the reader to [20,21,22,23,25,26,32]. Each CG method has striking features that make it adaptable to certain sets of unconstrained problems. For instance, the FR and DY methods have been identified as having the best convergence results (see [3,8]). However, for general objective functions, the two methods have poor computational performance. Conversely, the HS and PRP methods have good computational strength even though they exhibit less desirable convergence results. These contrasting features have led to the development of hybrid methods, which are constructed with the aim of overcoming existing deficiencies in two or more methods. For instance, a well-constructed hybrid of FR and PRP should perform well computationally as well as yield good convergence properties. These advantages associated with hybrid CG methods are the motivation behind this study. In this paper, new hybrid CG methods are proposed through the combination of recently proposed variants of the PRP and HS methods. Even though the two methods are specifically known to perform well computation-wise, it is important to note that their hybrids also exhibit strong global convergence properties.
The rest of this paper is organized as follows. In Section 2, a review of some recent methods leading to the proposed methods is presented. In Section 3, the new methods are highlighted along with a suitable algorithm for testing their computational strength. The descent and global convergence properties of the proposed methods are described in Section 4. Some preliminary numerical results are presented in Section 5, while some concluding remarks are given in Section 6.

Related Studies
In this section, we trace the development of the methods that directly relate to the proposed hybrid methods. The PRP and HS methods form the basic building blocks for all that is discussed hereafter. In [30], a new variant of the PRP method, named the VPRP method, was proposed with the CG update parameter β_k given by

β_k^{VPRP} = g_k^T (g_k − (∥g_k∥/∥g_{k−1}∥) g_{k−1}) / ∥g_{k−1}∥². (11)

The study assumed that the method satisfies (10) and established its global convergence under different line search techniques, ranging from exact to approximate methods. Specifically, Huang et al. [17] demonstrated the global convergence of the method under the strong Wolfe line search (SWLS) (4)-(5) with σ < 1/4. Motivated by the results in [30], Shengwei et al. [29] extended the approach to suggest CG variants of the HS and LS methods. In particular, their proposed variant of the HS method has the value of β_k given by

β_k^{VHS} = g_k^T (g_k − (∥g_k∥/∥g_{k−1}∥) g_{k−1}) / (d_{k−1}^T y_k). (12)

This method was shown to possess the sufficient descent property and to be globally convergent with σ < 1/3. Zhang [34] modified (11) and (12) to give new variants, referred to here as MVPRP and MVHS, respectively, with the update parameters given as

β_k^{MVPRP} = (∥g_k∥² − (∥g_k∥/∥g_{k−1}∥) |g_k^T g_{k−1}|) / ∥g_{k−1}∥² (13)

and

β_k^{MVHS} = (∥g_k∥² − (∥g_k∥/∥g_{k−1}∥) |g_k^T g_{k−1}|) / (d_{k−1}^T y_k). (14)

Both of the methods (13) and (14) satisfy (10) and were shown to be globally convergent for SWLS (4)-(5) with σ < 1/2. The numerical results reported for these two methods show considerable improvement compared to (11) and (12). In fact, Du et al. [10] noted that 0 ≤ β_k^{MVPRP} ≤ β_k^{FR}, and thus, according to [33], the MVPRP method possesses the descent property (9) and is globally convergent under GWLS (4) and (7). A related variant, which we shall denote as HPRP, has its β_k given by (15). What was said of MVPRP above is also applicable to HPRP, since 0 ≤ β_k^{HPRP} ≤ β_k^{FR}. A similar study in [31] produced a new variant of the HS method, referred to as the WHS method in this study, with β_k given by (16). This method satisfies (10) and is globally convergent for the SWLS with σ ∈ (0, 1/2). Du et al. [10] later showed that the WHS method is very closely related to the DY method.
In particular, they showed that β_k^{WHS} ≤ β_k^{DY} for all k ≥ 1 and thus claimed that the method has a global convergence property similar to that of the DY method. In other words, the method is a descent method and is globally convergent under the Wolfe line search (4) and (6) with σ < 1.
Very recently, Du et al. [10] proposed four modified CG methods. Here, we only mention two, which are of interest in this paper. The first of these, a variant of the PRP method which we shall refer to as the DPRP method, has its β_k value given by (17), and the second, referred to as the DHS method, is a variant of the HS method with its β_k given by (18). It was shown that the DPRP and DHS methods possess the sufficient descent property and are globally convergent under SWLS with 0 < σ < 1/4 and 0 < σ < 1/3, respectively. It was further shown that the two update parameters are always nonnegative under the same SWLS assumptions and the respective intervals of convergence above.
Next, the newly proposed hybrid methods are presented by projecting the methods described above. Their development was motivated by the observation that the numerical results reported for the methods above show them to be less efficient than some classical methods, especially the CG DESCENT method.

Hybrid Conjugate Gradient Methods and Implementation Algorithm
In this section, the hybrid methods of PRP, HS, VPRP, VHS, MVPRP, MVHS, HPRP, WHS, DPRP and DHS are presented. The following are the four proposed hybrid methods:

i. the hybrid of DPRP, DHS, PRP and HS (the DPH method), where β_k is given by (19);
ii. the hybrid of DPRP, DHS, HPRP and WHS (the DHW method), where β_k is given by (20);
iii. the hybrid of DPRP, DHS, VPRP and VHS (the DV method), where β_k is given by (21);
iv. the hybrid of DPRP, DHS, MVPRP and MVHS (the DM method), where β_k is given by (22).

These methods exhibit the properties of both the PRP and HS methods, and the analyses that follow in the next sections specifically show that the proposed methods always generate descent directions independently of the step-length line search procedure adopted. Based on (19)-(22), the following algorithm is proposed for the analyses and numerical implementation of the methods.
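The displays (19)-(22) combine the β values of the constituent methods. As a generic illustration of the hybridization idea only, and not of the paper's exact rules (19)-(22), the classical construction that clips β^{PRP} into the interval [0, β^{FR}] can be sketched as follows; the function names are ours.

```python
import numpy as np

def beta_prp(g, g_prev):
    # PRP update: g_k^T y_k / ||g_{k-1}||^2, with y_k = g_k - g_{k-1}.
    return g.dot(g - g_prev) / g_prev.dot(g_prev)

def beta_fr(g, g_prev):
    # FR update: ||g_k||^2 / ||g_{k-1}||^2.
    return g.dot(g) / g_prev.dot(g_prev)

def beta_hybrid_prp_fr(g, g_prev):
    # Generic hybridization idea: clip beta_PRP into [0, beta_FR], keeping
    # PRP's automatic-restart behaviour (beta -> 0 when PRP is negative)
    # while inheriting FR-style convergence bounds.
    return max(0.0, min(beta_prp(g, g_prev), beta_fr(g, g_prev)))
```

The clipping makes the hybrid parameter nonnegative and bounded above by β^{FR}, the same two properties exploited in the convergence arguments below.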

Descent and Global Convergence Properties of Algorithm 1
In this section, the properties of Algorithm 1 are presented. The analyses begin by establishing the descent properties of each method, followed by their global convergence results. The results that follow establish the descent properties of the sequence of search directions generated by Algorithm 1. Interestingly, these results were obtained independently of any line search procedure.

Theorem 1

Suppose that the objective function f in (1) is a smooth function and d_k is generated by Algorithm 1. Then g_k^T d_k < 0 for all k.
Proof. To show that g_k^T d_k < 0 for all k, the proof is divided into four different cases for each of the hybrid methods. Note that if β_k = 0, then, from (3), d_k = −g_k and g_k^T d_k = −∥g_k∥² < 0. Thus, it is assumed that β_k ≠ 0 in all the cases.

Case I: In (23), the first equality was obtained by taking the inner product of (3) with g_k and substituting β_k^{DHS} for β_k. The first inequality follows from the first assumption above, while the second was obtained from the stated bound. Case II: Starting from (3), we obtain the result in (24). Case III: Proceeding from the inner product of (3) with g_k gives (25). Case IV: The second assumption in this case allows us to proceed from (3) as in the other cases, and we obtain (26). The descent property is satisfied in (23)-(26). Hence, the sequence of search directions generated by the DPH method satisfies the descent property.

DHW Method
Case I: Clearly, this result may be established as in Case I of the DPH method.
Case II: The proof of this result is similar to Case II of the DPH method.
Case III: Starting from (3) yields the following result: the first inequality was obtained from the first assumption together with d_k^T (g_k − g_{k−1}) > 0 (the second assumption); these two facts enforce the positivity of the second term in the last equality, and the last inequality follows from the stated bound. Case IV: The inner product of (3) with g_k, after some algebra, gives the required inequality. Therefore, since g_k^T d_k < 0 for all k ≥ 1 in all the cases, the DHW method generates descent directions.

DV Method
Case I: The proof of the descent property of d_k in this case is similar to Case I of the DPH and DHW methods. Case II: Here β_k reduces to (17). The proof of the descent property of d_k in this case is also similar to Case II of the DPH and DHW methods.

Case III: These conditions yield β_k^{DV} = β_k^{VHS}, and beginning with (3), the following result is obtained.
This result is explained as follows: the first supposition holds if g_k^T g_{k−1} > 0, and so we have 0 < cos θ_k < 1, where θ_k is the angle between g_k and g_{k−1}. Thus, it is easy to see that (∥g_k∥/∥g_{k−1}∥) g_k^T g_{k−1} = ∥g_k∥² cos θ_k, which establishes the last equality. Case IV: Under the stated conditions, beginning with (3), the following result is obtained.
The results in (29) and (30), together with Cases I and II, show that the DV method generates descent directions.

DM Method
Case I: In this case, g_k^T g_{k−1} < 0, and the result is similar to Case I of the DPH, DHW and DV methods above.
Case II: In this case also, g_k^T g_{k−1} < 0, as in Case II of the DPH, DHW and DV methods proved earlier.
Case III: Here β_k^{DM} reduces to (14). It is clear that |g_k^T g_{k−1}| > 0, and so β_k^{MVHS} < β_k^{DY}. This and the inner product of (3) with g_k give the required bound; the last inequality follows since d_{k−1}^T y_k > 0. Case IV: The first inequality follows from the fact that β_k^{MVPRP} ≤ β_k^{FR}, as ∥g_{k−1}∥² > 0, and the second inequality holds according to the second assumption. Hence, the DM method is a descent method. Therefore, the descent property is satisfied for all the methods.
The following important result can easily be inferred from Theorem 1. This result is important in establishing the global convergence of Algorithm 1. The result is shown only for the DPH method, as the same approach is valid for the other three methods.

Lemma 1
Proof. From (19), it is known that β_k^{DPH} ≥ 0, which establishes the first part of the inequality. If β_k^{DPH} = 0, the second part holds trivially. By assuming β_k^{DPH} > 0, the following four cases are established. Case I: Theorem 1 together with (23) gives the required bound. Case II: Theorem 1 together with (24) gives the required bound. Case III: Theorem 1 together with (25) gives the required bound. Case IV: Theorem 1 together with (26) gives the required bound. Next, the global convergence result of Algorithm 1 is established under the following assumptions.
Assumption 1 (bound on the objective function): f(x) is bounded from below on the level set Ω = {x ∈ R^n : f(x) ≤ f(x_0)}, where x_0 is the starting point.

Assumption 2 (Lipschitz condition): within some neighbourhood N of Ω, f(x) is continuously differentiable and its gradient is Lipschitz continuous; that is, for all x, y ∈ N, there exists a constant L ≥ 0 such that

∥g(x) − g(y)∥ ≤ L ∥x − y∥.

Together with Assumptions 1 and 2, the following important result is usually employed to establish the global convergence of CG algorithms.

Lemma 2
Consider an iterative scheme of the form (2)-(3), where d_k is a descent direction and α_k satisfies the Wolfe line search conditions (4) and (6). If Assumptions 1 and 2 hold, then

Σ_{k≥0} (g_k^T d_k)² / ∥d_k∥² < +∞. (34)

Lemma 2 is the famous Zoutendijk condition, the proof of which may be found in [7]. Using Lemma 2 and the above assumptions, the following global convergence result for Algorithm 1 is presented.

Theorem 2
Let d_k be generated by the iterative rules (2)-(3) with β_k computed from Line 5 of Algorithm 1. If Assumptions 1 and 2 hold, then, by Lemmas 1 and 2, lim inf_{k→∞} ∥g_k∥ = 0.

Proof
Suppose, by contradiction, that the conclusion does not hold, that is, lim inf_{k→∞} ∥g_k∥ ≠ 0. Then, since ∥g_k∥ > 0, there exists a constant ζ > 0 such that ∥g_k∥ ≥ ζ for all k. Starting with (3) gives (35). Dividing both sides of (35) by (g_k^T d_k)² and using Lemma 1, the required contradiction with Lemma 2 is obtained.
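The contradiction step can be completed along standard Dai–Yuan-type lines. The sketch below is not the paper's exact derivation; it assumes that Lemma 1 delivers the DY-type bound 0 ≤ β_k ≤ g_k^T d_k / g_{k−1}^T d_{k−1}, which is the property such lemmas typically provide.

```latex
% From (3), d_k + g_k = \beta_k d_{k-1}, so
\|d_k\|^2 = \beta_k^2 \|d_{k-1}\|^2 - 2\, g_k^T d_k - \|g_k\|^2 .
% Dividing by (g_k^T d_k)^2 and applying the assumed bound on \beta_k:
\frac{\|d_k\|^2}{(g_k^T d_k)^2}
  \le \frac{\|d_{k-1}\|^2}{(g_{k-1}^T d_{k-1})^2}
      - \left(\frac{1}{\|g_k\|} + \frac{\|g_k\|}{g_k^T d_k}\right)^2
      + \frac{1}{\|g_k\|^2}
  \le \frac{\|d_{k-1}\|^2}{(g_{k-1}^T d_{k-1})^2} + \frac{1}{\|g_k\|^2}.
% Induction and \|g_i\| \ge \zeta give \|d_k\|^2/(g_k^T d_k)^2 \le (k+1)/\zeta^2, hence
\sum_{k\ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2}
  \ge \sum_{k\ge 1} \frac{\zeta^2}{k+1} = \infty,
% which contradicts the Zoutendijk condition of Lemma 2.
```

The middle step is a completed square: expanding −(1/∥g_k∥ + ∥g_k∥/g_k^T d_k)² + 1/∥g_k∥² recovers exactly −2/g_k^T d_k − ∥g_k∥²/(g_k^T d_k)².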

Numerical Experiments
In this section, we describe the numerical implementation of the new hybrid methods and compare their performance with that of other existing methods based on four indicators: the number of iterations (Table 1), the final value of the objective function (Table 2), the infinity norm of the final gradient (Table 3) and the CPU time of executing Algorithm 1 (Table 4). All the test functions were drawn from [4], many of which are contained in the CUTEr library described in [5]. The hybrid CG algorithm was coded and implemented in Matlab R2016a installed on a PC with Windows 10 OS and 2 GB RAM. Varying dimensions of 500, 1000, 5000 and 10000 were used, except in a few cases where, due to problem peculiarity, much-reduced dimensions were used to obtain complete output from all the methods. The new methods were compared with other existing methods based on the indicators mentioned earlier, adopting the Dolan and Moré [9] performance profile theory. The theory outlines the comparison of the performance of a set of methods S on a set of problems P; in this case, S denotes the set of different CG methods implemented in our algorithm while P is the set of unconstrained test functions. The theory is described in the following paragraphs.
Suppose there are n_s methods and n_p problems; for each problem p and method s, let t_{p,s} be the computing cost (number of iterations, CPU time, function or gradient evaluations) required to solve problem p by method s. The comparison between the different methods is based on the performance ratio

r_{p,s} = t_{p,s} / min{t_{p,s} : s ∈ S}. (37)

Equation (37) gives the cost of solving problem p ∈ P with method s ∈ S relative to the best method on that problem. A parameter r_K is chosen large enough that r_K ≥ r_{p,s} for all p, s, and equality r_{p,s} = r_K holds only when the chosen method s does not solve the problem p. Based on (37), the cumulative distribution function for the performance ratio r_{p,s} is defined by

ρ_s(τ) = |{p ∈ P : r_{p,s} ≤ τ}| / n_p, (38)

where | · | represents cardinality; ρ_s(τ) is the probability, in relation to method s, that r_{p,s} is within a factor τ ∈ R of the best ratio. At τ = 1, the value of ρ_s(τ) is the probability that the method outperforms the other methods. The comparison using the theory described above is carried out for each of the performance indicators on two sets of methods. Set 1: DPRP, DHS, DPH, DHW, DV, DM and CG DESCENT; Set 2: VPRP, VHS, DPH, DHW, DV, DM and CG DESCENT. The CG DESCENT method, being one of the most efficient CG methods, was selected in both sets to profile the performance of the proposed methods.
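The ratios (37) and the profile (38) are simple to compute from a cost table. The sketch below is a minimal illustration, not the paper's code, assuming an (n_p × n_s) numpy array of costs in which `np.inf` marks a failure and each problem is solved by at least one method.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile.

    T is an (n_p, n_s) array of costs t_{p,s}; np.inf marks a failure of
    method s on problem p.  Returns rho_s(tau) for every method s and
    every threshold tau in taus.
    """
    best = T.min(axis=1, keepdims=True)      # best cost per problem
    r = T / best                             # performance ratios r_{p,s}, as in (37)
    n_p, n_s = T.shape
    # rho_s(tau) = fraction of problems with r_{p,s} <= tau, as in (38)
    return np.array([[np.mean(r[:, s] <= tau) for tau in taus]
                     for s in range(n_s)])
```

Reading the profile at τ = 1 gives the fraction of problems on which a method is the (joint) winner; the curve's height for larger τ measures robustness.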
At the start of the simulation, the value of α_k at k = 0 is set to zero, while the other values of α_k are computed using the SWLS (4)-(5). The stopping criterion is ∥g_k∥ ≤ ϵ, where ϵ = 10^{−6}. The other parameters of the algorithm are set as follows: δ = 0.0001 and σ = 0.9. The numerical results are presented in Tables 1-4. Note that in Tables 3 and 4, the entries of a few cells are NaN, denoting undefined or unrepresentable floating-point numbers. In the instances where this occurs, the numbers of iterations and the CPU times were still recorded in Table 1 and Table 2, respectively.
Figures 1-8, produced from the numerical values in Tables 1-4, show the performance profiles of our hybrid methods against some non-hybrid methods. The performance profile with respect to the number of iterations is illustrated in Figure 1. Clearly, in Figure 1, DPH and DV performed better than DHS, DPRP and CG DESCENT (Figure 1 (left) at τ = 1 and τ = 2) and than VPRP, VHS and CG DESCENT (Figure 1 (right) at τ = 1 and τ = 2), respectively. The comparison based on CPU time of execution (see Figure 2) shows that the DPH and DV methods remain very efficient and highly competitive (see Figure 2 (left) at τ = 2). Similarly, according to Figure 2 (right), the DPH and DV methods perform better than the CG DESCENT method. In terms of the function value, DM produced the best performance (Figure 3). From Figure 4, DV and DM are, on average, the better methods.

Conclusion
This study, based on previous ideas of hybrid CG method construction and on recently proposed CG methods of the PRP and HS family, has described four hybrid CG methods which satisfy the descent property and are globally convergent. Numerical experiments revealed that two of the proposed methods, DPH and DV, are computationally efficient compared to the non-hybrid methods. As a major contribution, the proposed methods (especially DPH and DV) performed better than the CG DESCENT method, acclaimed to be one of the best known CG methods. The DHW and DM methods, which are less efficient computation-wise, also compete reasonably with the non-hybrid methods. In fact, it was observed that the weaker performance of these methods could be addressed by hybridizing them with classical methods such as the FR and DY methods, which are known to possess good convergence properties. Therefore, as part of future work, new hybrid methods comprising the methods introduced in this paper and those exhibiting good global convergence properties will be developed, tested numerically and compared with highly efficient methods.