Wei-Yao-Liu Conjugate Gradient Algorithm for Nonsmooth Convex Optimization Problems

This paper presents a Wei-Yao-Liu conjugate gradient algorithm for nonsmooth convex optimization problems. The proposed algorithm uses approximate function and gradient values of the Moreau-Yosida regularization function instead of the corresponding exact values. Under suitable conditions, global convergence is established for the proposed conjugate gradient method. Finally, numerical results are reported to show the efficiency of our algorithm.


Introduction
We consider the unconstrained minimization problem

min_{x ∈ R^n} f(x),   (1)

where f : R^n → R is a nonsmooth convex function. The so-called Moreau-Yosida regularization of f [10] is defined by

F(x) = min_{z ∈ R^n} { f(z) + (1/(2λ)) ∥z − x∥² },   (2)

where λ is a positive parameter and ∥ · ∥ denotes the Euclidean norm. It is well known that problems (1) and (2) are equivalent in the sense that their solution sets coincide. The function F has some good properties: it is a differentiable convex function; it has a Lipschitz continuous gradient even when f is nondifferentiable; and although F is not twice differentiable in general, the gradient of F can be shown to be semismooth under reasonable conditions [8,21]. Based on these features, several algorithms have been proposed for solving (2), see [2,8,21,24]. Proximal methods have proved effective for evaluating the function value F(x) and its gradient ∇F(x) exactly at a given point x [1,3,6,23]. Lukšan [15] and Monjezi [19] proposed bundle methods, which can handle both convex and nonconvex f, and many trust region algorithms for minimizing a nonsmooth objective function have been presented, see [5,13,27,30,33]. Recently, Yuan et al. [28,29,32] and Li [16,17] have extended the spectral gradient method and conjugate gradient-type methods, respectively, to nonsmooth optimization problems.

Conjugate gradient techniques have recently been developed for solving large-scale optimization problems (see [9,12,29,31], etc.). Motivated by these techniques as well as the Moreau-Yosida regularization (smoothing) approach, in this paper we propose a Wei-Yao-Liu conjugate gradient algorithm for solving a nonsmooth unconstrained convex minimization problem. The Wei-Yao-Liu (WYL) conjugate gradient method with formula (21) has not only excellent numerical performance but also some good theoretical properties.
First, β_k^{WYL} remains nonnegative whenever g_k ≠ 0. Second, the WYL method enjoys a nice property introduced by Gilbert and Nocedal [9] for the PRP method, which ensures that β_k becomes small when the step is small; this is helpful for global convergence, see [25] for details. Moreover, β_k^{WYL}, like β_k^{PRP}, β_k^{HS} and β_k^{LS}, has a numerator built on the common term g_k^T(g_k − g_{k−1}), which helps to avoid jamming automatically. Further research on the WYL method can be found in [11,12,18,25,26].

The purpose of this paper is to extend the Wei-Yao-Liu conjugate gradient algorithm to the nonsmooth optimization problem (1). The presented algorithm has the following main attributes: (1) a WYL conjugate gradient algorithm is introduced for the nonsmooth problem (1) and the smooth problem (2); (2) the search direction satisfies the sufficient descent property; (3) the algorithm possesses global convergence; (4) numerical results show that the algorithm is efficient.
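The nonnegativity of β_k^{WYL} noted above follows from the Cauchy-Schwarz inequality and can also be checked numerically. A minimal Python sketch (the random test gradients are illustrative; the formula is the WYL parameter of [25]):

```python
import numpy as np

def beta_wyl(g_new, g_old):
    """WYL conjugate gradient parameter from [25]."""
    scale = np.linalg.norm(g_new) / np.linalg.norm(g_old)
    return g_new @ (g_new - scale * g_old) / np.linalg.norm(g_old) ** 2

# By the Cauchy-Schwarz inequality, the numerator is at least
# ||g_new||^2 - (||g_new||/||g_old||) * ||g_new|| * ||g_old|| = 0,
# so beta_wyl is nonnegative (up to rounding) whenever g_new != 0.
rng = np.random.default_rng(0)
betas = [beta_wyl(rng.standard_normal(5), rng.standard_normal(5))
         for _ in range(1000)]
print(min(betas))
```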
This paper is organized as follows. In Section 2, we briefly review some basic results from convex and nonsmooth analysis. In Section 3, we present a WYL conjugate gradient algorithm for (1) and prove its global convergence. Numerical results are reported in Section 4, and conclusions are given in the final section.

Preliminaries
In this section, we review some important and useful results in convex analysis and nonsmooth analysis. The following proposition ( [10], Chapter XV, Theorem 4.1.4) shows some basic properties of the Moreau-Yosida regularization function F .

Proposition 2.1
The function F is finite-valued, convex, and everywhere differentiable with gradient

∇F(x) = (x − p(x))/λ,

where p(x) is the unique minimizer in (3), that is,

p(x) = arg min_{z ∈ R^n} { f(z) + (1/(2λ)) ∥z − x∥² }.

Moreover, the gradient mapping g := ∇F : R^n → R^n is globally Lipschitz continuous with modulus 1/λ, i.e.,

∥g(x) − g(y)∥ ≤ (1/λ) ∥x − y∥ for all x, y ∈ R^n.

The generalized Jacobian of ∇F(x) and the property of BD-regularity can be found in [4,13,20].
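For intuition, the gradient formula g(x) = (x − p(x))/λ and its 1/λ-Lipschitz property can be verified on the one-dimensional example f(x) = |x|, for which the minimizer p(x) has a closed-form soft-thresholding expression (this test function is an illustrative choice, not taken from the paper). A minimal Python sketch:

```python
def prox_abs(x, lam):
    """Minimizer p(x) of |z| + (z - x)**2 / (2*lam): soft-thresholding."""
    return max(abs(x) - lam, 0.0) * (1.0 if x > 0 else -1.0)

def moreau_grad(x, lam):
    """Gradient of the Moreau-Yosida regularization: g(x) = (x - p(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam

lam = 0.5
# g is well defined and 1/lam-Lipschitz even though |x| has a kink at 0.
print(prox_abs(2.0, lam))     # 1.5
print(moreau_grad(2.0, lam))  # 1.0
```

Note that g(x) = sign(x) once |x| ≥ λ, illustrating how the regularized gradient stays bounded by the subgradients of f.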

Proposition 2.2
(i) By the Rademacher theorem and the Lipschitz continuity of ∇F(x), for each x ∈ R^n the set of generalized Jacobian matrices

∂_B g(x) = { V ∈ R^{n×n} : V = lim_{x_k → x, x_k ∈ D_g} ∇g(x_k) }

is nonempty and compact, where D_g = {x ∈ R^n : g is differentiable at x}. (ii) If g is BD-regular at x, which means that all matrices V ∈ ∂_B g(x) are nonsingular, then there exist constants µ_1 > 0, µ_2 > 0 and a neighborhood Ω of x such that the corresponding regularity bounds hold for all y ∈ Ω. The next proposition ([10], Chapter XV, Theorem 4.1.7) formally states the equivalence between problems (1) and (2).

Proposition 2.3
The following statements are equivalent: x is an optimal solution of (1); x is an optimal solution of (2); g(x) = 0; and x = p(x). Although F(x) and g(x) can in principle be obtained through the optimal solution of (3), p(x) is difficult, and sometimes impossible, to compute precisely, so in actual computation we often use approximate values of F(x) and g(x). Suppose that for each x ∈ R^n and any ε > 0, there exists an approximation vector p_a(x, ε) ∈ R^n of the unique minimizer p(x) in (3) such that the approximation criterion (14) below holds. An implementable procedure for finding such an approximate minimizer may be found, for example, in [7]. An existence and computation result for p_a(x, ε) is presented as follows.
where α_k > 0 is a stepsize and υ_k is an approximate subgradient at x_k; (ii) conversely, if (9) holds with ε_k given by (11), then (10) holds.

We can use p_a(x, ε) to define approximations of F(x) and g(x) by

F_a(x, ε) = f(p_a(x, ε)) + (1/(2λ)) ∥p_a(x, ε) − x∥²   (12)

and

g_a(x, ε) = (x − p_a(x, ε))/λ,   (13)

respectively, where p_a(x, ε) satisfies

f(p_a(x, ε)) + (1/(2λ)) ∥p_a(x, ε) − x∥² ≤ F(x) + ε.   (14)

The next proposition (see [8]) shows that, by choosing the parameter ε sufficiently small, the approximations F_a(x, ε) and g_a(x, ε) can be made arbitrarily close to F(x) and g(x).

Proposition 2.5
Let p_a(x, ε) be a vector that satisfies (14), and let F_a(x, ε) and g_a(x, ε) be defined by (12) and (13), respectively. Then

(i) F(x) ≤ F_a(x, ε) ≤ F(x) + ε;
(ii) ∥p_a(x, ε) − p(x)∥ ≤ √(2λε);
(iii) ∥g_a(x, ε) − g(x)∥ ≤ √(2ε/λ).

A remarkable property of g_a(x, ε) is given as follows.
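These approximation bounds can be checked numerically on f(x) = |x|, whose exact minimizer p(x) is the soft-thresholding operator (an illustrative choice of f, not from the paper). The sketch below perturbs p(x) to produce an inexact minimizer satisfying criterion (14) with ε = 0.01 and verifies the standard bounds F(x) ≤ F_a(x, ε) ≤ F(x) + ε and ∥g_a(x, ε) − g(x)∥ ≤ √(2ε/λ):

```python
import math

def prox_abs(x, lam):
    """Exact minimizer of |z| + (z - x)**2 / (2*lam)."""
    return max(abs(x) - lam, 0.0) * (1.0 if x > 0 else -1.0)

x, lam = 2.0, 1.0
inner = lambda z: abs(z) + (z - x) ** 2 / (2 * lam)

p = prox_abs(x, lam)          # exact minimizer, p = 1.0
F = inner(p)                  # F(x) = 1.5
p_a = p + 0.1                 # a deliberately inexact minimizer
F_a = inner(p_a)              # approximate value F_a(x, eps), per (12)
eps = 0.01                    # criterion (14): F_a <= F + eps holds here

g = (x - p) / lam             # exact gradient g(x)
g_a = (x - p_a) / lam         # approximate gradient, per (13)

print(F <= F_a <= F + eps)                       # True
print(abs(g_a - g) <= math.sqrt(2 * eps / lam))  # True
```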

Proposition 2.6
[Lemma 4.3 in [22]] There exist positive constants l and L and a positive integer k_0 such that the bounds (15) and (16) hold for all k ≥ k_0.
By (15) and (16), a corresponding combined bound is easily deduced.

Algorithm
By using the Moreau-Yosida regularization (smoothing) approach and a nonmonotone line search technique, in this section we propose a Wei-Yao-Liu conjugate gradient algorithm for solving a nonsmooth unconstrained convex minimization problem. We first use the Moreau-Yosida regularization to smooth the function, and then use the approximate values of the function F and its gradient g, instead of their exact values, in the WYL conjugate gradient algorithm.

We first recall the Wei-Yao-Liu conjugate gradient method for the unconstrained optimization problem

min_{x ∈ R^n} f(x),

where f : R^n → R is continuously differentiable and its gradient g is available. The Wei-Yao-Liu conjugate gradient method [25] is defined by

x_{k+1} = x_k + α_k d_k,   (19)

where x_k is the current iterate, α_k > 0 is the steplength, and d_k is the search direction determined by

d_k = −g_k + β_k^{WYL} d_{k−1} for k ≥ 1, with d_0 = −g_0,   (20)

where

β_k^{WYL} = g_k^T ( g_k − (∥g_k∥/∥g_{k−1}∥) g_{k−1} ) / ∥g_{k−1}∥².   (21)

Now we state the steps of the algorithm as follows.
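In the smooth setting, the WYL method recalled above can be sketched in a few lines of Python; the convex quadratic test function, the backtracking Armijo constants, and the steepest-descent restart safeguard are illustrative assumptions, not part of the method in [25]:

```python
import numpy as np

def wyl_cg(f, grad, x0, tol=1e-8, max_iter=1000):
    """WYL conjugate gradient iteration with a backtracking Armijo search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        if g @ d >= 0:          # safeguard: restart with steepest descent
            d = -g
        alpha = 1.0             # backtracking Armijo (illustrative constants)
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
        g_new = grad(x)
        scale = np.linalg.norm(g_new) / np.linalg.norm(g)
        beta = g_new @ (g_new - scale * g) / np.linalg.norm(g) ** 2  # WYL beta
        d = -g_new + beta * d                                        # new direction
        g = g_new
    return x

# Convex quadratic test problem; the unique minimizer is the origin.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = wyl_cg(f, grad, np.array([5.0, -3.0]))
print(np.linalg.norm(x_star))  # close to 0
```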
Step 3: Choose a scalar ε_{k+1} satisfying 0 < ε_{k+1} ≤ min{τ_k, τ_k ∥g_a(x_k, ε_k)∥²}, and compute the step size α_k by the nonmonotone Armijo-type line search (22), where i_k is the smallest nonnegative integer such that (22) holds.
Step 5: Update J_{k+1} by the formula

Q_{k+1} = ρ Q_k + 1,  J_{k+1} = (ρ Q_k J_k + F_a(x_{k+1}, ε_{k+1})) / Q_{k+1}.   (23)

Step 6: Compute the search direction d_{k+1} by (20) with g_k and g_{k+1} replaced by g_a(x_k, ε_k) and g_a(x_{k+1}, ε_{k+1}), respectively.

Remarks.
(i) The definition 0 < ε_{k+1} ≤ min{τ_k, τ_k ∥g_a(x_k, ε_k)∥²} in Algorithm 3.1, together with (13) and (14), yields a bound on the approximation error, and ε_{k+1} is decreasing by construction. (ii) The nonmonotone acceptance condition (22) is motivated by Zhang and Hager [34]. It is not difficult to see that J_{k+1} is a convex combination of J_k and F_a(x_{k+1}, ε_{k+1}).
The choice of ρ controls the degree of nonmonotonicity: if ρ = 0, the line search reduces to the usual monotone Armijo line search; if ρ = 1, then J_k = C_k, the average of the function values F_a(x_0, ε_0), . . . , F_a(x_k, ε_k). The fact that the constant L is not known is not a difficulty in practice, since a suitable estimate can be used in its place. We also need the following assumptions, which are given in the papers [13,29,30,32].
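To illustrate how ρ interpolates between the monotone and averaged cases, the following sketch implements the reference-value recursion in the Zhang-Hager form Q_{k+1} = ρQ_k + 1, J_{k+1} = (ρQ_k J_k + F_{k+1})/Q_{k+1} (the exact shape of the update is an assumption here, patterned on [34]):

```python
def nonmonotone_refs(values, rho):
    """Reference values J_k via the Zhang-Hager recursion (assumed form)."""
    q, j = 1.0, values[0]
    refs = [j]
    for v in values[1:]:
        q_next = rho * q + 1.0
        j = (rho * q * j + v) / q_next   # convex combination of j and v
        q = q_next
        refs.append(j)
    return refs

vals = [5.0, 3.0, 4.0, 2.0]
print(nonmonotone_refs(vals, 0.0))  # [5.0, 3.0, 4.0, 2.0]: latest value (monotone)
print(nonmonotone_refs(vals, 1.0))  # [5.0, 4.0, 4.0, 3.5]: running averages
```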
Assumption A. (i) The sequence {V_k} is bounded, i.e., there exists a positive constant M such that ∥V_k∥ ≤ M for all k, where V_k ∈ ∂_B g(x_k).
(ii) F is bounded from below.
The following lemma shows that Algorithm 3.1 satisfies the sufficient descent property.

Lemma 3.2
Let Assumption A hold and let the sequence {(x_k, d_k)} be generated by Algorithm 3.1. Then there exist constants c_1 > 0 and c_2 > 0 such that

g_a(x_k, ε_k)^T d_k ≤ −c_1 ∥g_a(x_k, ε_k)∥²   (26)

and

∥d_k∥ ≤ c_2 ∥g_a(x_k, ε_k)∥   (27)

for all k.
Using (26), (27) and Assumption A, and arguing as in Lemma 1.1 of [34], the following lemma shows that Algorithm 3.1 is well defined. Since the proof is essentially the same as that of Lemma 1.1 in [34], we omit it here.

Lemma 3.3 Suppose that Assumption A holds. Then, for the iterates generated by Algorithm 3.1, we have F_a(x_k, ε_k) ≤ J_k ≤ C_k for all k.
Moreover, there exists a stepsize α_k satisfying the Armijo-type condition (22) of the line search.

Lemma 3.4
Suppose that Assumption A holds. Let {(x_k, ε_k)} be the sequence generated by Algorithm 3.1, and suppose that ε_k = o(α_k² ∥d_k∥²) holds. Then there exists a constant m_0 > 0 such that α_k ≥ m_0 for all sufficiently large k.

Proof. Let α_k satisfy the nonmonotone Armijo-type line search (22). We argue by contradiction and suppose that lim inf_{k→∞} α_k = 0. By passing to a subsequence if necessary, we may assume that α_k → 0. Then, by the line search, combined with F_a(x_k, ε_k) ≤ J_k ≤ C_k in Lemma 3.3, Proposition 2.5 and Taylor's formula, we have

where u_k ∈ (x_k, x_{k+1}). It then follows from (29) that the required estimate holds, where the equality follows from ε_k = o(α_k² ∥d_k∥²), the second inequality follows from (26), Proposition 2.5(iii) and ε_{k+1} ≤ ε_k, and the last inequality follows from (27). Dividing both sides by α_k and passing to the limit yields a contradiction, so the conclusion is obtained.
The following theorem establishes the global convergence of Algorithm 3.1.

Theorem 3.1
Let {x_k} be generated by Algorithm 3.1 and suppose that the conditions in Lemma 3.4 hold. Then lim_{k→∞} ∥g(x_k)∥ = 0, and any accumulation point of {x_k} is an optimal solution of (1).

Proof
In order to complete the proof, we first show that

lim_{k→∞} ∥g_a(x_k, ε_k)∥ = 0.   (31)

Suppose that (31) is not true. Then there exist constants ϵ_0 > 0 and k_0 > 0 such that ∥g_a(x_k, ε_k)∥ ≥ ϵ_0 for all k > k_0. Since F is bounded from below by Assumption A(ii) and F(x_k) ≤ F_a(x_k, ε_k) for all k by Proposition 2.5, F_a(x_k, ε_k) is bounded from below. Together with F_a(x_k, ε_k) ≤ J_k for all k from Lemma 3.3, this shows that J_k is also bounded from below, which yields the summability estimate (33). On the other hand, combining (22), (26) and (28) with (23) gives an estimate that contradicts (33). Thus, (31) holds.

By Proposition 2.5(iii), we obtain ∥g(x_k)∥ ≤ ∥g_a(x_k, ε_k)∥ + √(2ε_k/λ). Since ε_k → 0 by the construction of Algorithm 3.1, it follows that lim_{k→∞} ∥g(x_k)∥ = 0. Let x* be an accumulation point of {x_k}; without loss of generality, there exists a subsequence {x_k}_K satisfying lim_{k∈K, k→∞} x_k = x*. From the properties of F, we have g(x_k) = (x_k − p(x_k))/λ. Then, by (34) and (35), x* = p(x*) holds, and therefore x* is an optimal solution of (1). The proof is complete.

Numerical Results
In this section, we perform numerical experiments to test the performance of the proposed algorithm and compare it with the MPRP gradient method in [29] and the proximal bundle method (PBL) in [15]. All the nonsmooth test problems in Table 1 can be found in [14]; Table 1 contains the problem names and the optimal function values. The codes were written in MATLAB R2010a and run on a personal computer with an Intel Core 2 Duo CPU at 2.8 GHz and 2 GB of memory. We set ρ = 0.75, σ = 0.9 and adopt the termination condition ∥g_a(x, ε)∥ ≤ 10⁻⁵. The quality of the objective function value f̄ at termination is measured by its relative error with respect to the optimal value f_opt.

The subproblem (14) requires finding a vector p_a(x_k, ε_k) for given x_k and ε_k. We use the Nelder-Mead simplex solver fminsearch.m from MATLAB to solve subproblem (14). This subalgorithm stops if the maximum coordinate difference between the current best point and the other points in the simplex is at most 10⁻³ and the corresponding difference in function values is at most 10⁻³; it also stops if the number of iterations or function evaluations exceeds two hundred.

First, we give some insight into the behavior of the WYL conjugate gradient algorithm for different values of the approximation parameter ε. In this test, we fixed λ = 5. The results are listed in Table 2, which reports the number of iterations (NI), the number of function evaluations (NF) and the relative error (RelErr). From the table, we conclude that the proposed algorithm works reasonably well on all the test cases. The table also shows that the effectiveness of the algorithm improves as the approximation parameter ε decreases.
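The inner step of evaluating p_a(x, ε), F_a(x, ε) and g_a(x, ε) can be sketched as follows. In place of fminsearch.m, this Python sketch uses a simple ternary search on the strongly convex inner objective of the regularization, with f(x) = |x| as an illustrative one-dimensional test function (the bracket and tolerance are assumptions):

```python
def approx_prox(f, x, lam, lo=-10.0, hi=10.0, tol=1e-6):
    """Approximately minimize f(z) + (z - x)**2 / (2*lam) over [lo, hi].

    Ternary search works here because the inner objective is strongly
    convex in z, hence unimodal.
    """
    inner = lambda z: f(z) + (z - x) ** 2 / (2 * lam)
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if inner(m1) < inner(m2):
            hi = m2        # minimizer lies in [lo, m2]
        else:
            lo = m1        # minimizer lies in [m1, hi]
    return (lo + hi) / 2

x, lam = 2.0, 1.0
p_a = approx_prox(abs, x, lam)            # exact minimizer is 1.0
F_a = abs(p_a) + (p_a - x) ** 2 / (2 * lam)
g_a = (x - p_a) / lam
print(round(p_a, 4), round(F_a, 4), round(g_a, 4))  # 1.0 1.5 1.0
```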
Second, to further illustrate the performance of the WYL conjugate gradient algorithm, we present two test results on Problems 1-10, in terms of the number of iterations and the error in the optimal function value, as the regularization parameter λ varies from 1 to 10 in Figure 1. In this test, we fixed τ_k = 1/(k+2)². From Figure 1, we can see that Problems 2, 3 and 8 are more sensitive to the regularization parameter.

Finally, we compared the performance of the proposed algorithm with MPRP and PBL. In this test, we fixed τ_k = 1/(5(k+2)⁵). Since Figure 1 shows that Problems 1-10 have different sensitivities to the regularization parameter, we set λ = 7 for Problems 3 and 5, λ = 2 for Problems 7 and 9, λ = 1 for Problems 4 and 10, and λ = 10 for the other problems. We present three comparison results, in terms of the number of iterations, the number of function evaluations and the final objective function value f(x), in Table 3. From the numerical results in Table 3, we can conclude that all three methods are effective for nonsmooth optimization problems, and that the WYL conjugate gradient algorithm performs best in terms of the number of iterations, the number of function evaluations and the final objective function value.

Conclusions
By making use of the Moreau-Yosida regularization, the nonmonotone line search technique of [34] and the formula in [25] developed by the authors earlier, we presented a Wei-Yao-Liu conjugate gradient algorithm for solving nonsmooth convex optimization problems. Our algorithm satisfies the sufficient descent property, and the corresponding search direction automatically belongs to a trust region. The global convergence of the algorithm was established under suitable conditions, and its effectiveness can be observed from the numerical results.