An improved partial bundle method for linearly constrained minimax problems

In this paper, we propose an improved partial bundle method for solving linearly constrained minimax problems. In order to reduce the number of component function evaluations, we utilize a partial cutting-planes model to substitute for the traditional one. At each iteration, only one quadratic programming subproblem needs to be solved to obtain a new trial point. An improved descent test criterion is introduced to simplify the algorithm. The method produces a sequence of feasible trial points, and ensures that the objective function is monotonically decreasing on the sequence of stability centers. Global convergence of the algorithm is established. Moreover, we utilize the subgradient aggregation strategy to control the size of the bundle and thereby overcome the difficulty of computation and storage. Finally, some preliminary numerical results show that the proposed method is effective.


Introduction
We consider the linearly constrained minimax problem

min f(x) s.t. x ∈ X,   (1)

where the objective function f(x) = max{f_j(x), j ∈ J} with J = {1, . . ., m}, the component functions f_j (j ∈ J) : R^n → R are convex but not necessarily differentiable, and the feasible set of problem (1) is the polyhedron

X := {x ∈ R^n : ⟨a_i, x⟩ ≤ b_i, i ∈ I}

defined by finitely many linear constraints with index set I. Denote by ⟨x, y⟩ = x^T y the inner product of vectors x and y. Minimax problems are a special and important class of nonsmooth optimization problems, whose essence is to make an "optimal" decision under the "worst" case. They appear widely in many fields, such as location problems [1], portfolio strategy [23], and optimizing energy consumption [2]. Furthermore, many mathematical problems can be transformed into minimax problems, such as ℓ_1 and ℓ_∞ approximation [21] and systems of nonlinear equations [26].

The case where the component functions f_j (j ∈ J) are all continuously differentiable is well studied; see e.g. [17,24,25]. In what follows, we are particularly concerned with the case where the component functions are not necessarily differentiable. In order to reduce the number of component function evaluations, Gaudioso et al. [6] proposed an incremental bundle method for solving convex unconstrained minimax problems, in which a partial cutting-planes model of the objective function f is introduced. Hare & Macklem [7] and Hare & Nutini [8] proposed derivative-free gradient sampling methods for solving unconstrained minimax problems by applying the gradient sampling idea of Burke et al. [4]. Liuzzi et al. [18] presented a derivative-free method for linearly constrained minimax problems by using a smoothing technique based on an exponential penalty function. Jian et al. [9] proposed a feasible descent bundle method for general inequality constrained minimax problems by using the partial cutting-planes model of [6]. Tang et al. [22] proposed a proximal-projection partial bundle method for convex constrained minimax problems by combining the partial cutting-planes model of [6] with the proximal-projection idea of [13,15].
In this paper, we propose an improved partial bundle method for solving the linearly constrained minimax problem (1). Its main features are summarized as follows.
(i) Our method not only extends the method of [6] to the linearly constrained case, but also improves the descent test criterion used in [6] and [9] (see Remark 1 for details).
(ii) Compared to [9], our method can easily deal with linear constraints, and it guarantees that all the trial points are feasible.
(iii) Compared to [22], at each iteration only one quadratic programming (QP) subproblem needs to be solved to obtain a new trial point, whereas two subproblems must be solved in [22].
(iv) By making use of the partial cutting-planes model, the number of component function evaluations may be reduced greatly. The objective function is monotonically decreasing on the sequence of stability centers.
(v) The subgradient aggregation strategy [11] is used to control the size of the bundle and therefore overcome the difficulty of computation and storage.
(vi) Global convergence of the algorithm is established, and some preliminary numerical results show that our method is effective.
The paper is organized as follows. In section 2, we review some previous methods which are closely related to ours. In section 3, we present the details of our algorithm and discuss its properties. Global convergence of the algorithm is established in section 4. Improvement of the algorithm by subgradient aggregation is given in section 5. Preliminary numerical results are reported in section 6. Conclusions are presented in section 7. We denote by ∥·∥ the Euclidean norm in R^n. The subdifferential (in convex analysis) of a function f at any point x is denoted by ∂f(x), and each element g ∈ ∂f(x) is called a subgradient.

Preliminaries
In this section, we briefly review the cutting-planes method, the partial cutting-planes model, and bundle methods. For simplicity, we momentarily consider the following unconstrained problem

min{f(x) : x ∈ R^n},

where f : R^n → R is convex but not necessarily differentiable.

Cutting-planes method and partial cutting-planes model
Let k be the current iteration index, and let y^l, l ∈ L_k := {1, . . ., k}, be given points (generated in previous iterations) with subgradients g^l ∈ ∂f(y^l). The linearizations of f at y^l are given by

f_l(x) := f(y^l) + ⟨g^l, x − y^l⟩, l ∈ L_k.

The cutting-planes method [5,10] uses the following cutting-planes (piecewise affine) model of f at the k-th iteration:

f^k_cp(x) := max{f_l(x), l ∈ L_k},

which is a lower approximation to f by convexity, i.e., f^k_cp(x) ≤ f(x). The new point y^{k+1} is generated by solving

min{f^k_cp(x) : x ∈ R^n},

which is equivalent to the linear program

min{z : f(y^l) + ⟨g^l, x − y^l⟩ ≤ z, l ∈ L_k}.

If the cutting-planes method is applied directly to the objective function of problem (1), then at each point y^l we need to calculate the objective value f(y^l), and therefore we have to evaluate the values of all the component functions f_j (j ∈ J) at y^l. This is time-consuming if the number of component functions is large.
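As a sanity check, the lower-approximation property f^k_cp(x) ≤ f(x) can be illustrated on the one-dimensional convex function f(x) = |x|. The data below (the sample points and the helper names) are our own toy illustration, not the paper's code:

```python
def f(x):
    return abs(x)

def subgrad(x):
    # one subgradient of |x| (the sign, with subgrad(0) = 1 chosen arbitrarily)
    return 1.0 if x >= 0 else -1.0

def cutting_planes_model(x, bundle):
    # f_cp(x) = max_l { f(y_l) + g_l * (x - y_l) }, a piecewise affine minorant of f
    return max(fy + g * (x - y) for y, fy, g in bundle)

# bundle of (point, value, subgradient) triples from three past trial points
bundle = [(y, f(y), subgrad(y)) for y in (-2.0, 0.5, 1.5)]

# the model never overestimates the convex function it is built from
for x in (-1.0, 0.0, 0.7, 3.0):
    assert cutting_planes_model(x, bundle) <= f(x) + 1e-12
```

Note that building this model requires the full value f(y^l) at every trial point, which is exactly the cost the partial model below avoids.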
In order to reduce the number of component function evaluations, Gaudioso et al. [6] proposed to evaluate, at each point y^l, just one of the component functions, say f_{j_l}(y^l) for some j_l ∈ J, along with a subgradient g^l_{j_l} ∈ ∂f_{j_l}(y^l), and then define the linearizations

f̄_l(x) := f_{j_l}(y^l) + ⟨g^l_{j_l}, x − y^l⟩, l ∈ L_k.   (2)

Therefore, the so-called partial cutting-planes model is defined by

f̄^k_cp(x) := max{f̄_l(x), l ∈ L_k}.   (3)

Since f_{j_l}(x) ≤ f(x) for all x, it also holds that f̄^k_cp(x) ≤ f(x).
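A minimal sketch of the partial model, on our own toy data: f(x) = max(x, −x, 2x − 1) has three affine components, each trial point contributes a cut built from only one component, and each such cut still minorizes f (since every component minorizes the max):

```python
# three affine components of f(x) = max(x, -x, 2x - 1), with their gradients
comps = [lambda x: x, lambda x: -x, lambda x: 2 * x - 1]
grads = [1.0, -1.0, 2.0]

def f(x):
    return max(c(x) for c in comps)

def partial_model(x, cuts):
    # cuts: list of (y_l, j_l); each point y_l contributes only the
    # linearization of the single component f_{j_l} evaluated there
    return max(comps[j](y) + grads[j] * (x - y) for y, j in cuts)

cuts = [(1.0, 2), (-1.0, 1)]      # one component index per trial point
for x in (-2.0, 0.0, 1.0, 2.0):
    assert partial_model(x, cuts) <= f(x) + 1e-12
```

Building these two cuts costs two component evaluations, whereas the full cutting-planes model would have needed six.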

Bundle methods
Bundle methods [16,27,3] can be considered stabilized variants of the cutting-planes method. Let y^l, l ∈ L_k, be trial points generated in past iterations. Bundle methods keep memory of these points, their function values and subgradients in a bundle of information

B_k := {(y^l, f(y^l), g^l), l ∈ L_k},   (4)

together with a point x^k (called the stability center) which has the "best" objective value obtained so far. Proximal bundle methods [12,14] are a typical class of bundle methods, which generate a new trial point y^{k+1} by solving

y^{k+1} = argmin{f^k_B(y) + (1/(2t_k))∥y − x^k∥²},

where f^k_B, built on the information in B_k, is a piecewise affine model of f, and t_k > 0 is a proximal parameter that controls the distance from y^{k+1} to x^k.
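For intuition, when the model consists of a single linearization f_B(y) = f(x^k) + ⟨g, y − x^k⟩, the proximal subproblem has the closed-form solution y^{k+1} = x^k − t_k g, i.e., a subgradient step. The sketch below (our own toy data, not the paper's algorithm) implements this one-cut case:

```python
def proximal_step(xk, g, t):
    # argmin_y <g, y - xk> + ||y - xk||^2 / (2 t)  has the closed form
    # y = xk - t * g  (set the gradient of the strongly convex objective to 0)
    return [xi - t * gi for xi, gi in zip(xk, g)]

xk = [1.0, -2.0]          # current stability center (toy data)
g = [0.5, -1.0]           # a subgradient of f at xk (assumed)
y_next = proximal_step(xk, g, 1.0)
assert y_next == [0.5, -1.0]
```

With several cuts in the model, no closed form exists and the QP subproblem of the next section must be solved instead; the proximal parameter t plays the role of a step size.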
If the function f achieves sufficient decrease at y^{k+1} compared to f(x^k), then a descent step is declared and the stability center is updated by x^{k+1} = y^{k+1}; otherwise a null step is declared and we set x^{k+1} = x^k.
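The descent/null-step decision can be sketched as a generic sufficient-decrease test. The function name, the parameter eta, and the toy data below are our own illustration; the paper's actual criterion (11) is more refined:

```python
def update_center(f, xk, y_next, predicted_decrease, eta=0.1):
    # descent step if the actual decrease reaches a fraction eta of the
    # decrease predicted by the model; otherwise a null step keeps xk
    if f(y_next) <= f(xk) - eta * predicted_decrease:
        return y_next          # descent step: move the stability center
    return xk                  # null step: keep the stability center

f = lambda x: abs(x)
assert update_center(f, 0.5, 0.0, 0.5) == 0.0    # enough decrease: move
assert update_center(f, 0.5, 0.46, 0.5) == 0.5   # not enough: null step
```

A null step is not wasted work: the new point contributes a cut that improves the model for the next subproblem.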

The algorithm
In this section, combining the partial cutting-planes model with the idea of proximal bundle methods, we propose an improved partial bundle method for problem (1). The partial cutting-planes model (3) is applied to the objective function of problem (1).
Let x^k ∈ X be the current stability center. Define the linearization errors

α^k_l := f(x^k) − f_{j_l}(y^l) − ⟨g^l_{j_l}, x^k − y^l⟩, l ∈ L_k,   (5)

which are nonnegative by convexity. The bundle set (4) can be rewritten as

B_k = {(α^k_l, g^l_{j_l}), l ∈ L_k}.

From (3) and (5), we have

f̄^k_cp(x) = f(x^k) + max{−α^k_l + ⟨g^l_{j_l}, x − x^k⟩, l ∈ L_k}.   (6)

We consider the following subproblem

y^{k+1} = argmin{f̄^k_cp(y) + (1/(2t_k))∥y − x^k∥² : y ∈ X},   (7)

where t_k > 0 is a proximal parameter. From (6), we know that the solution y^{k+1} of (7) is also the solution to

min{max{−α^k_l + ⟨g^l_{j_l}, y − x^k⟩, l ∈ L_k} + (1/(2t_k))∥y − x^k∥² : y ∈ X}.

Letting d = y − x^k, we can equivalently consider the following QP subproblem

QP(B_k):  min_{(z,d)} z + (1/(2t_k))∥d∥²
          s.t. −α^k_l + ⟨g^l_{j_l}, d⟩ ≤ z, l ∈ L_k,
               ⟨a_i, x^k + d⟩ ≤ b_i, i ∈ I.

It is obvious that (z, d) = (0, 0) is a feasible solution of QP(B_k). Moreover, QP(B_k) has a unique optimal solution, denoted by (v_k, d_k). Therefore, we have

w_k := v_k + (1/(2t_k))∥d_k∥² ≤ 0.   (8)

The Lagrangian dual problem of QP(B_k), denoted DP(B_k), has multipliers λ_l (l ∈ L_k) and µ_i (i ∈ I) associated with the two groups of constraints above. Let (λ^k, µ^k) be the optimal solution of DP(B_k). It is not difficult to verify the following relations:

d_k = −t_k g^k,  v_k = −t_k∥g^k∥² − ϵ^k,

where

g^k := Σ_{l∈L_k} λ^k_l g^l_{j_l} + Σ_{i∈I} µ^k_i a_i,   (9)
ϵ^k := Σ_{l∈L_k} λ^k_l α^k_l + Σ_{i∈I} µ^k_i (b_i − ⟨a_i, x^k⟩).   (10)

Now we present the details of our algorithm.
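A concrete small instance of such a proximal QP subproblem can be sketched as follows. The data are our own (two cuts for f(x) = |x| at stability center x_k = 0.5, with t_k = 1 and no linear constraints active), and a generic solver stands in for the paper's QP machinery; the optimal trial point x_k + d_k lands on the minimizer of |x| at 0:

```python
import numpy as np
from scipy.optimize import minimize

t_k = 1.0
# (alpha_l, g_l): linearization error and subgradient of each cut at x_k = 0.5
cuts = [(0.0, 1.0), (1.0, -1.0)]

def objective(v):
    # v = (z, d): minimize z + ||d||^2 / (2 t_k)
    z, d = v
    return z + d * d / (2.0 * t_k)

# cut constraints -alpha_l + g_l * d <= z, written as z + alpha_l - g_l * d >= 0
cons = [{'type': 'ineq', 'fun': (lambda v, a=a, g=g: v[0] + a - g * v[1])}
        for a, g in cuts]

res = minimize(objective, x0=np.zeros(2), method='SLSQP', constraints=cons)
z_k, d_k = res.x
# optimum is (z, d) = (-0.5, -0.5), so the trial point is 0.5 + d_k = 0
```

Starting from the feasible point (0, 0), both cuts are active at the optimum, mirroring how the multipliers λ in (9) and (10) combine the active cuts.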
Step 1. Solve the subproblem QP(B_k) to obtain (v_k, d_k) and the corresponding dual solution (λ^k, µ^k), and calculate g^k and ϵ^k by (9) and (10), respectively.
Step 4. Descent test. Extract any index h from the function index set J. If the descent test criterion (11) holds for h, then add to the bundle the new cut built on the component function f_h at y^{k+1}. Restore the function index set by setting J = {1, . . ., m}. Let t_{k+1} = t_k, k := k + 1, and return to Step 1.
Before giving some comments on Algorithm 1, we first present three lemmas as follows.
Lemma 1 Let x^k, d_k, g^k and ϵ^k be generated by Algorithm 1. Then, for any x ∈ X,

f(x) ≥ f(x^k) + ⟨g^k, x − x^k⟩ − ϵ^k ≥ f(x^k) − ∥g^k∥∥x − x^k∥ − ϵ^k.

Proof From (2) and (5), we have, for all x,

f(x) ≥ f̄_l(x) = f(x^k) − α^k_l + ⟨g^l_{j_l}, x − x^k⟩, l ∈ L_k.   (12)

On the other hand, for x ∈ X, we have

0 ≥ ⟨a_i, x − x^k⟩ − (b_i − ⟨a_i, x^k⟩), i ∈ I.   (13)

Multiplying both sides of (12) and (13) by λ^k_l and µ^k_i, respectively, and summing up for l ∈ L_k and i ∈ I, we have

f(x) ≥ f(x^k) + ⟨g^k, x − x^k⟩ − ϵ^k.

This together with the Cauchy–Schwarz inequality proves the lemma.
Lemma 2 (Cut Property). Suppose that condition (14) holds at the new trial point y^{k+1}, and that the bundle is modified by adding the new cut built on the component function f_h at y^{k+1}. Then the previous solution (v_k, d_k) is not feasible for QP(B_{k+1}).

Proof
From the construction of B_{k+1}, we know that subproblem QP(B_{k+1}) has a constraint of the form of the newly added cut. However, from (14), it follows that (v_k, d_k) is not feasible for QP(B_{k+1}), and the proof is complete.
Based on the lemmas above, we give some comments on Algorithm 1.

Remark 1
(1) In Step 2, if both ∥g^k∥ and ϵ^k are "sufficiently small", then by Lemma 1 Algorithm 1 can stop, and we accept the current stability center x^k as an approximate optimal solution. If ∥g^k∥ is small but ϵ^k is large, then we consider the partial cutting-planes model to be "bad", and therefore a bundle reset is performed. If ∥g^k∥ is not small, then we accept y^{k+1} as a new trial point.
(2) In Step 3, a bundle reset implies that the model is not reliable enough, so we decrease the proximal parameter so that a "closer" trial point is obtained.
(3) In Step 4, the rule for selecting the index h ∈ J does not affect the theoretical analysis, but suitable rules help to improve the numerical performance (see [6,22] for a more detailed discussion).
(4) The descent test criterion (11) was proposed in [22]; it is simpler than the one of [6] in the sense that it does not contain a problem-data-independent parameter. In fact, [6] (essentially) used the criterion (18), where θ_k > 0 is a problem-data-independent parameter satisfying θ_k/t_k < η². This is somewhat restrictive, so it may not be easy to choose the parameters θ_k and t_k numerically. Furthermore, the constant β in (11) is very helpful for improving numerical performance. The numerical results in [22] show that criterion (11) performs better than (18).
(5) In Step 4, if there exists an index h such that (11) holds, then a new cutting-plane built on the component function f_h at y^{k+1} is added, which can improve the model significantly (see Lemma 2).
(6) In Step 5, if J = ∅, then Algorithm 1 enters Step 6. From Lemma 3, we know that the new trial point y^{k+1} satisfies the feasible descent property, so the stability center is updated to this point.

Global convergence
In this section, we establish the global convergence of Algorithm 1. The following basic assumption is required.

Assumption 1. The level set Γ := {x ∈ X : f(x) ≤ f(x^1)} is bounded, where x^1 is the initial stability center.

From Lemma 3, we know that the sequence {f(x^k)} is monotonically decreasing. So the sequence {x^k} of stability centers belongs to the level set Γ.
At each iteration, Algorithm 1 takes exactly one of the following two cases.
(1) The algorithm loops between Step 1 and Step 5, generates null steps, and the stability center does not change;
(2) The algorithm enters Step 6, generates a descent step, and the stability center is updated.
In what follows, we will show that Algorithm 1 is well defined, i.e., the number of loops between Step 1 and Step 5 is finite, and therefore Algorithm 1 must enter Step 6 after a finite number of iterations. In particular, we will show that the following claims hold when the stability center does not change:
(a) The algorithm passes finitely many times through Step 3;
(b) The algorithm passes finitely many times through Step 5;
(c) The number of loops between Step 1 and Step 4 is finite.
The following lemma provides an upper bound for the number of bundle resets, and therefore claim (a) holds.

Lemma 4
Suppose that Algorithm 1 reaches a certain stability center x^k̄ which remains unchanged. Then Algorithm 1 passes finitely many times through Step 3. In particular, if N_k̄ denotes the number of times the algorithm passes through Step 3, then N_k̄ satisfies the bound (19), where the notation ⌈A⌉ rounds A to the nearest integer greater than or equal to A.

Proof
From the statement of the lemma, we know that x^k ≡ x^k̄ for all k ≥ k̄. Denote by k_r (≥ k̄) the r-th time the algorithm enters Step 3. After entering Step 3, the bundle is reset, where g^k̄ ∈ ∂f(x^k̄), and the proximal parameter is decreased. For k_r + 1 ≤ k < k_{r+1}, from (8) we have (20). On the other hand, by the constraints of QP(B_k), we have (21), which together with (20) shows that the stated relation holds. Combining (20) with (21), we obtain (22). In addition, by Assumption 1, there exists a positive constant M such that ∥g^k̄∥ ≤ M, so from (8) and (22), the bound on the number of resets follows. This means that (19) holds.
In order to prove (b) and (c), we first prove the following lemma.From Lemma 4, in what follows, we may assume that the bundle reset does not occur.

Lemma 5
Suppose that Algorithm 1 reaches a certain stability center x^k̄ which remains unchanged, and that the bundle reset does not occur. Let p, q be two iteration indices with p > q ≥ k̄. Let (v_p, d_p), w_p and (v_q, d_q), w_q be the optimal solutions and optimal values of QP(B_p) and QP(B_q), respectively. Then (i)

Proof (i) Since (v_p, d_p) is feasible for subproblem QP(B_q), we have (23) and (24). Multiplying both sides of (23) and (24) by λ^q_l and µ^q_i, respectively, and summing up, we obtain (25). By (8), we have (26). Combining (25) and (26), we obtain (27). Since the bundle reset does not occur, it follows that t_q = t_p =: t. This together with (27) shows the stated inequality. Hence, part (i) holds.
(ii) From (i) and w_k ≤ 0, we know that the sequence {w_k}_{k ≥ k̄} is monotonically nondecreasing and bounded above, so it converges, and therefore ∥d_p − d_q∥ → 0 as p, q → ∞.
Moreover, we obtain v_p − v_q → 0 as q → ∞.
Based on Lemma 5, we now prove the claims (b) and (c).

Lemma 6
Suppose that Algorithm 1 reaches a certain stability center x^k̄ which remains unchanged. Then Algorithm 1 passes finitely many times through Step 5, and the number of loops between Step 1 and Step 4 is finite.

Proof
Suppose by contradiction that one of the following cases occurs: (i) the algorithm passes infinitely many times through Step 5; (ii) the number of loops between Step 1 and Step 4 is infinite. Notice that both cases imply (28). Hence, for any p > q ≥ k̄, the corresponding inequality follows. Passing to the limit as q → ∞, and combining Lemma 5 and Assumption 1, we obtain a contradiction with (28), so the lemma holds.
From Lemma 4 and Lemma 6, the following theorem holds immediately.

Theorem 1
Algorithm 1 is well defined, i.e., the stability center must be updated after a finite number of iterations. Now we prove the global convergence of Algorithm 1.

Theorem 2
For any ϵ > 0 and η > 0, Algorithm 1 stops after a finite number of iterations at a point satisfying the approximate optimality condition, where k* is the index of the last iteration. Furthermore, x^{k*} ∈ X can serve as an approximate optimal solution of problem (1).

Proof
Suppose by contradiction that Algorithm 1 cannot stop finitely. Then from Theorem 1, we know that the stability centers are updated infinitely many times. Let k_s (s = 1, 2, . . .) be the indices of the stability centers. From Lemma 3, we have (29). Let N_{k_s} be the number of bundle resets from x^{k_s} to x^{k_{s+1}}; then (30) follows. Combining (29) and (30), we obtain (31), where τ = βtη² > 0.
On the other hand, from Lemma 4, we have an upper bound on the number of resets. This together with (31) gives the stated inequality. Passing to the limit as s → ∞ yields a contradiction with Assumption 1, so the theorem holds.

Improvement by subgradient aggregation
In order to control the bundle size while retaining the theoretical convergence, we utilize the subgradient aggregation strategy [11] to improve Algorithm 1.
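The aggregation idea can be sketched as a convex combination of the stored cuts: with QP multipliers λ_l summing to one, the aggregate cut has subgradient Σ λ_l g_l and linearization error Σ λ_l α_l, so a whole group of cuts collapses into one. The data and function below are our own toy illustration, not the paper's Algorithm 2:

```python
def aggregate(cuts, lam):
    """Collapse cuts [(alpha_l, g_l), ...] into one aggregate cut using
    convex multipliers lam (e.g. the QP multipliers); g_l are vectors."""
    assert abs(sum(lam) - 1.0) < 1e-12
    n = len(cuts[0][1])
    g_agg = [sum(l * g[i] for l, (_, g) in zip(lam, cuts)) for i in range(n)]
    alpha_agg = sum(l * a for l, (a, _) in zip(lam, cuts))
    return alpha_agg, g_agg

cuts = [(0.0, [1.0, 0.0]), (1.0, [-1.0, 2.0])]
alpha_a, g_a = aggregate(cuts, [0.5, 0.5])
assert alpha_a == 0.5 and g_a == [0.0, 1.0]
```

Because the aggregate cut is itself a valid lower approximation of f (a convex combination of minorants), replacing the stored cuts by it keeps the model sound while bounding memory.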

Remark 2
The choice of the subset L_k is very flexible since, theoretically speaking, L_k can be a singleton or even an empty set. In practice, we can delete some elements from the bundle when its size reaches a preset maximum value; see e.g. [3].

Numerical results
In this section, we test the practical effectiveness of Algorithm 2 on a set of 15 constrained minimax problems, in which problems Wong 2 and Wong 3 are taken from [19], and the other 13 problems are modifications of the corresponding problems in [19] obtained by imposing box constraints of the form ℓ ≤ x ≤ u (see Table I for the detailed data, and [22] for detailed explanations of the data). For simplicity, we use some MATLAB notations: mod(x,y) finds the remainder after division of x by y; ones(p,q) and zeros(p,q) are p-by-q matrices of ones and zeros, respectively.
All numerical experiments were implemented in MATLAB R2011b on a PC with an Intel 3.2 GHz CPU, 4 GB RAM, and the Windows 7 platform. The subproblem AQP(B_k) is solved by the well-known software MOSEK [20].
The parameters are selected as follows: ϵ = 0.001, σ = 4, η = 0.01; t = 1 and β = 0.3 for the first 8 problems in Table I and Maxl; t = 0.001 and β = 0.5 for Wong 2 and Maxquad; t = 100/11 and β = 0.4 for Wong 3; t = 10/3 and β = 0.3 for Maxq, Goffin and MXHILB. The update of t_k is slightly more sophisticated than stated in the algorithm, since a heuristic procedure similar to the one in [12] is applied. The index h in Step 4 is selected basically in the order 1, . . ., m, but we first test the indices for which (11) held in previous iterations, since they are more likely to satisfy it again. In addition, we set a maximum value M = min{10n, 50} for the bundle sizes.
The numerical results for problems Wong 2 and Wong 3 are reported in Table II, and the other results are reported in Table III. The notations are: the number of iterations "NI"; the number of descent steps "ND"; the number of component function evaluations "Nf_i"; the approximate optimal value "f*"; and the number of equivalent objective function evaluations "N^eq_f", i.e., N^eq_f = Nf_i/m.

Table III. Numerical results for 13 problems