On the Convergence and O(1/N) Complexity of a Class of Nonlinear Proximal Point Algorithms for Monotonic Variational Inequalities

Jian Wu and Gaohang Yu

This paper presents a class of proximal point algorithms (PPA) with nonlinear proximal terms. The proximal minimization algorithm using a Bregman distance for convex minimization is extended to solve monotonic variational inequalities. Under suitable conditions, the convergence and the O(1/N) computational complexity/convergence rate of the proposed algorithm are obtained. Furthermore, connections to some existing popular methods are given, which show that the presented algorithm includes the metric proximal point algorithm and the projection method within a general form.


Introduction
Variational inequality (VI) has received a lot of attention due to its various applications in operations research, economic equilibrium, engineering design, and other fields [5,6].
In this paper, we will study iterative algorithms for monotonic VI problems, which can be summarized in the following form: find a point u* ∈ Ω such that

⟨u − u*, F(u*)⟩ ≥ 0, ∀u ∈ Ω, (1)

where F : R^d → R^d is a mapping from the Euclidean space R^d to itself, and Ω is a convex subset of R^d. VI has an important application in optimization. Let F : R^d → R be a differentiable convex function on Ω. Then any minimum point x* of F on Ω satisfies the inequality

⟨x − x*, ∇F(x*)⟩ ≥ 0, ∀x ∈ Ω, (2)

which means that the VI problem (1) includes as a special case the following optimization problem:

min_{x∈Ω} F(x). (3)
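As a quick numerical illustration of the optimality condition (2), one can minimize a simple differentiable convex function over a box and check the inequality at sampled feasible points. The data below (the function F(x) = ‖x − a‖², the box Ω = [0, 1]², the point a) are our own illustrative choices, not from the paper:

```python
# Toy check of optimality condition (2): minimize F(x) = ||x - a||^2 over the
# box Omega = [0, 1]^2, whose minimizer is the componentwise clipping of a.
import random

a = [1.5, 0.3]
clip = lambda x: [min(1.0, max(0.0, v)) for v in x]

x_star = clip(a)                                       # minimizer of F over the box
grad = [2.0 * (x_star[i] - a[i]) for i in range(2)]    # grad F(x*)

# <x - x*, grad F(x*)> should be >= 0 for every feasible x
random.seed(1)
ok = all(
    sum((x[i] - x_star[i]) * grad[i] for i in range(2)) >= -1e-12
    for x in ([random.random(), random.random()] for _ in range(1000))
)
```

Here `ok` ends up true, confirming (2) for this toy instance.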
In this paper we are interested in the class of so-called monotonic VIs for two reasons. Firstly, this class of VIs has the nice property that its solution set is nonempty; secondly, it corresponds to the convex program in the optimization field.
There is extensive literature on numerical algorithms for VIs [1,2,4,7,10]. Among these solvers, a classical algorithm widely known as the proximal point algorithm (PPA) (or the proximal minimization algorithm, specifically for minimization problems [3]) was first proposed in [8] and then developed in [9]. PPA solves the main problem (1) by successive approximation; in other words, at every iteration it solves a subproblem proximal to the main problem. Say, find u^{n+1} ∈ Ω such that

⟨u − u^{n+1}, F(u^{n+1}) + M(u^{n+1} − u^n)⟩ ≥ 0, ∀u ∈ Ω, (4)

or, for the minimization problem (3),

u^{n+1} = argmin_{u∈Ω} { F(u) + d_M(u, u^n) }, (5)

where M is a symmetric and positive-definite matrix, and the proximal term in (5) is known as the Mahalanobis distance, defined as

d_M(x, y) = ½ (x − y)ᵀ M (x − y).

All aforementioned PPAs use a linear proximal term; the main aim of this paper is to present a class of PPAs using nonlinear proximal terms, and to provide a theoretical justification of their convergence and computational complexity.
The rest of this paper is organized as follows: Section 2 is devoted to presenting the algorithm; the proof of its convergence and the complexity analysis are given in Section 3; Section 4 makes several connections between the proposed algorithm and other popular methods. Finally, we give a concluding remark.

Motivation and Proposed Algorithm
Suppose f is a strongly convex function on X; the Bregman distance induced by f is defined as below:

D_f(x, y) = f(x) − f(y) − ⟨∇f(y), x − y⟩,

where ∇f(y) is a subgradient of f at y.
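The definition above can be sketched in a few lines of code. A minimal illustration, with our own choices of the generating function f (the names and the test points are not from the paper):

```python
# Sketch of the Bregman distance D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>.
import math

def bregman(f, grad_f, x, y):
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_f(y), x, y))
    return f(x) - f(y) - inner

# f(x) = 1/2 ||x||^2  ->  D_f(x, y) = 1/2 ||x - y||^2 (Euclidean case)
f_euc = lambda x: 0.5 * sum(v * v for v in x)
g_euc = lambda x: list(x)

# f(x) = sum_i x_i log x_i  ->  entropy-like (Kullback-Leibler-type) distance
f_ent = lambda x: sum(v * math.log(v) for v in x)
g_ent = lambda x: [math.log(v) + 1.0 for v in x]

x, y = [1.0, 2.0], [2.0, 1.0]
d_euc = bregman(f_euc, g_euc, x, y)   # equals 0.5*((1-2)^2 + (2-1)^2) = 1.0
d_ent = bregman(f_ent, g_ent, x, y)   # nonnegative by convexity of f
```

Both values are nonnegative, as convexity of f guarantees.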
The Bregman distance is an extension of the traditional Euclidean and Mahalanobis distances. This point can be seen more clearly through some special cases: for f(x) = ½‖x‖², D_f(x, y) reduces to the Euclidean distance ½‖x − y‖²; for f(x) = ½xᵀMx with M symmetric and positive-definite, it reduces to the Mahalanobis distance d_M(x, y); and for the entropy function f(x) = Σ_i x_i log x_i, it gives the Kullback–Leibler-type distance Σ_i [x_i log(x_i/y_i) − x_i + y_i].

It is a natural idea to replace the Mahalanobis distance in (5) with a Bregman distance, which is the main contribution presented in [3]. The proximal minimization algorithm with a D-function, as it is called by the authors of [3], is then proposed as

u^{n+1} = argmin_{u∈Ω} { F(u) + D_f(u, u^n) }. (6)

According to the optimality condition of (6), we have

⟨u − u^{n+1}, ∇F(u^{n+1}) + ∇f(u^{n+1}) − ∇f(u^n)⟩ ≥ 0, ∀u ∈ Ω,

which can be viewed as a PPA for (2). It is worth noticing that the proximal term in the inequality above is not necessarily linear. Enlightened by this, we propose a nonlinear PPA for the general monotonic VI (1) as follows: find u^{n+1} ∈ Ω such that

⟨u − u^{n+1}, F(u^{n+1}) + G(u^{n+1}) − G(u^n)⟩ ≥ 0, ∀u ∈ Ω, (7)

where G = ∇f. A remarkable difference between (6) and (7) is that in (6), ∇F needs to be the gradient of a convex function, while in (7) F need not be, which means the latter is an extension of the former.
In other words, (7) can solve variational problems which are not derived from minimization problems, such as those derived from saddle point problems and complementarity problems. For instance, suppose one is going to solve a saddle point problem

min_{x∈X} max_{y∈Y} Φ(x, y). (8)

By transforming it into a variational inequality problem of the form (1) with

u = (x, y), F(u) = (∇_x Φ(x, y), −∇_y Φ(x, y)), Ω = X × Y, (9)

the proposed algorithm (7) can be applied, while algorithm (6) does not cover (8) or (9). G in (7) still needs to be the gradient/subgradient of a strongly convex function f; thus two assumptions upon G are necessary:
A1 G is strongly monotonic (from the strong convexity of f);
A2 G is continuous.
This work looks similar to that in [4], but some differences are essential. At first, the algorithm presented in [4] takes approximate (inexact) proximal iterations, while in this work the proximal subproblem is solved exactly. Secondly, as we will see in the coming section, this work provides a complexity analysis result, which is absent in [4].
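To make the extension concrete, here is a minimal numerical sketch of iteration (7) on a VI that does not come from a minimization problem. All the data are our own illustrative assumptions, not from the paper: Ω = R² (so the subproblem reduces to an equation), F(u) = Au with a monotone but non-symmetric A (so F is not a gradient of any function), and G = I, i.e., f(u) = ½‖u‖²:

```python
# Nonlinear PPA (7) in the unconstrained case: solve
#   F(u_next) + G(u_next) - G(u_n) = 0  at each step.
# With F(u) = A u and G = I this is the linear system (A + I) u_next = u_n.
A = [[1.0, 1.0], [-1.0, 1.0]]        # <Au, u> = u1^2 + u2^2: strongly monotone

def solve_2x2(M, b):
    """Solve the 2x2 linear system M x = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (b[1] * M[0][0] - b[0] * M[1][0]) / det]

M = [[A[0][0] + 1.0, A[0][1]], [A[1][0], A[1][1] + 1.0]]   # A + I

u = [1.0, -1.0]
for _ in range(50):
    u = solve_2x2(M, u)              # exact proximal step; iterates tend to u* = 0
```

The unique solution of this VI is u* = 0, and the iterates converge to it.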

Convergence and O(1/N ) Complexity
In this section, we will thoroughly study the convergence of the proposed algorithm. The analytic tool used here is similar to that in [8], but the setting in this paper is more general.

Lemma 1
Let {u^n} be a sequence generated by the algorithm (7), and u* a solution of the VI (1); then we have

⟨u^{n+1} − u*, G(u^n) − G(u^{n+1})⟩ ≥ 0. (10)

Proof. By setting u = u* in (7), we have

⟨u* − u^{n+1}, F(u^{n+1}) + G(u^{n+1}) − G(u^n)⟩ ≥ 0.

Since u* solves (1) and F is monotonic,

⟨u^{n+1} − u*, F(u^{n+1})⟩ ≥ ⟨u^{n+1} − u*, F(u*)⟩ ≥ 0,

and adding the two inequalities yields (10). □

Lemma 2
Let {u^n} be a sequence generated by the algorithm (7), and u* a solution of the VI (1); then we have

D_f(u*, u^{n+1}) ≤ D_f(u*, u^n) − D_f(u^{n+1}, u^n) (11)

and

D_f(u, u^n) = D_f(u, u^{n+1}) + D_f(u^{n+1}, u^n) + ⟨u − u^{n+1}, G(u^{n+1}) − G(u^n)⟩, ∀u ∈ Ω. (12)

Proof. By the definition of the Bregman distance, for any u ∈ Ω,

D_f(u, u^n) − D_f(u, u^{n+1}) − D_f(u^{n+1}, u^n) = ⟨u − u^{n+1}, G(u^{n+1}) − G(u^n)⟩,

from which immediately we obtain (12). By setting u = u* in the equality above, and using (10), we can easily obtain (11) as well. □

Stat., Optim. Inf. Comput.

Lemma 3
Let {u^n} be a sequence generated by the algorithm (7); then we have

lim_{n→∞} ‖G(u^{n+1}) − G(u^n)‖ = 0. (13)

Proof. From (11) we know that {D_f(u*, u^n)} is monotonically non-increasing and bounded below, thus has a limit, say d. Consequently, again by (11), D_f(u^{n+1}, u^n) ≤ D_f(u*, u^n) − D_f(u*, u^{n+1}) → 0, and since the strong convexity of f gives D_f(x, y) ≥ (σ/2)‖x − y‖², this forces ‖u^{n+1} − u^n‖ → 0. According to the assumption A2, the assertion (13) is obtained. □
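The monotone decrease in Lemma 2 and the vanishing successive differences in Lemma 3 can be observed numerically. A minimal sketch with our own illustrative data (not from the paper): G = I, so D_f is the Euclidean case ½‖·‖², and F(u) = Au with a monotone non-symmetric A, whose unique solution is u* = 0:

```python
# Observe D_f(u*, u^n) non-increasing (Lemma 2) and steps vanishing (Lemma 3)
# along the exact PPA iterates (A + I) u_next = u_n.
A = [[1.0, 1.0], [-1.0, 1.0]]        # monotone: <Au, u> = u1^2 + u2^2

def solve_2x2(M, b):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (b[1] * M[0][0] - b[0] * M[1][0]) / det]

M = [[A[0][0] + 1.0, A[0][1]], [A[1][0], A[1][1] + 1.0]]   # A + I for G = I

u = [1.0, -1.0]
dists, steps = [], []
for _ in range(40):
    u_next = solve_2x2(M, u)
    dists.append(0.5 * (u[0] ** 2 + u[1] ** 2))            # D_f(u*, u^n), u* = 0
    steps.append(max(abs(u_next[i] - u[i]) for i in range(2)))
    u = u_next

monotone = all(dists[i + 1] <= dists[i] + 1e-12 for i in range(len(dists) - 1))
vanishing = steps[-1] < 1e-8
```

Both flags come out true on this example, matching the two lemmas.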

Theorem 1
Let {u^n} be a sequence generated by the algorithm (7); then it is bounded and any cluster point of {u^n} is a solution point of (1). Furthermore,

⟨ū_N − u, F(u)⟩ ≤ D_f(u, u^0)/N, ∀u ∈ Ω, (14)

where

ū_N = (1/N) Σ_{n=1}^{N} u^n.

Proof. At first, the boundedness of {u^n} can be deduced from (11); hence a cluster point u^∞ of {u^n} exists. Secondly, from (7) we have

⟨u − u^{n+1}, F(u^{n+1})⟩ ≥ ⟨u − u^{n+1}, G(u^n) − G(u^{n+1})⟩, ∀u ∈ Ω.

Computing lower limits of both sides of the inequality above along a subsequence converging to u^∞, and invoking (13), we have

⟨u − u^∞, F(u^∞)⟩ ≥ 0, ∀u ∈ Ω.

Hence u^∞ is a solution of (1). Now we move to prove the last part of the theorem. Combining (7) with the identity (12) gives ⟨u − u^{n+1}, F(u^{n+1})⟩ ≥ D_f(u, u^{n+1}) − D_f(u, u^n), so from the monotonicity of F we have

⟨u − u^{n+1}, F(u)⟩ ≥ ⟨u − u^{n+1}, F(u^{n+1})⟩ ≥ D_f(u, u^{n+1}) − D_f(u, u^n), ∀u ∈ Ω.

Summing the inequality above from n = 0 to N − 1 and dividing by N, we finally get the result (14), which completes the proof. □

The last assertion of the theorem means that the proposed algorithm has a complexity of O(1/N).

Connection to Other Methods
This section is devoted to making connections to some existing methods, which shows that our algorithm includes these methods in a unified way.

Linear Case
It is obvious that when G is a linear mapping, say a matrix M, algorithm (7) becomes the linear PPA (4). What needs to be pointed out is that M must fulfill the assumptions A1, A2; in particular, M must be positive-definite and bounded. Now we verify that a linear mapping G so defined fulfills the assumptions A1, A2. Let the maximum and minimum eigenvalues of M respectively be λ_max and λ_min; then for all x, y we have

⟨Gx − Gy, x − y⟩ = (x − y)ᵀ M (x − y) ≥ λ_min ‖x − y‖²,
‖Gx − Gy‖ = ‖M(x − y)‖ ≤ λ_max ‖x − y‖.

Hence we see that G is both strongly monotonic and Lipschitz continuous.
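The two eigenvalue bounds can be spot-checked numerically. The matrix and sampling scheme below are our own illustrative choices:

```python
# Spot-check: for symmetric positive-definite M, G(u) = M u satisfies
#   lambda_min ||x - y||^2 <= <Gx - Gy, x - y>   (strong monotonicity, A1)
#   ||Gx - Gy|| <= lambda_max ||x - y||          (Lipschitz continuity)
import math, random

M = [[3.0, 1.0], [1.0, 2.0]]                       # symmetric, positive-definite
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
lam_min = (tr - math.sqrt(tr * tr - 4 * det)) / 2  # 2x2 symmetric eigenvalues
lam_max = (tr + math.sqrt(tr * tr - 4 * det)) / 2

def mv(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

random.seed(0)
ok = True
for _ in range(1000):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    d = [x[0] - y[0], x[1] - y[1]]
    Gd = mv(M, d)                                  # G x - G y = M (x - y)
    inner = Gd[0] * d[0] + Gd[1] * d[1]
    nd2 = d[0] * d[0] + d[1] * d[1]
    ok = ok and inner >= lam_min * nd2 - 1e-9
    ok = ok and math.hypot(*Gd) <= lam_max * math.sqrt(nd2) + 1e-9
```

Both bounds hold on all sampled pairs for this M.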

Entropy Case
It is very natural to present an entropy-like version of the algorithm (7), motivated by the third case of the D-function listed in Section 2,

D_f(u, v) = Σ_i [u_i log(u_i/v_i) − u_i + v_i].

By differentiating it over the variable u, we obtain a corresponding nonlinear proximal term for our algorithm,

G(u) = (log u_1 + 1, …, log u_d + 1)ᵀ.

However, in this case we need to restrict the domain to a subset of the positive orthant, say

Ω_c = {u ∈ Ω : c ≤ u_i ≤ 1/c, i = 1, …, d},

where c is a sufficiently small positive number.
The continuity of G on the restricted domain is derived from its definition, and the strong monotonicity of G can be seen from the following argument: for x, y with components in [c, 1/c], applying a mean value theorem on each component gives ξ_i between x_i and y_i (hence c ≤ ξ_i ≤ 1/c) such that

⟨G(x) − G(y), x − y⟩ = Σ_i (log x_i − log y_i)(x_i − y_i) = Σ_i (x_i − y_i)²/ξ_i ≥ c ‖x − y‖².
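The mean-value-theorem bound can be spot-checked numerically; the constant c, the dimension, and the sampling below are our own toy setup:

```python
# Spot-check: on components restricted to [c, 1/c], the entropy-induced map
# G(u)_i = log(u_i) + 1 satisfies <G(x) - G(y), x - y> >= c ||x - y||^2.
import math, random

c = 0.1
lo, hi = c, 1.0 / c

G = lambda u: [math.log(v) + 1.0 for v in u]

random.seed(2)
ok = True
for _ in range(1000):
    x = [random.uniform(lo, hi) for _ in range(3)]
    y = [random.uniform(lo, hi) for _ in range(3)]
    inner = sum((gx - gy) * (xi - yi) for gx, gy, xi, yi in zip(G(x), G(y), x, y))
    dist2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    ok = ok and inner >= c * dist2 - 1e-9
```

The inequality holds on all sampled pairs, consistent with 1/ξ_i ≥ c on this domain.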

Projection Method: a Special Case
The VI problem is closely related to the fixed-point problem. Fixed-point theory has played an important role in the development of various algorithms for solving VIs. In fact, we have the following well-known result.

Lemma 4
u* is a solution of VI (1) if and only if

u* = P_Ω(u* − c F(u*)) for any c > 0, (15)

where the operator P_Ω(·), called the projection onto Ω, is defined as

P_Ω(v) = argmin_{u∈Ω} ‖u − v‖. (16)

The fixed-point formulation in the above lemma suggests the simple iterative algorithm for solving (1):

u^{n+1} = P_Ω(u^n − c_n F(u^n)). (17)

This algorithm is widely known as the projection method, and converges to a solution point of (1) provided that: 1. F is Lipschitz continuous with constant L; 2. F is strongly monotonic; 3. the stepsize fulfills c_n < 1/L.
We now demonstrate how these convergence conditions coincide with the assumptions A1, A2 of the proposed algorithm. By applying the optimality principle (2) to the minimization problem in (16), one has

⟨u − P_Ω(v), P_Ω(v) − v⟩ ≥ 0, ∀u ∈ Ω,

which leads to the important fact that the iterate u^{n+1} = P_Ω(u^n − c_n F(u^n)) satisfies

⟨u − u^{n+1}, u^{n+1} − u^n + c_n F(u^n)⟩ ≥ 0, ∀u ∈ Ω,

that is, after dividing by c_n, exactly the subproblem (7) with G = (1/c_n) I − F. From this point of view, the projection method (17) can be seen as a special case of (7). Now we move to justify that G = (1/c_n) I − F is continuous and strongly monotonic. Assume F is Lipschitz continuous with constant L; immediately we know the continuity of G from its definition. On the other hand, one has

⟨Gx − Gy, x − y⟩ = (1/c_n) ‖x − y‖² − ⟨Fx − Fy, x − y⟩ ≥ (1/c_n − L) ‖x − y‖².

Hence we know that G is strongly monotonic on the condition of c_n < 1/L.
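A small run of the projection method closes the loop. The toy data here are our own illustrative choices (not from the paper): F(u) = u − a, which is Lipschitz with L = 1 and strongly monotonic, Ω = [0, 1]², so P_Ω clips coordinates and the solution is u* = P_Ω(a):

```python
# Projection method u^{n+1} = P_Omega(u^n - c F(u^n)) on a toy VI with
# F(u) = u - a over the box Omega = [0, 1]^2; stepsize c < 1/L = 1.
a = [1.5, 0.3]
c = 0.9
proj = lambda u: [min(1.0, max(0.0, v)) for v in u]   # P_Omega for the box

u = [0.0, 0.0]
for _ in range(200):
    u = proj([u[i] - c * (u[i] - a[i]) for i in range(2)])

u_star = proj(a)     # fixed point of (15), the solution of this VI
```

The iterates converge to u_star, as the convergence conditions above predict.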