A note on the inertial proximal point method

The proximal point method (PPM) for solving maximal monotone operator inclusion problems is a highly powerful tool for algorithm design, analysis and interpretation. To accelerate convergence of the PPM, the inertial PPM (iPPM) was proposed in the literature. In this note, we show that some of the attractive properties of the PPM, e.g., that the generated sequence is contractive with respect to the set of solutions, do not hold in general for iPPM. To partially inherit the advantages of the PPM while incorporating inertial extrapolation steps, we propose an iPPM with alternating inertial steps. Our analyses show that the even subsequence generated by the proposed iPPM is contractive with respect to the set of solutions. Moreover, we establish a global convergence result under much relaxed conditions on the inertial extrapolation stepsizes, e.g., monotonicity is no longer needed and the stepsizes are significantly enlarged compared to existing methods. Furthermore, we establish certain nonasymptotic O(1/k) and asymptotic o(1/k) convergence rate results, where k denotes the iteration counter. These features are new to inertial type PPMs.


Introduction
Let $T : \Re^n \rightrightarrows \Re^n$ be a set-valued maximal monotone operator. In this paper, we consider the following operator inclusion problem:

$$\text{find } x^* \in \Re^n \text{ such that } 0 \in T(x^*). \tag{1}$$

It is well known that (1) serves as a unified model for many problems of fundamental importance, including the fixed point problem, the variational inequality problem, the minimization of closed proper convex functions, and their variants and extensions. Therefore, its efficient solution is of practical interest in many situations.

The proximal point method (PPM, [25,24,31]) converts (1) to a fixed point problem of a firmly nonexpansive resolvent operator. Let $\lambda > 0$ be a constant. The resolvent operator of $T$ is defined by $J_{\lambda T} := (I + \lambda T)^{-1}$, i.e., for any $x \in \Re^n$, $J_{\lambda T}(x)$ is the unique solution $z$ of $0 \in z - x + \lambda T(z)$. Initialized at any $x^0 \in \Re^n$, the PPM iterates for $k \ge 0$ as

$$x^{k+1} = J_{\lambda T}(x^k). \tag{2}$$

It turns out that the PPM is a very powerful algorithmic tool and contains many well known algorithms as special cases, including the classical augmented Lagrangian method [20,29], the Douglas-Rachford splitting method [15] and the alternating direction method of multipliers [18,17]. Interested readers are referred to the classical references [31,30,16,19] for analysis and generalizations of the PPM. An equivalent representation of the PPM (2) is given by

$$0 \in \frac{x^{k+1} - x^k}{\lambda} + T(x^{k+1}),$$

which can be viewed as an implicit discretization of the evolution differential inclusion problem

$$0 \in \frac{dx}{dt} + T(x(t)). \tag{3}$$

It is known that the solution trajectory of (3) converges to a solution of (1) provided that $T$ satisfies certain conditions, see, e.g., [10]. To speed up convergence, the following second order evolution differential inclusion problem was studied in the literature:

$$0 \in \frac{d^2x}{dt^2} + \gamma \frac{dx}{dt} + T(x(t)), \tag{4}$$

where $\gamma > 0$ is a friction parameter. For the special case $n = 2$ and $T = \nabla f$, where $f : \Re^2 \to \Re$ is a differentiable convex function with attainable minimum, the system (4) characterizes roughly the motion of a heavy ball which rolls under its own inertia over the graph of $f$ until friction stops it at a stationary point of $f$. In this case, the three terms in (4) denote, respectively, inertial force, friction force and gravity force. Therefore, the system (4) is usually referred to as the heavy-ball with friction (HBF) system. In theory the convergence of the solution trajectories of the HBF system to a solution of (1) can be faster than that of the first-order system (3), while in practice the second order inertial term $d^2x/dt^2$ can be exploited to design faster algorithms [1,5].

Motivated by the properties of (4), an implicit discretization method was proposed in [2,4]. Specifically, given $x^{k-1}$ and $x^k$ and a step size $h > 0$, the next point $x^{k+1}$ is determined via

$$0 \in \frac{x^{k+1} - 2x^k + x^{k-1}}{h^2} + \gamma \frac{x^{k+1} - x^k}{h} + T(x^{k+1}),$$

which results in an iterative algorithm of the form

$$x^{k+1} = J_{\lambda T}\big(x^k + \alpha (x^k - x^{k-1})\big), \tag{5}$$

where $\lambda = h^2/(1 + \gamma h)$ and $\alpha = 1/(1 + \gamma h)$. Note that (5) is nothing more than a proximal point step applied to the extrapolated point $x^k + \alpha(x^k - x^{k-1})$, rather than $x^k$ itself as in the classical PPM. Thus the resulting iterative scheme (5) is a two-step method and is usually referred to as the inertial PPM (iPPM). Convergence properties of (5) were studied in [2,4] under some assumptions on the parameters $\alpha$ and $\lambda$. Subsequently, inexact and hybrid type iPPMs were studied in [26,3,23,22]. Recently, there has been increasing interest in studying inertial type algorithms, see, e.g., inertial forward-backward splitting methods [28,27,6], the inertial Douglas-Rachford splitting method [9], inertial ADMM [7], and the inertial forward-backward-forward method [8]. See also the latest references [11,21,14,12,13], which analyzed the convergence properties of inertial type algorithms for maximal monotone inclusion problems, variational inequalities and structured convex optimization, and demonstrated their performance numerically on some imaging and data analysis problems.
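The step from the implicit discretization of (4) to the scheme (5) can be verified in a few lines. The following worked derivation is only a sketch: it assumes a central difference for the second derivative and a forward difference for the first derivative, both with a step size $h > 0$, and evaluates $T$ at the new point $x^{k+1}$.

```latex
\begin{align*}
0 &\in \frac{x^{k+1} - 2x^k + x^{k-1}}{h^2}
      + \gamma\,\frac{x^{k+1} - x^k}{h} + T(x^{k+1}) \\
\iff\quad
0 &\in (1+\gamma h)\,(x^{k+1} - x^k) - (x^k - x^{k-1}) + h^2\,T(x^{k+1})
      && \text{(multiply by } h^2\text{)} \\
\iff\quad
0 &\in x^{k+1} - x^k - \frac{1}{1+\gamma h}\,(x^k - x^{k-1})
      + \frac{h^2}{1+\gamma h}\,T(x^{k+1})
      && \text{(divide by } 1+\gamma h\text{)} \\
\iff\quad
x^k + \alpha\,(x^k - x^{k-1}) &\in (I + \lambda T)(x^{k+1}),
      \qquad \lambda = \frac{h^2}{1+\gamma h},\quad \alpha = \frac{1}{1+\gamma h} \\
\iff\quad
x^{k+1} &= J_{\lambda T}\big(x^k + \alpha\,(x^k - x^{k-1})\big).
\end{align*}
```

Since $\gamma h > 0$, this discretization always yields $\alpha \in (0, 1)$.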
In this note, we first give examples to illustrate that some of the attractive properties of the PPM do not hold anymore for iPPM. We then propose an iPPM with alternating inertial steps, which inherits the contractive property of the PPM to some extent. Our analyses show that, under much relaxed conditions, global convergence of the proposed iPPM can be guaranteed. In particular, the inertial extrapolation stepsizes do not need to be monotonically nondecreasing and can be significantly enlarged compared to existing methods. Furthermore, we establish certain nonasymptotic O(1/k) and asymptotic o(1/k) convergence rate results on a subsequence generated by the proposed iPPM. To the best of our knowledge, these features are new to the variants of PPM with inertial steps.

Features of PPM and iPPM
In practice, the proximal parameter $\lambda$ in PPM usually varies from step to step, i.e., given a sequence of positive parameters $\{\lambda_k\}$, the PPM appears as

$$x^{k+1} = J_{\lambda_k T}(x^k). \tag{6}$$

Similarly, the iPPM takes the form

$$x^{k+1} = J_{\lambda_k T}\big(x^k + \alpha_k (x^k - x^{k-1})\big), \tag{7}$$

where $\{\alpha_k\}$ is a sequence of nonnegative inertial extrapolation stepsizes. Classical requirements on the parameters to ensure the global convergence of iPPM are (i) $\lambda_k \ge \lambda$ for some $\lambda > 0$, and (ii) $0 \le \alpha_k \le \alpha_{k+1} \le \alpha$ for all $k$ and some $\alpha < 1/3$. In the rest of this paper, we denote the set of solutions of (1) by $T^{-1}(0)$. The following lemma is very useful in our analysis; its proof is elementary and is thus omitted.

Lemma 1 (Contractive property of PPM)
Let $T : \Re^n \rightrightarrows \Re^n$ be any set-valued maximal monotone operator and $\lambda > 0$. Suppose that $x$ and $x^+$ satisfy $x^+ = J_{\lambda T}(x)$. Then, for any $x^* \in T^{-1}(0)$, it holds that

$$\|x^+ - x^*\|^2 \le \|x - x^*\|^2 - \|x^+ - x\|^2.$$

Let $\{x^k\}$ be the sequence generated by the PPM (6) and $x^* \in T^{-1}(0)$. Lemma 1 implies that the PPM is contractive with $T^{-1}(0)$. In particular, there holds

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \|x^{k+1} - x^k\|^2. \tag{8}$$

A direct consequence of (8) is that if $x^k \in T^{-1}(0)$ for some $k$, then all subsequent points will be frozen at $x^k$. In fact, by setting $x^* = x^k$ in (8), we obtain $x^{k+1} = x^k$. The converse is also true, i.e., if two consecutive points generated by the PPM are identical, then a solution is already reached. In short, the following property holds for PPM:

$$x^{k+1} = x^k \ \text{ if and only if } \ x^k \in T^{-1}(0). \tag{9}$$

Due to (9), it is natural to terminate PPM in practice by $\|x^{k+1} - x^k\| \le \epsilon$ for some tolerance $\epsilon > 0$. We next give very simple examples to show that the sequence generated by iPPM violates the attractive properties (8) and (9).
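The contractive property of the PPM can be checked numerically on a toy problem. The sketch below is an illustration only: it assumes the one-dimensional operator $T = \partial|\cdot|$ (so that the resolvent is soft-thresholding and $T^{-1}(0) = \{0\}$), and it tests the standard firm nonexpansiveness inequality $\|x^+ - x^*\|^2 \le \|x - x^*\|^2 - \|x^+ - x\|^2$. The function names `prox_abs` and `ppm` are hypothetical helpers introduced here.

```python
def prox_abs(x, lam):
    """Resolvent J_{lam T}(x) for T = subdifferential of |.| (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def ppm(x0, lam, num_iters):
    """Classical PPM with a fixed proximal parameter: x^{k+1} = J_{lam T}(x^k)."""
    xs = [x0]
    for _ in range(num_iters):
        xs.append(prox_abs(xs[-1], lam))
    return xs


x_star = 0.0  # unique minimizer of |x|
xs = ppm(x0=3.7, lam=0.5, num_iters=12)

# Contraction inequality for every pair of consecutive iterates.
contractive = all(
    (xp - x_star) ** 2 <= (x - x_star) ** 2 - (xp - x) ** 2 + 1e-12
    for x, xp in zip(xs, xs[1:])
)

# Two consecutive iterates coincide exactly when a solution has been reached.
stationarity = all((x == xp) == (x == x_star) for x, xp in zip(xs, xs[1:]))

print(contractive, stationarity)
```

On this instance the iterates decrease by $\lambda$ per step until they hit the solution, after which they stay there, which is the behaviour that makes $\|x^{k+1} - x^k\| \le \epsilon$ a natural stopping rule for the PPM.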
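Example 1 (violates (8) and the "if" direction in (9))
Suppose that we are minimizing $f(x) = x^2/2$ by iPPM (7), where $\lambda_k$ is fixed at a positive constant $\lambda$. Given $x^{-1} = x^0 \ne 0$ and a nonnegative sequence $\{\alpha_k\}$, the iPPM iterates for $k \ge 0$ as

$$x^{k+1} = \frac{x^k + \alpha_k (x^k - x^{k-1})}{1 + \lambda}.$$

With $\alpha_1 = 1/\lambda$, the first two updates appear as

$$x^1 = \frac{x^0}{1 + \lambda}, \qquad x^2 = \frac{x^1 + \alpha_1 (x^1 - x^0)}{1 + \lambda} = 0.$$

Clearly, $x^2$ is already the unique solution. However, further calculations show that

$$x^3 = \frac{x^2 + \alpha_2 (x^2 - x^1)}{1 + \lambda} = -\frac{\alpha_2\, x^0}{(1 + \lambda)^2} \ne 0 \quad \text{whenever } \alpha_2 > 0.$$

This violates (9) since $x^k \in T^{-1}(0)$ does not imply $x^{k+1} = x^k$. This also implies that for iPPM $x^{k+1}$ can be farther away from the set of solutions than $x^k$, since in this example $|x^3 - 0| > |x^2 - 0| = 0$.

The following example shows that $x^{k+1} = x^k$ does not imply $x^k \in T^{-1}(0)$ for iPPM.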
We give the following additional remarks on iPPM.
(i) It is worth pointing out that in Examples 1 and 2 the extrapolation parameters $\{\alpha_k\}$ are less than 1/3 (simply set $\lambda > 3$ in Example 1), as required by [4]. Though the monotonicity condition $\alpha_k \le \alpha_{k+1}$ is violated in Example 2, we should note that this monotonically nondecreasing requirement is rather unreasonable since the iPPM reduces to the original PPM if $\alpha_k \equiv 0$.
(ii) For iPPM, if $x^{k+1} = x^k \notin T^{-1}(0)$, then the next point $x^{k+2}$ will be closer than $x^{k+1}$ to the set of solutions $T^{-1}(0)$, because in this case $x^{k+2} = J_{\lambda_{k+1} T}(x^{k+1})$ conforms to a normal PPM step since $x^{k+1} - x^k = 0$. If instead $x^{k+1} = x^k \in T^{-1}(0)$, then all subsequent points will be equal to $x^{k+1} \in T^{-1}(0)$ because $\|x^{k+2} - x^*\| \le \|x^{k+1} - x^*\|$ holds for any $x^* \in T^{-1}(0)$; this follows from setting $x^* = x^{k+1}$.
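The behaviour described in Example 1 can be reproduced with exact rational arithmetic. The sketch below is an illustration under stated assumptions: it takes $f(x) = x^2/2$, so that $J_{\lambda T}(x) = x/(1+\lambda)$, and picks the hypothetical values $\lambda = 4$, $x^{-1} = x^0 = 1$, $\alpha_1 = 1/\lambda$ and $\alpha_2 = 1/4$, all below the classical bound 1/3. The helper name `ippm_quadratic` is introduced here for illustration.

```python
from fractions import Fraction


def ippm_quadratic(x0, lam, alphas):
    """iPPM for f(x) = x^2/2, started with x^{-1} = x^0.

    For this f the resolvent is J_{lam T}(x) = x / (1 + lam).
    """
    x_prev, x = x0, x0
    xs = [x0]
    for alpha in alphas:
        y = x + alpha * (x - x_prev)   # inertial extrapolation
        x_prev, x = x, y / (1 + lam)   # proximal (resolvent) step
        xs.append(x)
    return xs


lam = Fraction(4)
# alpha_0 is irrelevant because x^{-1} = x^0; alpha_1 = 1/lam makes x^2 = 0.
alphas = [Fraction(0), 1 / lam, Fraction(1, 4)]
xs = ippm_quadratic(Fraction(1), lam, alphas)
print(xs)
```

Here $x^2$ equals the solution $0$, yet $x^3 = -\alpha_2 x^0/(1+\lambda)^2$ is nonzero, so the iterate leaves the solution set even though every stepsize is at most $1/4$.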

An iPPM with alternating inertial steps
Given the unsatisfactory properties of iPPM presented in Section 2, in this section we propose an iPPM with alternating inertial steps. This new iPPM has the advantage that the produced even subsequence is contractive with $T^{-1}(0)$. Furthermore, the inertial extrapolation stepsizes do not need to be monotonically nondecreasing and can vary freely in $[0, 1]$. These requirements are much less restrictive than those in [4], i.e., $0 \le \alpha_k \le \alpha_{k+1} \le \alpha < 1/3$ for all $k$.

For simplicity, in the following we assume that $\lambda_k \equiv \lambda$ for some $\lambda > 0$; our algorithm and analyses can be generalized in a straightforward way to the case of varying $\{\lambda_k\}$ as long as the sequence is bounded below by some positive constant. Let $x^0 \in \Re^n$, $\lambda > 0$ and a sequence of nonnegative parameters $\{\alpha_k\}$ be given. For $k \ge 0$, the proposed alternating inertial PPM iterates as

$$x^{k+1} = J_{\lambda T}(\bar{x}^k), \tag{10}$$

where $\bar{x}^k$ is defined as

$$\bar{x}^k := \begin{cases} x^k, & \text{if } k \text{ is even}, \\ x^k + \alpha_k (x^k - x^{k-1}), & \text{if } k \text{ is odd}. \end{cases} \tag{11}$$

We first give a lemma before the main convergence results.
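The alternating scheme can be sketched in a few lines of code. The example below is a minimal illustration, not a reference implementation: it assumes that the inertial extrapolation is applied only at odd $k$ and that a plain proximal step is taken at even $k$, and it uses the hypothetical test operator $T(x_1, x_2) = (x_2, -x_1)$ on $\Re^2$, which is linear, monotone and has $T^{-1}(0) = \{0\}$. The names `resolvent_skew` and `alternating_ippm` are introduced here for illustration.

```python
import math
import random


def resolvent_skew(x, lam):
    """J_{lam T}(x) for T(x1, x2) = (x2, -x1), i.e. the solution z of z + lam*T(z) = x."""
    d = 1.0 + lam * lam
    return ((x[0] - lam * x[1]) / d, (lam * x[0] + x[1]) / d)


def alternating_ippm(x0, lam, alphas, resolvent):
    """Alternating inertial PPM: extrapolate at odd k only, then apply the resolvent."""
    xs = [x0]
    for k, alpha in enumerate(alphas):
        x = xs[-1]
        if k % 2 == 1:
            x_prev = xs[-2]
            x_bar = tuple(xi + alpha * (xi - pi) for xi, pi in zip(x, x_prev))
        else:
            x_bar = x  # even k: plain proximal step, no inertia
        xs.append(resolvent(x_bar, lam))
    return xs


rng = random.Random(0)
# Stepsizes drawn freely from [0, 1], with no monotonicity imposed.
alphas = [rng.uniform(0.0, 1.0) for _ in range(40)]
xs = alternating_ippm((1.0, -2.0), 0.7, alphas, resolvent_skew)

# Distances of the even subsequence x^0, x^2, x^4, ... to the solution 0.
even_dists = [math.hypot(*xs[k]) for k in range(0, len(xs), 2)]
is_monotone = all(b <= a + 1e-12 for a, b in zip(even_dists, even_dists[1:]))
print(is_monotone)
```

Because only $x^0$ is needed to start (the step at $k = 0$ uses no inertia), no artificial point $x^{-1}$ has to be supplied. On this instance the distances of the even iterates to the solution are nonincreasing even though the stepsizes are random in $[0, 1]$.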

Lemma 2 (Monotonicity property of two consecutive PPM iterations)
Let $\{x^k\}$ be the sequence generated by algorithm (10)-(11) from any initial point $x^0 \in \Re^n$. Then, it holds for all ) and the monotonicity of Note that for any vectors $a, b \in \Re^n$ it holds that $\|a\|^2 - \|b\|^2 \ge 2\langle b, a - b\rangle$. Thus, Then, the conclusion follows directly from (12) and (13).

Theorem 1 (Global convergence and convergence rate)
Suppose that $0 \le \alpha_k \le 1$ for all $k \ge 0$ and $\lambda > 0$. Let $\{x^k\}$ be the sequence generated by algorithm (10)-(11) from any initial point $x^0 \in \Re^n$. Then, the following results hold.
(iii) Now we prove the convergence rate results (16) and (17). Clearly, (16) follows immediately from (22). Moreover, by considering (23) and using the Cauchy principle, we obtain as $k \to \infty$ that where $\lfloor (k-1)/2 \rfloor$ denotes the greatest integer no greater than $(k-1)/2$. Thus, the relation (17) holds.

It follows from (14) that the sequence $\{x^{2k}\}$ is contractive with the set of solutions. Furthermore, we have the following corollaries.

Proof
It is obvious from the definition of $\bar{x}^i$ in (11) and Lemma 2 that Then the conclusion of this corollary follows directly from (iii) of Theorem 1.
Since, for any $k \ge 0$, $x^{k+1} = \bar{x}^k$ would imply that $x^{k+1} \in T^{-1}(0)$, the results given in Corollary 1 can be viewed as convergence rate results on the optimality residual $\min_{1 \le i \le 2k} \|x^{i+1} - \bar{x}^i\|^2$. As such, this residual can be used in practice to terminate the algorithm. We also note that, different from the complexity results given in [14,12,13], the results given in Theorem 1 and Corollary 1 do not depend on the upper bound of $\{\alpha_k\}$. Also, the results presented here are measured in the Euclidean $\ell_2$-norm, rather than in a weighted norm as in [14,12,13]. More importantly, our convergence results do not depend on monotonicity of $\{\alpha_k\}$ and are established under the much relaxed condition $0 \le \alpha_k \le 1$ for all $k$. These features were previously not known for inertial type PPMs.
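A stopping rule based on the residual between $x^{k+1}$ and the point $\bar{x}^k$ fed to the resolvent could look as follows. This is a hedged sketch only: it assumes extrapolation at odd $k$, a constant stepsize $\alpha$, and the one-dimensional operator $T = \partial|\cdot|$ with soft-thresholding as resolvent. The function names and the tolerance value are hypothetical choices made for illustration.

```python
def soft_threshold(x, lam):
    """Resolvent J_{lam T}(x) for T = subdifferential of |.|."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def alternating_ippm_with_stopping(x0, lam, alpha, tol, max_iters=1000):
    """Run the alternating inertial PPM until |x^{k+1} - x_bar^k| <= tol.

    Returns the last iterate, the number of resolvent evaluations and the
    final residual.
    """
    x_prev, x = x0, x0
    residual = float("inf")
    for k in range(max_iters):
        # Extrapolate only at odd k (assumed convention).
        x_bar = x + alpha * (x - x_prev) if k % 2 == 1 else x
        x_next = soft_threshold(x_bar, lam)
        residual = abs(x_next - x_bar)
        if residual <= tol:
            return x_next, k + 1, residual
        x_prev, x = x, x_next
    return x, max_iters, residual


x_final, num_steps, residual = alternating_ippm_with_stopping(
    x0=5.0, lam=0.5, alpha=0.8, tol=1e-9
)
print(x_final, num_steps, residual)
```

The rule is meaningful because $x^{k+1} = J_{\lambda T}(\bar{x}^k)$ gives $(\bar{x}^k - x^{k+1})/\lambda \in T(x^{k+1})$, so a small residual certifies that $T(x^{k+1})$ contains an element of small norm.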

Concluding remarks
In this short note we have shown, via concrete examples, that some of the attractive properties of the classical PPM are not preserved by iPPM. To partially inherit the advantages of the PPM while incorporating inertial extrapolation steps in the algorithm, we proposed an iPPM with alternating inertial steps. Our analyses have shown that the proposed algorithm generates a sequence whose even subsequence is contractive with the set of solutions. Our global convergence results and convergence rate results are established under much relaxed conditions compared to those required in the literature. In particular, we have removed the monotonicity requirement on the inertial extrapolation stepsizes and enlarged their range from [0, 1/3) to [0, 1]. So far our discussion has focused only on the theoretical aspects, and a practically more important question is how to select the extrapolation stepsizes adaptively in computation so that inertial type PPMs perform significantly faster than the corresponding conventional PPMs. This topic is interesting for future investigation.
Stat., Optim. Inf. Comput., Vol. 3, September 2015. Z. G. MU AND Y. PENG, 245
Now, we are ready to present our main convergence results.