Approximated Function Based Spectral Gradient Algorithm for Sparse Signal Recovery

Numerical algorithms for the l0-norm regularized non-smooth non-convex minimization problems have recently became a topic of great interest within signal processing, compressive sensing, statistics, and machine learning. Nevertheless, the l0norm makes the problem combinatorial and generally computationally intractable. In this paper, we construct a new surrogate function to approximate l0-norm regularization, and subsequently make the discrete optimization problem continuous and smooth. Then we use the well-known spectral gradient algorithm to solve the resulting smooth optimization problem. Experiments are provided which illustrate that this method is very promising.


Introduction
The focus of this paper is on the following structured minimization: where ∥x∥ 0 = card(x) = ∑ n i=1 1(x i ̸ = 0) denotes the cardinality or the number of nonzero components in x.The true x is often sparse in the sense that many of its components are zeros.The function f : R n → R is continuously differentiable (may be non-convex) and bounded below, and the parameter µ > 0 is used to trade off both terms for minimization.Given its structure, problem (1.1) covers a wide range of apparently related formulations in different scientific fields, such as signal/image processing, compressive sensing, statistics, machine learning, and so on.
To solve (1.1), a convex relaxation which can be solved efficiently is the socalled l 1 -regularized least squares: where ∥x∥ 1 is the sum of the absolute values of each component in x.The compressive sensing theory based on the fact that most signals are compressible, was led by Candès et al. [2,3], Donoho [7].It has been shown that under some reasonable conditions, problems (1.1) and (1.2) share the common solution with high probability [6].For this reason, numerous algorithms to solve problem (1.2) have been proposed, analyzed, and tested in the last decade.For instance, by using an operator splitting technique, Hale, Yin and Zhang derive the iterative shrinkage/thresholding fixed-point continuation algorithm [11].A closely related method is the fixed-point continuation and active set [16,17], which solves a smooth subproblem to determine the magnitudes of the nonzero components of x based on an active set.Another closely related method is the sparse reconstruction algorithm [14], which involves minimizing a non-smooth convex problem with separable structures.Unlike all the methods mentioned above, in this paper, we proposed a novel approximated function method to solve (1.1).More preciously, we construct an approximated smooth function to replace the discrete l 0 -norm in (1.1) which based on a small parameter β.We also show that the proposed smooth function trends to the l 0 -norm as β → ∞.Moreover, we use the well-known spectral gradient method to solve the resulting smooth minimization, and incorporate a non-monotone line search technique to accelerate its convergence.Finally, we do numerical experiments to recover a large sparse signal from its limited measurement, which illustrate that the constructed function works well, and furthermore indicate that the proposed algorithm is encouraging.
The rest of the paper is organized as follows.In section 2, we recall some existing surrogate functions to the l 0 -norm and construct our new surrogate function immediately.In section 3, we describe the spectral gradient method for unconstrained optimization problem, and then list the full steps of our algorithm.Furthermore, to illustrate the feasibility and efficiency of the proposed method, we report the numerical experiments in section 4. Finally, the conclusion is drawn in the last section.12 W. WANG AND Q. WANG

Motivation and Surrogate Function
This section is devoted to stating our motivations, reviewing some existing surrogate functions to l 0 -norm, and constructing a new function.In the mean time, some nice properties of the surrogate functions are also presented.

Motivation and ℓ
Instead of the ℓ 0 -norm regularized problem (1.1), another very natural improvement is to apply the ℓ p -norm (0 < p < 1) regularization [10], that is, to solve the following problem where ∥x∥ p is the so-called ℓ p quasi-norm of R n , defined by One can observe that as p ↓ 0, (2.1) approaches the ℓ 0 problem (1.1).On the other hand, as p ↑ 1, (1.2) approaches the ℓ 1 -norm problem (1.2).In other words, problem (2.1) is an intermediate between (1.1) and (1.2).Nevertheless, as the ℓ p -norm regularization is nonconvex, nonsmooth, and non-Lipschitz, the use of the standard ℓ 1 -norm minimization problems' solvers are generally precluded.As shown in [9,4] that finding the global minimizer of (2.1) is strongly Np-hard, but finding a local optimal solution of the problem could be done in polynomial time, and furthermore the lower bounds for the absolute value of nonzero entries in each local optimal solution can be established theoretically [5].The curve of the ℓ 0 -norm and the curves of ℓ p -norm with different values of p are plotted in Figure 1.
It can be seen from Figure 1 that, as p decreases, the curve of ℓ p -norm trends to the one of ℓ 0 .Hence, it is reasonable to think that the smaller p may yield the more sparse solutions.However, the ℓ p norm is non-differentiable at original point, which makes it is challenging to minimize.The important observation motivates us to construct an surrogate function which is continuous and differentiable and approximate the ℓ 0 -norm to produce sparse solutions.

Existing surrogate functions
First we replace the cardinality term in (1.1) with a continuous and smooth surrogate penalty function ω(y), which makes the discrete optimization problem become continuous and smooth.The suitable surrogate function must range from 0 to 1 with the property that w(0) = 0 and w(y) = 1 when y is away from 0. Now, we recall three well-known choices of the surrogate function which are designed in this literature.
1. Truncated l γ penalty given by for γ > 0 and α > 0. Typically, γ = 0.5, 1, 2 can be considered.The magnitude of a controls the sharpness of the approximation.
3. The hyperbolic tangent penalty for α ≥ 0 and γ ≥ 0. When γ = 2, ω(x) is smooth.Its shape is similar to the "weight elimination" penalty, and with a faster progression in its approximation to l 0 .It can be seen that a larger α provides a sharper discrimination between 0 and values that are close to 0.

A new surrogate function
The suitable surrogate function should satisfy at least the following properties.
Firstly, the surrogate function should be continuous and smooth.Secondly, its values should be ranged from 0 to 1 with the property that ω(0) = 0 and ω(y) = 1 when y is away from 0. Now, we develop the following function where the constants β > 0 and γ > 0 is fixed.When γ = 2, the function reduces to Clearly, it is continuous and smooth, as graphically shown at the right plot in Figure 2.
Figure 2 shows the curve of ℓ 0 function and the surrogate function (2.3) with different choices of β.As can be seen from the right plot, the curve approximates the ℓ 0 as the value of β increase.In other words, a bigger β may cause a higher precision of its approximation.In our experiments, we show that a small β can produce higher quality re-solutions.

Algorithm
In this section, based on the surrogate function as described in the previous section, we discuss the construction of our algorithm.Before stating the steps of our method, we first give a brief description of the well-known spectral gradient method for the following unconstrained optimization where f : R n → R is a continuously differentiable function.The spectral gradient method was originated by Barzilai and Borwein [1], and its iterative form is defined by where ∇f (x k ) is the gradient of f at x k and one of the choices of the scalar λ k is given by where and y . The spectral gradient method is also named two-points method, Barziali-Borwein gradient method, and is received much attention in past years (see e.g., [12,13,8].Given an initial point x 0 , our algorithm generates a sequence x k by where d k is a descent direction of f at x k , and α k is a steplength determined by a line search.In our algorithm, the search direction currently is defined by It is easy to see that d T k ∇f (x k ) < 0 provided by λ k > 0, which shows that d k is a descent direction at current point.Along the search direction, in our algorithm, the steplength α k is determined by the nonmonotone line search where δ ∈ (0, 1).
Based on the spectral gradient, we pay our attention to constructing our algorithm.Now we replace ∥x∥ 0 in (1.1) with the surrogate function ω(x), then we get the continuous and smooth f (x) as It is easy to compute that the gradient at x is: .

Remark 3.1
To ensure λ k is bounded from 0 and subsequently ensures that d k is a descent direction per-iteration, the λ k is forced as where 0 < λ (min) ≪ λ (max) < +∞ are fixed scalars.
We clearly see that our proposed algorithm is essentially the standard spectral gradient method for smooth unconstrained minimization.Hence, it global convergence can be followed directly in optimization literature.To end this section, we list the convergence theorem of the proposed algorithm.Its proof can be found in Theorem 2.1 in [13].
Theorem 3.1 Let f be continuously differently and x k be generated by algorithm AFSG, then ∇f (x k ) for some k, or lim k→∞ ∇f (x k ) = 0.

Numerical experiments
In this section, we present numerical results to illustrate the feasibility and efficiency of the proposed method.We use our method to solve l 0 -regularized least squares, which mainly appear in compressive sensing.All experiments are performed under Windows 7 and Matlab 7.11 (R2010b) running on a Lenovo laptop with an Intel Atom CPU at 2.2 GHz and 2GB of memory.
Test on l 0 -regularized least squares.Let x be a sparse or a nearly sparse original signal, A ∈ R m×n (m ≪ n) be a linear operator, ε ∈ R m be a zero-mean Gaussian white noise, and b ∈ R m be an observation that satisfies the relationship We measure the quality of restoration x * through the relative error to the original signal x; that is, In this test, we use a random matrix A with independent identically distributed Gaussian entries.The ε is the additive Gaussian noise of zero mean and standard deviation σ.Given the storage limitations of the PC, we test a small size signal with n = 2 11 , m = 2 9 .The original signal contains randomly p = 2 6 nonzero elements.We also choose the noise level σ = 10 −3 .The proposed algorithm starts at zero point and terminates when the relative change of two successive points are sufficiently small, i.e., In this experiment, we take tol = 10 −4 , λ (min) = 10 −20 , and λ (max) = 10 20 .In the line search, we choose ᾱ = 1, ρ = 0.5, δ = 10 −4 , and m = 5.The original signal, the limited measurement, and the reconstructed signal are given in Figure 3.
Comparing the right plot with the left one in Figure 3, we see that the original sparse signal is restored almost completely in sense of RelErr is 0.84%.All the blue peaks are encircled by the red circles, which illustrates that the original signal has been found successfully.Overall, this experiment shows that the proposed method performs very well and provides an efficient approach to the recovery of large sparse non-negative signal.

Test on AFSG and NBBL1
In this subsection, we compare AFSG with the solver NBBL1 [18] with respect to relative errors and running time.Parameters' values are set just the same to the previous.The Matlab package of NBBL1 is obtained at URL: http://maths.henu.edu.cn/szdw/teachers/xyh.htm.We repeatedly run each code with different combinations of (n, m, k), and record the computing time (Time) and relative errors (RelErr) in Table I.From table I we can see that the running time required by AFSG is less than the one by NBBL1, which shows that AFSG is the faster.Meanwhile, the relative errors obtained by AFSG is also smaller than the ones by NBBL1.In sum, the performance comparisons verified that the proposed algorithm AFSG is superior to NBBL1 in recovering a large and sparse signal.

Conclusions
In this paper, we proposed a new method to solve the l 0 -norm regularized non-smooth non-convex minimization problems.Different from some existing algorithms, the proposed method makes the discrete optimization problem continuous and smooth by using an approximated function.The theoretical guarantee of the replacement ℓ 0 -norm by a smooth function to yield sparse solutions is not analyzed in this paper.However, the numerical experiments illustrate that the approach is promising.Finally, we hope that this novel method even its modifications can further be used in some other fields, including statistics, machine learning, and image processing.

Figure 3 .
Figure 3. Top: original signal with length 4096 and 64 positive non-zero elements; Middle: the noisy measurement with length 512; Bottom: recovered signal by AFSG (red circle) versus original signal (blue peaks).

Table I .
Test results of AFSG and NBBL1 with different combinations of (n, m, k).