Image reconstruction from incomplete convolution data via total variation regularization

Variational models with total variation (TV) regularization have long been known to preserve image edges and produce high-quality reconstructions. On the other hand, recent theory on compressive sensing has shown that it is feasible to accurately reconstruct images from a few linear measurements via TV regularization. However, in general TV models are difficult to solve due to the nondifferentiability and the universal coupling of variables. In this paper, we propose the use of the alternating direction method for image reconstruction from highly incomplete convolution data, where an image is reconstructed as a minimizer of an energy function that sums a TV term for image regularity and a least squares term for data fitting. Our algorithm, called RecPK, takes advantage of problem structures and has an extremely low per-iteration cost. To demonstrate the efficiency of RecPK, we compare it with TwIST, a state-of-the-art algorithm for minimizing TV models. Moreover, we also demonstrate the usefulness of RecPK in image zooming.


Introduction
Total variation (TV) regularization for image restoration and reconstruction has been widely studied in the literature since its introduction by Rudin, Osher and Fatemi [31], mainly due to the strong ability of TV in preserving edges and object boundaries, which are usually the most important features to recover. In image restoration, TV models with the ℓ2-norm fidelity, which are suitable for data contaminated by independent and identically distributed Gaussian noise, have been studied in, e.g., [10,13]. When observed data suffer from impulsive noise, e.g., salt-and-pepper noise, TV models with nonsmooth data fitting are desirable; e.g., TV regularization with the ℓ1-norm fidelity was addressed in [12], which revealed some interesting geometric properties.
Recently, exact recoverability of sparse signals from a few linear measurements has been established based on ℓ1-minimization. For two-dimensional images, exact recoverability is also attainable via TV minimization when the underlying image has a piecewise constant structure, or equivalently, sparse gradients; see, e.g., [8] for exact recoverability from incomplete frequency information. These theoretical results further justify the importance of TV models in image reconstruction. Sparse and compressible signal recovery from incomplete linear measurements is now widely referred to as compressive sensing/sampling (CS, [7,8,14]) and has become an emerging methodology in the field of signal processing. In this paper, we concentrate on image reconstruction via TV minimization from incomplete/partial convolution data. Our main contribution is a very simple and fast algorithm, called RecPK, for solving TV problems with either the ℓ2-norm or the ℓ1-norm data fitting. In addition, we also present comparison results with TwIST, a two-step iterative shrinkage/thresholding algorithm recently proposed in [2].
In the rest of this section, we present the optimization problem under consideration, review relevant algorithms existing in the literature for TV minimization, introduce our notation, and describe the organization of the paper.

Image reconstruction via TV minimization
Let ū ∈ R^{n²} be an original two-dimensional image with n² pixels. We assume that ū is grayscale, though our derivations apply equally to multichannel images. The incomplete convolution measurements of ū are given by

f = PK ū + ω,

where K ∈ R^{n²×n²}, P ∈ R^{p×n²} and ω ∈ R^p represent, respectively, a convolution matrix, a projection matrix containing p rows of the identity matrix of order n², and random noise. In CS, p is much smaller than n². The challenge is to recover ū from the seemingly insufficient measurements f, which is generally impossible without prior constraints on the original image. However, when ū is sparse or compressible, basic CS results [7,8,14] guarantee that ū can be accurately (or even exactly, when ū is sparse and f is noise free) recovered via ℓ1-minimization, provided that the sensing system preserves certain desirable attributes. For piecewise constant images, exact recoverability from incoherent measurements [6,4] can be ensured by TV minimization [8]. Moreover, it was pointed out in [5] that minimizing TV is always beneficial even when ū is sparse in a transform domain with a suitable basis.
In this paper, we reconstruct ū from f via the CS methodology. Specifically, ū is reconstructed as a solution of the following TV model:

min_u ∑_i ∥D_i u∥_2 + (µ/2)∥PKu − f∥², (1)

where ∑_i is taken over all pixels, ∑_i ∥D_i u∥_2 is a discretization of the TV of u, and µ > 0 is a scalar used to balance regularization and data fidelity. Theoretical support for this approach of recovering ū from partial convolution data is provided in [30]. Clearly, the rationale behind model (1) is similar to that of the following well-known TV/L2 model for image deconvolution:

min_u ∑_i ∥D_i u∥_2 + (µ/2)∥Ku − f∥². (2)
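For concreteness, the discretized isotropic TV term ∑_i ∥D_i u∥_2 appearing in (1) and (2) can be evaluated as in the following sketch. We assume forward differences with periodic boundary conditions; the paper's exact stencil may differ, and `isotropic_tv` is our illustrative name:

```python
import numpy as np

def isotropic_tv(u):
    """Discrete isotropic TV: sum_i ||D_i u||_2, where D_i u stacks the
    horizontal and vertical forward differences of u at pixel i, with
    periodic (circular) boundary conditions."""
    dx = np.roll(u, -1, axis=1) - u  # horizontal local differences
    dy = np.roll(u, -1, axis=0) - u  # vertical local differences
    return np.sqrt(dx**2 + dy**2).sum()

# A constant image has zero TV, while an isolated bright pixel does not,
# which is why TV regularization favors piecewise constant reconstructions.
flat = np.zeros((8, 8))
spike = flat.copy()
spike[4, 4] = 1.0
print(isotropic_tv(flat), isotropic_tv(spike))
```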

A brief review of existing methods
In the following, we let ∥•∥ = ∥•∥_2. In this section, we briefly review some existing methods for solving the TV/L2 problem (2), many of which can be easily extended to solve (1). The first class of methods is based on smoothing the TV term. Since the nonsmoothness of TV causes the main difficulty, a number of methods smooth it and solve the resulting approximation problems. The TV/L2 problem (2) is usually approximated by

min_u ∑_i √(∥D_i u∥² + ϵ) + (µ/2)∥Ku − f∥², (3)

where ϵ > 0 is a small constant. As such, ordinary methods for unconstrained optimization can be applied, such as the gradient descent method. In the pioneering work [31], a time-marching scheme was used to solve a partial differential equation system, which, from an optimization point of view, is equivalent to a constant-steplength gradient descent method. This time-marching scheme suffers from slow convergence, especially when the iterate approaches the solution set. Although a line search strategy can be incorporated to speed up convergence, it still converges slowly, especially when ϵ becomes small. Another well-known method for the smoothed TV/L2 problem (3) is the linearized gradient method, proposed in [36] for denoising and in [35] for deblurring. Its idea is to solve the Euler-Lagrange equation via a fixed-point iteration. In each iteration of the linearized gradient method, a linear system needs to be solved, which becomes more and more difficult as ϵ goes to 0 or as K becomes more ill-conditioned. As a result, it is not efficient, especially for large and/or severely ill-conditioned problems. Furthermore, both the explicit time-marching scheme and the linearized gradient method are first-order methods and thus are at best linearly convergent. To overcome the linear convergence of first-order methods, the authors of [36] incorporated the Newton method to solve (3). Indeed, quadratic convergence was observed. However, the large per-iteration cost of the Newton method prevents its practical
applicability, especially for large-sized images. Moreover, as ϵ becomes smaller, the computation of Newton directions becomes more difficult. Another class of algorithms for solving TV problems is the class of iterative shrinkage/thresholding (IST) algorithms, which were independently proposed and analyzed by several authors in different fields; see, e.g., [17,18,34,33]. In [2], Bioucas-Dias and Figueiredo introduced a two-step IST (TwIST) algorithm, which exhibits much faster convergence than the primal IST algorithm for ill-conditioned problems. Recently, Beck and Teboulle [1] presented a new fast IST algorithm, which not only preserves the computational simplicity of IST but also has an optimal global rate of convergence. We note that IST-based algorithms for TV deconvolution require solving an expensive TV denoising subproblem at each iteration and are not able to take advantage of problem structures.
Recently, a fast TV deconvolution algorithm called FTVd was proposed in [37] for the solution of (2). Since both K and the finite difference operators are structured, it is desirable to design an algorithm that can take advantage of these structures. For this purpose, the authors of [37] first transformed (2) to an equivalent constrained problem of the form

min_{w,u} ∑_i ∥w_i∥ + (µ/2)∥Ku − f∥², s.t. w_i = D_i u for all i, (4)

where, for each i, w_i ∈ R² is an auxiliary vector, and considered solution algorithms for (4) that are capable of utilizing convolution structures. We note that in (4) the objective function is separable and the constraints are linear. For convenience, we let w = (w_1; w_2) ∈ R^{2n²}, where w_1 and w_2 are vectors of length n² satisfying ((w_1)_i; (w_2)_i) = w_i ∈ R² for i = 1, . . . , n². Then, the classical quadratic penalty method was applied to (4) in [37], which gives the following formulation:

min_{w,u} ∑_i (∥w_i∥ + (β/2)∥w_i − D_i u∥²) + (µ/2)∥Ku − f∥², (5)

where β > 0 is a penalty parameter. Finally, an alternating minimization with respect to u and w was applied to (5).
Since the objective function in (5) is separable with respect to each w_i, and both K and the finite difference operators are convolution matrices, both subproblems can be solved easily and exactly, by simple shrinkage and by fast Fourier transforms (FFTs), respectively. Furthermore, the overall convergence of FTVd is accelerated by heuristic continuation on β. Experimental results provided in [37] show that FTVd converges much faster than algorithms that cannot make use of problem structures.
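The "simple shrinkage" that makes the w-subproblem cheap is the two-dimensional soft-shrinkage operator; a minimal sketch (the function name and the numeric examples are ours):

```python
import numpy as np

def shrink2d(t, tau):
    """Two-dimensional shrinkage: the minimizer of ||w|| + (1/(2*tau))*||w - t||^2,
    i.e. max(||t|| - tau, 0) * t/||t||, with the convention 0*(0/0) = 0."""
    norm = np.linalg.norm(t)
    if norm <= tau:
        return np.zeros_like(t)
    return (1.0 - tau / norm) * t

print(shrink2d(np.array([3.0, 4.0]), 1.0))  # magnitude shrinks from 5 to 4
print(shrink2d(np.array([0.1, 0.0]), 1.0))  # inside the ball: maps to zero
```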
From optimization theory, the solution of (5) well approximates that of (4) only when β becomes large, in which case numerical difficulties arise. For β > 0 and s, t, ν ∈ R², we define

ϕ_β(s, t, ν) = ∥s∥ − ν⊤(s − t) + (β/2)∥s − t∥². (6)

Furthermore, let the augmented Lagrangian function of (4) be defined by

L_A(w, u, λ) = ∑_i ϕ_β(w_i, D_i u, λ_i) + (µ/2)∥Ku − f∥²,

where each λ_i is a vector in R² and, similar to w, λ ∈ R^{2n²} is a combination of {λ_i, i = 1, 2, . . . , n²} in a suitable order. To avoid β going to infinity, the classical augmented Lagrangian method (ALM, [27,29]) was applied in [25], resulting in the following iterative framework:

(w^{k+1}, u^{k+1}) = argmin_{w,u} L_A(w, u, λ^k), λ_i^{k+1} = λ_i^k − β(w_i^{k+1} − D_i u^{k+1}), i = 1, . . . , n². (7)

In [25], the authors derived (7) from the Bregman iterative method [28]. It is well known that the presence and iterative updates of the multiplier λ avoid β going to infinity and guarantee convergence of (7) to a solution of (4). The disadvantage of (7) is the exact minimization of L_A with respect to (u, w) at each iteration, which requires its own inner iterations. In this paper, we propose the use of the alternating direction method (ADM), a variant of the classical ALM, for the solution of (1).

Notation
Let the superscript "⊤" denote the transpose (conjugate transpose) operator for real (complex) matrices or vectors. For each pixel i, the matrix D_i appearing in (1) is a 2-by-n² matrix such that the two entries of D_i u represent the horizontal and vertical local finite differences of u at pixel i, where the D_i near the boundary are defined using periodic boundary conditions. The horizontal and vertical global finite difference matrices are denoted by D(1) and D(2), respectively. As such, D(1) and D(2) contain, respectively, the first and second rows of the D_i for all i. With a slight abuse of notation, we let the index set corresponding to the selection operator P be denoted by P itself, and let the complement of P be defined as Q = {1, . . . , n²} \ P. In the rest of this paper, we let ∥•∥ = ∥•∥_2. Additional notation is defined where it occurs.

Organization
The rest of this paper is organized as follows. In Section 2, we present the basic algorithm RecPK for (1). We also discuss in this section connections to some recent work in the field of signal and image processing. In Section 3, we extend the ADM approach to solve two other TV-based models. Section 4 contains our experimental results, where RecPK is compared with TwIST [2]. In this section, we also present recovery results from incomplete convolution data corrupted by impulsive noise and demonstrate the usefulness of RecPK in image zooming. Finally, some concluding remarks are given in Section 5.

Basic algorithm and extensions
Two main difficulties exist in solving (1): one is caused by the nondifferentiability of the TV term, and the other by PK, which does not maintain the circulant structure of a convolution operator. We overcome both difficulties by reformulating (1) as a linearly constrained problem and considering its augmented Lagrangian function. Instead of using the classic ALM, which needs to solve its unconstrained subproblems relatively accurately, we propose the use of the cheaper ADM.
In the remainder of this section, we describe the ALM and the ADM for the solution of (1). We also connect the ADM to some recent work in signal and image processing and discuss extensions to other TV-based models.

The ALM

By introducing auxiliary variables w_i ∈ R² for each pixel i and v ∈ R^{n²}, problem (1) is equivalently transformed to the linearly constrained problem

min_{w,v,u} ∑_i ∥w_i∥ + (µ/2)∥Pv − f∥², s.t. w_i = D_i u for all i and v = Ku. (8)
To tackle the linear constraints, we consider the augmented Lagrangian function defined by

L_A(w, v, u, λ1, λ2) = ∑_i ϕ_{β1}(w_i, D_i u, (λ1)_i) + (µ/2)∥Pv − f∥² − λ2⊤(v − Ku) + (β2/2)∥v − Ku∥², (9)

where β1, β2 > 0 are penalty parameters, ϕ_β is defined in (6), and λ1 ∈ R^{2n²} and λ2 ∈ R^{n²} are Lagrange multipliers. The classic ALM applied to (8) iterates

(w^{k+1}, v^{k+1}, u^{k+1}) = argmin_{w,v,u} L_A(w, v, u, (λ1)^k, (λ2)^k),
(λ1)^{k+1} = (λ1)^k − γβ1(w^{k+1} − Du^{k+1}),
(λ2)^{k+1} = (λ2)^k − γβ2(v^{k+1} − Ku^{k+1}), (10)

where γ ∈ (0, 2) guarantees the convergence. In the ALM, each subproblem needs to be solved to a certain high accuracy before the update of the multipliers. In our case, the joint minimization of L_A with respect to (w, v, u) is needed, which is not numerically efficient. Thus, the per-iteration cost of the ALM is relatively expensive. In contrast, the ADM approach described below has a much cheaper per-iteration cost.

The ADM
The basic idea of the ADM goes back to the work of Glowinski and Marocco [24] and Gabay and Mercier [21]. Following this pioneering work, the ADM was extensively studied in optimization and variational analysis. For example, in [23] it was interpreted as the Douglas-Rachford splitting method [15] applied to a dual problem. The equivalence between the ADM and a proximal point method was shown in [16]. Moreover, the ADM has been extended to allow inexact minimization of the subproblems in [16] and [26], and a steplength criterion was introduced in [41].
In a nutshell, the ADM takes advantage of the separable structure of the variables in the objective of (8) and decreases L_A by just one round of alternating minimization followed by an immediate multiplier update. We note that, for fixed λ (hereafter λ = (λ1; λ2)), it is easy to apply alternating minimization to L_A with respect to (w, v) and u, as described below. First, for fixed λ and u, the minimization of L_A with respect to each w_i has the explicit formula

w_i = Shrink(D_i u + (λ1)_i/β1, 1/β1), (11)

where Shrink(•, 1/β1) is generally regarded as the two-dimensional shrinkage operator and is defined as

Shrink(t, 1/β1) = max{∥t∥ − 1/β1, 0} · t/∥t∥, (12)

where 0 • (0/0) = 0 is assumed; and the minimization of L_A with respect to v satisfies the normal equations

(µP⊤P + β2 I) v = µP⊤f + λ2 + β2 Ku, (13)

where I represents the identity matrix. Since P is a selection matrix, P⊤P is diagonal and thus the solution of (13) is easy. We point out that the computations of both w and v are linear in n². Second, for fixed λ and (w, v), the minimization of L_A with respect to u becomes a least squares problem. To simplify the presentation, we let w_j = (w_1(j); . . . ; w_{n²}(j)), j = 1, 2, w = (w_1; w_2) and D = (D(1); D(2)). With this notation, the normal equations corresponding to the minimization of L_A with respect to u, after reordering variables, can be written as

(D⊤D + (β2/β1) K⊤K) u = y, where y = D⊤(w − λ1/β1) + (β2/β1) K⊤(v − λ2/β2). (14)

Since D(1) and D(2) are finite difference matrices and K is a convolution matrix, under periodic boundary conditions for u they are all block circulant matrices and can be diagonalized by the 2D discrete Fourier transform F. Let K̂ = FKF⊤ and D̂(j) = FD(j)F⊤ (j = 1, 2), which are diagonal, and let D̂ = (D̂(1); D̂(2)). Multiplying both sides of (14) by F, we obtain

M F(u) = ŷ, (15)

where ŷ = F(y) and M = D̂⊤D̂ + (β2/β1) K̂⊤K̂ is a diagonal matrix. Therefore, (14) can be easily solved for given w and v as follows. Before the iterations begin, compute M. At each iteration, first compute y and obtain ŷ by applying an FFT. Then, solve (15) using M to obtain F(u), and thus u after applying an inverse FFT.
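The key point, that D⊤D + (β2/β1)K⊤K is diagonalized by the 2D FFT under periodic boundary conditions, can be sketched as follows. This is an illustration under our own assumptions (a small centered PSF, a normalized averaging kernel so that the zero frequency is nonzero, and the right-hand side y taken as given rather than assembled from (w, v, λ)); the helper names are ours:

```python
import numpy as np

def _eigvals(n, kernel, beta_ratio):
    """Fourier eigenvalues of D1'D1 + D2'D2 + beta_ratio*K'K under periodic
    boundary conditions; all three operators are block circulant, so their
    eigenvalues are the 2D FFTs of their stencils/PSF."""
    d1 = np.zeros((n, n)); d1[0, 0] = -1.0; d1[0, 1] = 1.0  # horizontal diff
    d2 = np.zeros((n, n)); d2[0, 0] = -1.0; d2[1, 0] = 1.0  # vertical diff
    kpad = np.zeros((n, n))
    kh, kw = kernel.shape
    kpad[:kh, :kw] = kernel
    kpad = np.roll(kpad, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center PSF
    D1, D2, K = np.fft.fft2(d1), np.fft.fft2(d2), np.fft.fft2(kpad)
    return np.abs(D1)**2 + np.abs(D2)**2 + beta_ratio * np.abs(K)**2

def circulant_apply(u, kernel, beta_ratio):
    """Apply (D'D + beta_ratio*K'K) to u via the FFT."""
    M = _eigvals(u.shape[0], kernel, beta_ratio)
    return np.real(np.fft.ifft2(M * np.fft.fft2(u)))

def circulant_solve(y, kernel, beta_ratio):
    """Solve (D'D + beta_ratio*K'K) u = y by pointwise division in Fourier
    space: one FFT and one inverse FFT per solve, as in the u-subproblem."""
    M = _eigvals(y.shape[0], kernel, beta_ratio)
    return np.real(np.fft.ifft2(np.fft.fft2(y) / M))
```

Since `circulant_apply` and `circulant_solve` share the same eigenvalue array, the solve exactly inverts the apply whenever the eigenvalues are nonzero, which holds for a normalized averaging kernel and beta_ratio > 0.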
One can cyclically apply (11), (13) and (14) until L_A is minimized jointly with respect to (w, v, u) and then update the multipliers as in the ALM framework (10). However, we choose to update λ immediately after computing (11), (13) and (14) just once. This gives the ADM as follows.

Algorithm 1 (the ADM for (1)):
Initialize u⁰, (λ1)⁰ and (λ2)⁰; set k = 0.
While not converged, Do
  1) Compute w^{k+1} via (11) and v^{k+1} via (13) for fixed u^k and λ^k;
  2) Compute u^{k+1} via (14) (i.e., via (15)) for fixed w^{k+1}, v^{k+1} and λ^k;
  3) Update the multipliers:
     (λ1)^{k+1} = (λ1)^k − γβ1(w^{k+1} − Du^{k+1}),
     (λ2)^{k+1} = (λ2)^k − γβ2(v^{k+1} − Ku^{k+1});
  4) k ← k + 1.
End Do
In our experiments, we simply terminated Algorithm 1 when the relative change in u is small, i.e.,

∥u^{k+1} − u^k∥ / ∥u^k∥ ≤ ϵ, (16)

for some tolerance ϵ > 0. In step 3) of Algorithm 1, a relaxation parameter γ ∈ (0, (√5 + 1)/2) is permitted. The shrinkage of the permitted range of γ from (0, 2) in the ALM to (0, (√5 + 1)/2) in the ADM is the price paid for relaxing the exact minimization of L_A to merely one round of alternating minimization. The convergence of the ADM with such a steplength attached to the multipliers was first established in [22] in the context of variational inequalities. Following [22], for any β1, β2 > 0 and γ ∈ (0, (√5 + 1)/2), the sequence {(u^k, v^k, w^k)} generated by Algorithm 1 from any starting point (u⁰, λ⁰) converges to a solution of (8).
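The stopping test (16) amounts to one norm computation per iteration; a sketch (the function name is ours):

```python
import numpy as np

def stop_by_relative_change(u_new, u_old, eps=1e-3):
    """Stopping rule (16): terminate when ||u_new - u_old|| / ||u_old|| <= eps."""
    return np.linalg.norm(u_new - u_old) <= eps * np.linalg.norm(u_old)

u = np.ones(4)
print(stop_by_relative_change(u + 1e-5, u))  # tiny step: stop
print(stop_by_relative_change(u + 0.1, u))   # large step: continue
```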
The splitting technique for TV shown in (8) for model (1) was first proposed in [37], without Lagrange multipliers, in the context of image deconvolution, where the authors used the classic quadratic penalty method to enforce the equality constraints and a continuation technique to accelerate convergence. Similar ideas were applied to multichannel deconvolution in [38] and impulsive noise removal in [39]. By combining the splitting technique for TV [37] and the Bregman iterative algorithm [28,43], a split Bregman method was proposed in [25] for a class of inverse problems, which is equivalent to the ALM. Also, a variable splitting and constrained optimization approach was recently applied to frame-based deconvolution in [20]. The split Bregman algorithm and its relationship with the Douglas-Rachford splitting were analyzed in [32]. The relationships between the split Bregman method, the ALM and the ADM were described in [19]. The ADM was also applied to image reconstruction from incomplete frequency data in [40] and from Toeplitz and circulant measurements in [42]. To date, the ADM has been applied in numerous applications; in this paper, its application to image reconstruction from incomplete convolution data is demonstrated with convincing numerical comparison results.

Extensions
In this section, we extend the ADM to solve two other TV-based models. When measurements are free of noise, we consider the following constrained TV minimization:

min_u ∑_i ∥D_i u∥, s.t. PKu = f, (17)

which, similar to (8), is equivalent to

min_{w,v,u} ∑_i ∥w_i∥, s.t. w_i = D_i u for all i, v = Ku and Pv = f. (18)

The augmented Lagrangian problem for (18) is given by

min_{w,v,u} ∑_i ϕ_{β1}(w_i, D_i u, (λ1)_i) − λ2⊤(v − Ku) + (β2/2)∥v − Ku∥², s.t. Pv = f, (19)

where ϕ_β is defined in (6). The only difference between (19) and the minimization of L_A defined in (9) lies in the v-subproblem. Therefore, the ADM for (18) takes exactly the same form as described in Algorithm 1 except that, by letting P(•) be the projection onto {v : Pv = f}, v is updated by v = P(Ku + λ2/β2). When the measurements are corrupted by impulsive noise, it is suitable to consider TV regularization with the ℓ1-norm data fidelity, i.e.,

min_u ∑_i ∥D_i u∥ + µ∥PKu − f∥_1. (20)

Problem (20) can also be incorporated into the two-stage algorithm introduced in [3] for deconvolution in the presence of high-level impulsive noise, where f is obtained from a first-stage algorithm that detects and eliminates corrupted data by a certain median-type filter. By introducing auxiliary variables, (20) is transformed to a linearly constrained problem, the augmented Lagrangian problem of which is given by

min_{w,v,u} ∑_i ϕ_{β1}(w_i, D_i u, (λ1)_i) + µ∥Pv − f∥_1 − λ2⊤(v − Ku) + (β2/2)∥v − Ku∥². (21)

Likewise, the only difference between (21) and the minimization of L_A defined in (9) lies in the v-subproblem. Thus, the ADM for (20) has exactly the same form as described in Algorithm 1 except that v is updated by

v_i = f_i + max{|t_i − f_i| − µ/β2, 0} · sgn(t_i − f_i) for i ∈ P, and v_i = t_i for i ∈ Q, where t = Ku + λ2/β2. (22)

The formula (22) can be easily verified by noting that v is component-wise separable in (21).
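For the ℓ1-fidelity model, the v-subproblem separates component-wise: on sampled indices a one-dimensional shrinkage toward the data applies, and elsewhere v simply copies t = Ku + λ2/β2. A sketch under our own naming (`v_update_l1`, with the selection set P given as an index array):

```python
import numpy as np

def v_update_l1(Ku, lam2, f, P_idx, mu, beta2):
    """Component-wise v-update for the l1-fidelity model: on the sampled
    indices P_idx, apply 1D soft shrinkage of t toward the data f with
    threshold mu/beta2; on the complement Q, keep t = Ku + lam2/beta2."""
    t = Ku + lam2 / beta2
    v = t.copy()
    d = t[P_idx] - f
    v[P_idx] = f + np.sign(d) * np.maximum(np.abs(d) - mu / beta2, 0.0)
    return v
```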

Experimental results
In the remainder of this paper, Algorithm 1 and its variants for the solution of (1), (17) and (20) are referred to as RecPK (reconstruction from partial convolution). In this section, we present image reconstruction results of RecPK and compare them with those of TwIST [2], which can be applied to (1) with PK replaced by a general sensing matrix. All experiments were performed under Windows Vista Premium and MATLAB v7.8 (R2009a) running on a Lenovo laptop with an Intel Core 2 Duo CPU at 1.8 GHz and 2 GB of memory. In our experiments, we used the Lena image (512×512), the Shepp-Logan phantom image (512×512), the Cameraman image (256×256) and the Eye image (384×384), which are widely used in the field of image processing for testing TV regularization models. The original images have intensity values between 0 and 1 and are given in Figure 1.
In all tests, we generated f in three steps. First, apply a convolution to ū to obtain Kū. Two types of convolution kernels were used, Gaussian and average, generated in MATLAB by the scripts fspecial('gaussian',15,11) and fspecial('average',15), respectively. Second, take random samples to obtain PKū. We tested different sample ratios ranging from 5% to 50%. Finally, for noisy cases, add Gaussian noise of mean zero and standard deviation 10⁻³ to PKū.
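A rough Python analogue of this three-step data generation (the MATLAB kernels come from fspecial; here we use a small normalized box filter, periodic convolution, and names of our own choosing):

```python
import numpy as np

def average_kernel(size=15):
    """Analogue of MATLAB's fspecial('average', size): a normalized box filter."""
    return np.ones((size, size)) / size**2

def generate_partial_data(u, kernel, sample_ratio, sigma=1e-3, seed=0):
    """Three-step data generation: periodic convolution K*u via the FFT,
    random sampling by P (kept indices), then additive Gaussian noise."""
    n = u.shape[0]
    kpad = np.zeros_like(u)
    kh, kw = kernel.shape
    kpad[:kh, :kw] = kernel
    kpad = np.roll(kpad, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center PSF
    Ku = np.real(np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(kpad)))
    rng = np.random.default_rng(seed)
    p = int(round(sample_ratio * n * n))
    P = rng.choice(n * n, size=p, replace=False)  # rows of P: kept pixels
    f = Ku.ravel()[P] + sigma * rng.standard_normal(p)
    return f, P

# Demo: a 32x32 box image, 3x3 averaging kernel, 10% samples, no noise.
u = np.zeros((32, 32)); u[10:20, 10:20] = 1.0
f, P = generate_partial_data(u, average_kernel(3), sample_ratio=0.1, sigma=0.0)
```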
Now we describe the settings for RecPK. We tried different starting points and found that the performance of RecPK is insensitive to the choice. We simply set the starting image to P⊤f, i.e., the missing samples are filled in with zeros. We set (λ1)⁰ and (λ2)⁰ to zero in all tests. In theory, RecPK is applicable for any β1, β2 > 0 and γ ∈ (0, (√5+1)/2). However, different choices of parameters lead to different algorithmic speeds. Based on our previous results for TV image reconstruction from incomplete frequency data [40], we simply set β1 = 10 and γ = 1.618 for RecPK, which, though possibly suboptimal, worked sufficiently well to demonstrate the efficiency of the ADM. As for β2, we tried different choices and observed the following. When β2 is set relatively small, RecPK converges well and the solution quality improves steadily but at a rather slow speed. When β2 is set relatively large, the intermediate images usually deteriorate during the early iterations and then improve quickly after u enters a certain "good" region.
To substantiate these statements, we tested on the Lena image using an average kernel of size 15×15 and a sample ratio of 10%. The test results for four β2 values ranging from "small" to "large" are given in Figure 2, where the SNR results are depicted against the iteration number. To avoid deteriorated intermediate results for large β2 while still maintaining fast convergence, we are motivated by the results in Figure 2 to implement continuation on β2, i.e., to initialize β2 at a small value and increase it step by step. Figure 2 also contains the SNR results obtained with β2 = min{10 × 1.2^k, 10⁴}, where k counts the number of iterations. As can be seen from Figure 2, with such continuation on β2, the SNR values improve steadily and quickly throughout the whole iteration process. We also tested different sample ratios and other kernels, such as the Gaussian kernel, and observed consistent results. Therefore, in our experiments we applied such a continuation on β2.
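The continuation rule quoted above is a one-line schedule; a sketch with the stated constants (the function name is ours):

```python
def beta2_schedule(k, beta0=10.0, rate=1.2, cap=1e4):
    """Continuation on beta2: beta2_k = min(beta0 * rate**k, cap),
    i.e. start small, grow geometrically, and saturate at the cap."""
    return min(beta0 * rate**k, cap)

# The schedule grows geometrically and saturates at 1e4 around k = 38.
print(beta2_schedule(0), beta2_schedule(10), beta2_schedule(50))
```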

Comparison with TwIST
In this subsection, we compare RecPK with TwIST, which, in general, can be applied to solve

min_u Φ_reg(u) + (µ/2)∥Au − b∥², (23)

where Φ_reg(•) can be either TV or ℓ1 regularization, A is a linear operator and µ > 0. Given the current point u^k, TwIST obtains u^{k+1} as follows:

u^{k+1} = (1 − α)u^{k−1} + (α − δ)u^k + δ Γ(u^k),

where α, δ > 0 are parameters and

Γ(u^k) = Ψ(u^k + A⊤(b − Au^k)), with Ψ(t) = argmin_u Φ_reg(u) + (µ/2)∥u − t∥². (24)

To compare TwIST with RecPK, we let b = f, A = PK and Φ_reg(u) = ∑_i ∥D_i u∥. TwIST solves the subproblem (24) using Chambolle's algorithm [9]. In contrast, RecPK has a much cheaper per-iteration cost of 2 FFTs (including 1 inverse FFT).
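To make the two-step recursion concrete, here is a hedged sketch in which the denoising operator Ψ is taken to be soft thresholding (the ℓ1 case) rather than Chambolle's TV denoiser; `twist_like` and its parameter choices are ours, and the actual TwIST code sets α and δ from spectral bounds on A:

```python
import numpy as np

def soft(x, tau):
    """Soft thresholding: the proximal map of tau*||.||_1 (playing the role of Psi)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def twist_like(A, b, lam, alpha, delta, iters=200):
    """Sketch of a TwIST-style two-step iteration for
    min_u lam*||u||_1 + 0.5*||A u - b||^2:
        Gamma(u)  = Psi(u + A'(b - A u)),
        u_{k+1}   = (1 - alpha) u_{k-1} + (alpha - delta) u_k + delta*Gamma(u_k).
    With alpha = delta = 1 this reduces to plain IST."""
    u_prev = np.zeros(A.shape[1])
    u = soft(A.T @ b, lam)  # one IST step to initialize the two-step scheme
    for _ in range(iters):
        gamma = soft(u + A.T @ (b - A @ u), lam)
        u, u_prev = (1 - alpha) * u_prev + (alpha - delta) * u + delta * gamma, u
    return u
```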
We set ϵ = 10⁻³ in (16) for RecPK, and used the monotonic variant of TwIST, which was stopped when the relative change in the objective function fell below tol = 10⁻³. We assigned a relatively small value, 10⁻³, to the TwIST parameter lam1 (which is used to compute α and δ), as recommended in the TwIST documentation for ill-conditioned problems. Furthermore, to speed up convergence, TwIST is allowed at most 10 iterations (its default) for each call of Chambolle's algorithm to solve (24).
In the comparison with TwIST, we used noisy data, i.e., ω is Gaussian noise with mean zero and standard deviation 10⁻³. For this level of noise, we first set µ = 10⁴ in (1) and tested partial Gaussian convolution data of different sample ratios. The recovered results by RecPK and TwIST are given in Figure 3, along with the resulting signal-to-noise ratios (SNR, in decibels), consumed CPU times (in seconds) and numbers of iterations taken (Iter). We also tested the two methods on incomplete data convolved with the average kernel. The recovered results are similar to those recovered from partial Gaussian convolution data and are not presented here; we will present recovery results for the average kernel in the next subsections. It can be seen from Figure 3 that RecPK obtained images of quality comparable to those from TwIST in much less CPU time and with fewer iterations, especially when the sample ratios are relatively high. This is because TwIST does not take advantage of the structures of P and K but rather treats PK as a generic linear operator. Additionally, compared with TwIST, which uses Chambolle's algorithm for TV denoising, the per-iteration cost of RecPK is very cheap, taking only 2 FFTs. Therefore, in the results we can see that RecPK takes less time per iteration. Generally speaking, from the above comparison results, it is safe to conclude that RecPK is more efficient than TwIST when applied to (23).

Reconstruction from noiseless data
When measurements are free of noise, it is desirable to consider constrained TV minimization of the form (17). In this section, we present recovery results of RecPK applied to (17). Different from Section 4.1, we use the average kernel to generate convolution data. The test results on the Shepp-Logan phantom image, which is often used as a benchmark image for TV regularization, are given in Figure 4, where, in addition to SNR, CPU times and iteration numbers, the relative residue defined by Res := ∥PKu − f∥/∥f∥ is also given. The changing behavior of the residue and the SNR as RecPK proceeds is given in Figure 5.

Figure 4. Recovered results by RecPK from (17), where f contains noise-free partial average convolution data of sample ratios 30% (left), 10% (middle) and 5% (right). RecPK was terminated by (16) with ϵ = 10⁻³. From left to right: (SNR, CPU, Iter, Res) = (29.6dB, 37s, 97, 7.1e-5), (21.4dB, 66s, 181, 1.5e-4) and (17.0dB, 85s, 237, 2.1e-4), respectively.

Figure 5. Changing behavior of the residue and the SNR; the stopping rule (16) and 30% average convolution data were used.

Reconstruction from impulsive noise corrupted data
In this subsection, we present recovery results from incomplete convolution data corrupted by impulsive noise. After generating the partial convolution data PKū, we let

f = N_imp(PK ū),

where N_imp(v) corrupts a fraction of the elements of v, replacing each corrupted element by the minimum or the maximum value of v, each with probability 0.5.
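The corruption operator N_imp can be sketched as follows (the function name and seeding are our choices):

```python
import numpy as np

def add_impulsive_noise(v, fraction, seed=0):
    """N_imp: corrupt a given fraction of the entries of v with salt-and-pepper
    noise, i.e. each corrupted entry becomes min(v) or max(v), each with
    probability 0.5."""
    rng = np.random.default_rng(seed)
    out = v.copy()
    m = int(round(fraction * v.size))
    idx = rng.choice(v.size, size=m, replace=False)  # which entries to corrupt
    salt = rng.random(m) < 0.5                       # salt vs pepper split
    out.flat[idx[salt]] = v.max()
    out.flat[idx[~salt]] = v.min()
    return out
```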

The utilization of RecPK in image zooming
In image zooming, given an image of low resolution, one tries to recover an image of higher resolution. For this purpose, it is usually assumed that the available low resolution image is obtained by downsampling a higher resolution one blurred by an average kernel. Thus, to recover a high resolution image, it is natural to solve model (1), where K is an average blur. In this test, we set the downsampling factor to 4, which means that the sample ratio is 6.25%. We then zoomed each image back to its original size. We tested 3 images; in particular, the Eye image was also tested in [11]. The recovery results are shown in Figure 7. Roughly speaking, we obtained recoveries of quality similar to those presented in [11]. Note that the algorithm in [11] is a linearized ADM, rather than the ADM itself.
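For a downsampling factor of 4, the selection operator P keeps one pixel per 4×4 block, hence the 1/16 = 6.25% sample ratio; a sketch (the function name is ours, and the exact sampling phase within each block is our assumption):

```python
import numpy as np

def zoom_sampling_indices(n, factor=4):
    """Indices kept by the zooming operator P on an n x n grid: one pixel per
    factor x factor block, giving a sample ratio of 1/factor**2."""
    rows = np.arange(0, n, factor)
    cols = np.arange(0, n, factor)
    return (rows[:, None] * n + cols[None, :]).ravel()

idx = zoom_sampling_indices(16, 4)
print(len(idx) / 16**2)  # sample ratio 0.0625, i.e. 6.25%
```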

Conclusion
Based on the classic augmented Lagrangian approach and a simple splitting technique, we proposed the use of the ADM for solving TV image reconstruction problems with partial convolution data. Our algorithm minimizes the sum of a TV regularization term and a fidelity term measured in either the ℓ1 or the ℓ2 norm. The per-iteration cost of our algorithm consists of simple shrinkages, matrix-vector multiplications and two FFTs. Extensive comparison results on single-channel images with partial convolution data indicate that RecPK is highly efficient, stable and robust and, in particular, faster than the state-of-the-art algorithm TwIST [2]. Moreover, its performance depends on fewer fine-tuning parameters than TwIST. We hope that RecPK will prove useful in relevant areas of compressive sensing such as sparsity-based image reconstruction.

Figure 7 .
Figure 7. Reconstruction results by RecPK. Top row: original images and the corresponding low resolution ones (downsampling factor 4; from left to right: Eye, Lena and Cameraman, respectively). Bottom row: results of RecPK.