A fast and accurate algorithm for ℓ1 minimization problems in compressive sampling

An accurate and efficient algorithm for solving the constrained ℓ1-norm minimization problem is crucial for the success of sparse signal recovery in compressive sampling. We tackle the constrained ℓ1-norm minimization problem by reformulating it via an indicator function that encodes the constraints. The resulting model is solved efficiently and accurately by an elegant proximity operator-based algorithm. Numerical experiments show that the proposed algorithm performs well for sparse signals whose magnitudes span a high dynamic range. Furthermore, it performs significantly better than the well-known algorithms NESTA (a shorthand for Nesterov's algorithm) and DADM (dual alternating direction method) in terms of both the quality of the restored signals and the computational cost measured in CPU time.


Introduction
In this paper, we study the recovery of an unknown vector u0 ∈ R^n from the observed data b ∈ R^m via the model

b = Au0 + z,    (1)

where A is a known m × n measurement matrix and z ∈ R^m is a noise term. Under the assumption that the vector u0 of interest is sparse, the work in [1,2] shows that an accurate estimation of u0 is possible even when m < n, that is, when the observations are fewer than the unknowns.
Recently, a significant body of work has focused on finding an approximation to u0 by solving a convex optimization problem. In the noise-free case, i.e., z = 0, the optimization problem is

min { ‖u‖_1 : u ∈ R^n, Au = b },    (BP)

which is essentially the basis pursuit problem proposed earlier in the context of time-frequency representation [3]. Here, ‖·‖_1 denotes the ℓ1-norm of a vector in a Euclidean space. The optimization model (BP) can be solved by linear programming.
In the presence of noisy data, the linear constraint b = Au in (BP) is relaxed to an inequality constraint ‖Au − b‖_2 ≤ ε, where ‖·‖_2 denotes the ℓ2-norm of a vector in a Euclidean space. As a result, the optimization model (BP) becomes the basis pursuit denoising problem

min { ‖u‖_1 : u ∈ R^n, ‖Au − b‖_2 ≤ ε },    (BPε)

where ε² is an estimated upper bound of the noise power. Both problems (BP) and (BPε) are closely related to the penalized least squares problem

min { λ‖u‖_1 + (1/2)‖Au − b‖_2² : u ∈ R^n }.    (QPλ)

A large amount of research has been done on solving problems (BP), (BPε), and (QPλ). Here, we only give a brief and non-exhaustive review of results for these problems. In [3], problems (BP) and (QPλ) are solved by first reformulating them as perturbed linear programs and then applying a primal-dual interior-point approach [4]. Recently, many iterative shrinkage/thresholding algorithms have been proposed to handle problem (QPλ). These include the proximal forward-backward splitting [5], the gradient projection for sparse reconstruction [6], the fast iterative shrinkage-thresholding algorithm (FISTA) [7], the fixed-point continuation algorithm [8], and Bregman iteration-based methods. Another line of work, exemplified by NESTA, replaces the ℓ1-norm by a smooth approximation that can be made arbitrarily close to the ℓ1-norm itself. Certainly, the performance of these approaches depends on the fine tuning of the parameter λ in (QPλ) or of a parameter that controls the degree of closeness between the ℓ1-norm and its smoothed version.
In this paper, we consider solving problems (BP) and (BPε) by a different approach. We convert the constrained optimization problems into a unified unconstrained one via an indicator function. The corresponding objective function of the unconstrained optimization problem is the sum of the ℓ1-norm of the underlying signal u and the indicator function of a set in R^m, which is {0} for (BP) or the ε-ball for (BPε), composed with the affine transformation Au − b. The non-differentiability of both the ℓ1-norm and the indicator function imposes challenges for solving the associated optimization problem. Fortunately, their proximity operators have explicit expressions. The solutions of the problem can be viewed as fixed-points of a coupled equation formed in terms of these proximity operators. An iterative algorithm for finding the fixed-points is then developed. The main advantage of this approach is that solving (QPλ) or smoothing the ℓ1-norm is no longer necessary. This makes the proposed algorithm attractive for solving (BP) and (BPε). The efficiency of fixed-point-based proximity algorithms has been demonstrated in [5] and [20-22] for various image processing models.
The rest of the paper is organized as follows: in Section 2, we reformulate the ℓ1-norm minimization problems (BP) and (BPε) via an indicator function and characterize solutions of the proposed model in terms of fixed-point equations. We also point out the connection between the proposed model and (QPλ) through the Moreau envelope. In Section 3, we develop an algorithm for the resulting minimization problem based on the fixed-point equations arising from the characterization of the proposed model. Numerical experiments are presented in Section 4. We draw our conclusions in Section 5.

An ℓ1-norm optimization model via an indicator function
In this section, we consider a general optimization model that includes models (BP) and (BPε) as special cases and characterize solutions of the proposed model. We begin by introducing our notation and recalling necessary background from convex analysis. For the usual d-dimensional Euclidean space R^d, we define ⟨x, y⟩ := Σ_{i=1}^d x_i y_i, for x, y ∈ R^d, the standard inner product on R^d. The class of all lower semicontinuous proper convex functions from R^d to (−∞, +∞] is denoted by Γ0(R^d). For a subset C of R^d, the indicator function ι_C is defined by ι_C(x) := 0 if x ∈ C and +∞ otherwise. Clearly, the indicator function ι_C is in Γ0(R^d) for any closed nonempty convex set C. In particular, we define a ball in R^m centered at the origin with radius ε as B_ε := {v ∈ R^m : ‖v‖_2 ≤ ε}.
Given a matrix A ∈ R^{m×n} and a vector b ∈ R^m, we consider the following optimization problem:

min { ‖u‖_1 + ι_{B_ε}(Au − b) : u ∈ R^n }.    (2)

We can easily see that if ε = 0, then model (2) reduces to (BP), and if ε > 0, then model (2) reduces to (BPε). In other words, both constrained optimization problems (BP) and (BPε) can be unified as the unconstrained optimization problem (2) via the indicator function ι_{B_ε}.
In the following, we shall focus on characterizing solutions of model (2) via fixed-point equations. To this end, we first need two concepts, namely, the proximity operator and the subdifferential of functions in Γ0(R^d). For a function f ∈ Γ0(R^d), the proximity operator of f with parameter λ, denoted by prox_{λf}, is a mapping from R^d to itself, defined for a given point x ∈ R^d by

prox_{λf}(x) := argmin { (1/2)‖y − x‖_2² + λf(y) : y ∈ R^d }.    (3)

The subdifferential of a function ψ ∈ Γ0(R^d) at a given vector u ∈ R^d is the set defined by

∂ψ(u) := { y ∈ R^d : ψ(z) ≥ ψ(u) + ⟨y, z − u⟩ for all z ∈ R^d }.

The subdifferential and the proximity operator of the function ψ are related in the following way (see, e.g., [21]): y ∈ ∂ψ(u) if and only if u = prox_ψ(u + y). Now, with the help of the subdifferential and the proximity operator, we can characterize a solution of the indicator-function-based model (2) via fixed-point equations.
Proposition 2.1. Let ε be a nonnegative number, let B_ε be the ball in R^m centered at the origin with radius ε, let b be a point in R^m, and let A be an m × n matrix. If u ∈ R^n is a solution of model (2), then for any α > 0 and β > 0, there exists a vector v ∈ R^m such that

u = prox_{(1/α)‖·‖_1}(u − (β/α)A^⊤v),    (4)
v = (I − prox_{ι_{B_ε}(·−b)})(Au + v).    (5)

Conversely, if there exist α > 0, β > 0, u ∈ R^n, and v ∈ R^m satisfying (4) and (5), then u is a solution of model (2).
We remark that the above fixed-point characterization can be identified as a special case of Proposition 1 in [22]. We include a proof of Proposition 2.1 to make the paper self-contained.
The proximity operators of the functions ‖·‖_1 and ι_{B_ε}(· − b) involved in the characterization can be computed efficiently. Indeed, the proximity operator prox_{(1/α)‖·‖_1} is the soft-thresholding operator defined for u ∈ R^n by

(prox_{(1/α)‖·‖_1}(u))_i = sign(u_i) · max{|u_i| − 1/α, 0},    (6)

for i = 1, 2, . . . , n.
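The componentwise formula (6) translates directly into code. The following is a minimal NumPy sketch; the function name prox_l1 and the threshold argument t (playing the role of 1/α) are our own:

```python
import numpy as np

def prox_l1(u, t):
    """Soft-thresholding: proximity operator of t*||.||_1 at u.

    Componentwise: sign(u_i) * max(|u_i| - t, 0).
    """
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
```

Components whose magnitude does not exceed the threshold t are set exactly to zero, which is what makes the operator promote sparsity.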
The proximity operator prox_{ι_{B_ε}(·−b)} is given by the following lemma.

Lemma 2.2. Let ε be a nonnegative number and let b be a point in R^m. Then for any v ∈ R^m,

prox_{ι_{B_ε}(·−b)}(v) = b + prox_{ι_{B_ε}}(v − b),    (7)

where prox_{ι_{B_ε}} is the projection operator onto the ball B_ε, that is, prox_{ι_{B_ε}}(x) = x if ‖x‖_2 ≤ ε and prox_{ι_{B_ε}}(x) = (ε/‖x‖_2)x otherwise.

Proof. By the definition of the proximity operator, we can verify directly that prox_{ι_{B_ε}(·−b)}(v) = b + prox_{ι_{B_ε}}(v − b) and that prox_{ι_{B_ε}} is the projection operator onto the ball B_ε. The result of this lemma follows immediately.
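The lemma says that this proximity operator is simply a shifted Euclidean projection onto the ε-ball. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def prox_indicator_ball(v, b, eps):
    """Proximity operator of the indicator of the eps-ball centered at b:
    the Euclidean projection of v onto {w : ||w - b||_2 <= eps}."""
    r = v - b
    nrm = np.linalg.norm(r)
    if nrm <= eps:
        return v.copy()          # v is already feasible
    return b + (eps / nrm) * r   # scale the residual back onto the sphere
```

For ε = 0 the ball degenerates to {b} and the operator returns b, which corresponds to model (BP).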

An algorithm and its convergence
In this section, we develop an algorithm for finding a solution of model (2) and provide a convergence analysis for the developed algorithm.
As we already know, all solutions of model (2) satisfy the fixed-point equations (4) and (5). By introducing an auxiliary variable w = prox_{ι_{B_ε}(·−b)}(Au + v), we obtain the following equivalent form of (4) and (5):

u = prox_{(1/α)‖·‖_1}(u − (β/α)A^⊤(Au + v − w)),
w = prox_{ι_{B_ε}(·−b)}(Au + v),
v = Au + v − w.

Based on these fixed-point equations in terms of u, w, and v, for arbitrary initial vectors u^0 ∈ R^n and w^0, v^0 ∈ R^m, we generate the sequence {u^k : k ∈ N0}, with N0 := {0, 1, . . .}, by the iterative scheme

u^{k+1} = prox_{(1/α)‖·‖_1}(u^k − (β/α)A^⊤(Au^k + v^k − w^k)),
w^{k+1} = prox_{ι_{B_ε}(·−b)}(Au^{k+1} + v^k),
v^{k+1} = Au^{k+1} + v^k − w^{k+1}.    (8)

To show convergence of the iterative scheme (8), we recall a result from [20].

Lemma 3.1 (Theorem 3.5 in [20]). If x is a vector in R^n, A is an m × n matrix, ϕ is in Γ0(R^m), and α, β, λ are positive numbers such that β/(λα) < 1/‖A‖_2², then the sequence {u^k : k ∈ N0} generated by the iterative scheme (9), which has the same form as (8) with prox_{(λ/α)‖·‖_1} and prox_ϕ in place of prox_{(1/α)‖·‖_1} and prox_{ι_{B_ε}(·−b)}, converges to a solution of the corresponding optimization problem (10); that problem reduces to model (2) when x = 0, λ = 1, and ϕ = ι_{B_ε}(· − b).

With the help of Lemma 3.1, the following result shows that, under appropriate conditions on the parameters α and β, the sequence {u^k : k ∈ N0} converges to a solution of model (2).

Theorem 3.2. Let ε be a nonnegative number, let B_ε be the ball in R^m centered at the origin with radius ε, let b be a point in R^m, and let A be an m × n matrix. If

β/α < 1/‖A‖_2²,    (11)

then for arbitrary initial vectors u^0 ∈ R^n and w^0, v^0 ∈ R^m, the sequence {u^k : k ∈ N0} generated by the iterative scheme (8) converges to a solution of model (2).
Proof. By setting x = 0 and λ = 1 and identifying ϕ = ι_{B_ε}(· − b) in model (10), the iterative scheme (8) can be viewed as a special case of the scheme given in (9). The desired result follows immediately from Lemma 3.1.
The convergence result given by Theorem 3.2 offers a practical way to find a solution of model (2). Since the explicit forms of the proximity operators prox_{(1/α)‖·‖_1} and prox_{ι_{B_ε}(·−b)} are given by (6) and Lemma 2.2, respectively, a unified approach for solving both (BP) and (BPε), based on Theorem 3.2, is depicted in Algorithm 1.

Algorithm 1 The iterative scheme for model (2)
Given: α > 0 and β > 0 with β/α < 1/‖A‖_2²; initial vectors u^0 ∈ R^n, w^0, v^0 ∈ R^m
repeat
Step 1: u^{k+1} ← prox_{(1/α)‖·‖_1}(u^k − (β/α)A^⊤(Au^k + v^k − w^k))
Step 2: w^{k+1} ← prox_{ι_{B_ε}(·−b)}(Au^{k+1} + v^k)
Step 3: v^{k+1} ← Au^{k+1} + v^k − w^{k+1}
until a given stopping criterion is met

We remark that Algorithm 1, derived from the fixed-point characterization of model (2), is closely related to existing algorithms based on the alternating direction idea applied to an augmented Lagrangian of the model. We briefly review an alternating direction method for model (2), which can be equivalently written as the constrained optimization problem

min { ‖u‖_1 + ι_{B_ε}(w − b) : u ∈ R^n, w ∈ R^m, Au = w }.    (12)

The primal and dual alternating direction methods for solving (12) can be found in [23]. The generalized alternating direction method for (12), proposed in [24], iterates as follows: given u^k, w^k, and λ^k,

u^{k+1} = argmin_u { ‖u‖_1 + ⟨λ^k, Au − w^k⟩ + (β/2)‖Au − w^k‖_2² + (1/2)(u − u^k)^⊤P(u − u^k) },
w^{k+1} = argmin_w { ι_{B_ε}(w − b) + ⟨λ^k, Au^{k+1} − w⟩ + (β/2)‖Au^{k+1} − w‖_2² },    (13)
λ^{k+1} = λ^k + β(Au^{k+1} − w^{k+1}),

where the matrix P = αI − βA^⊤A is positive definite. The condition β/α < 1/‖A‖_2² ensures the positive definiteness of P. The technique of introducing the term (1/2)(u − u^k)^⊤P(u − u^k) was used earlier in [25]. It can easily be seen that the iterative scheme (13) is equivalent to (8) with λ^k = βv^k. It is worth pointing out that if P is the zero matrix, then the iterative scheme (13) reduces to the conventional alternating direction method of multipliers (ADMM) for the constrained optimization problem (12) (see, e.g., [26]); in this case, the u-subproblem in (13) has no explicit solution and must be solved by an appropriate iterative algorithm, for example, FISTA [7].
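To make the three steps concrete, here is a minimal NumPy sketch of Algorithm 1 as we read the three-step scheme (8): soft-thresholding, ball projection, and a multiplier update. The function names, the initialization u^0 = 0, v^0 = 0, w^0 = b, and the fixed iteration count are our own choices, not the paper's:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding: componentwise sign(x)*max(|x|-t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proj_ball(z, b, eps):
    """Euclidean projection of z onto the ball {w : ||w - b||_2 <= eps}."""
    r = z - b
    nrm = np.linalg.norm(r)
    return z if nrm <= eps else b + (eps / nrm) * r

def algorithm1(A, b, eps, alpha, beta, n_iter=500):
    """Three-step fixed-point iteration for model (2).

    Requires beta/alpha < 1/||A||_2^2 (Theorem 3.2).  Initialized with
    u^0 = 0, v^0 = 0 and, as suggested in the text, w^0 = b.
    """
    m, n = A.shape
    u, w, v = np.zeros(n), b.copy(), np.zeros(m)
    for _ in range(n_iter):
        # Step 1: prox of (1/alpha)||.||_1 via soft-thresholding
        u = soft(u - (beta / alpha) * A.T @ (A @ u + v - w), 1.0 / alpha)
        # Step 2: prox of the indicator = projection onto the eps-ball at b
        w = proj_ball(A @ u + v, b, eps)
        # Step 3: multiplier update
        v = A @ u + v - w
    return u
```

For ε = 0 the projection always returns b, so w^k stays at b and the iteration targets the equality-constrained problem (BP).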
We further make some comments on Algorithm 1.
Step 1 of computing u^{k+1} comes from the first equation in (8); step 2 of computing w^{k+1} comes from the second equation in (8) and Lemma 2.2; step 3 of computing v^{k+1} is exactly the last equation in (8). The algorithm can be presented in a more computationally efficient way by combining steps 2 and 3 and eliminating the intermediate variable w^k. The motivation comes from the observation

Au^k + v^k − w^k = 2v^k − v^{k−1},    (14)

under the assumption v^{−1} = v^0 − (Au^0 − w^0) for the given u^0, w^0, and v^0. We can further substitute w^{k+1} computed in step 2 into step 3, so that the intermediate variable w^k is no longer needed. These simplifications yield Algorithm 2, a variant of Algorithm 1. When ε = 0, the vectors w^{k+1} in Algorithm 1 are equal to the constant vector b for all k ≥ 0. Because of this, we set w^0 = b in both Algorithms 1 and 2. Finally, it is more efficient to update u^{k+1} with step 1 of Algorithm 2 than with step 1 of Algorithm 1 in each iteration, since the matrix-vector multiplication involving A is not required in (14). However, updating u^{k+1} via step 1 of Algorithm 1 can be implemented through a component-wise Gauss-Seidel iteration, which may accelerate the convergence of the algorithm and therefore reduce the total CPU time consumed. The efficiency of the component-wise Gauss-Seidel iteration has been verified in [20,21].

Algorithm 2 A variant of Algorithm 1 for model (BPε)
Given: α > 0 and β > 0 with β/α < 1/‖A‖_2²; initial vectors u^0 ∈ R^n, v^0 ∈ R^m, and v^{−1} = v^0 − (Au^0 − b)
repeat
Step 1: u^{k+1} ← prox_{(1/α)‖·‖_1}(u^k − (β/α)A^⊤(2v^k − v^{k−1}))
Step 2: v^{k+1} ← (I − prox_{ι_{B_ε}(·−b)})(Au^{k+1} + v^k)
until a given stopping criterion is met

Algorithm 2 for model (2) can be viewed as the primal-dual algorithm proposed in [27]. To make this connection, we need the notion of the conjugate function. The conjugate of f ∈ Γ0(R^d) is the function f* ∈ Γ0(R^d) defined by f*(y) := sup { ⟨x, y⟩ − f(x) : x ∈ R^d }. With it, model (2) can be written as the saddle point problem

min_{u ∈ R^n} max_{v ∈ R^m} { ⟨Au, v⟩ + ‖u‖_1 − (ι_{B_ε}(· − b))*(v) },    (15)

where u is the primal variable and v is the dual variable. An alternating iterative scheme (16) for solving the saddle point problem (15) was proposed in [27]. In terms of proximity operators, the updates of u^{k+1} and v^{k+1} in (16) are identical to the update of u^{k+1} in step 1 and the update of v^{k+1} in step 2 of Algorithm 2, respectively.
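A corresponding sketch of Algorithm 2, with w eliminated and the extrapolated dual vector 2v^k − v^{k−1} in the u-update. Names and initialization are again our own; v^{−1} = v^0 − (Au^0 − b) matches the choice w^0 = b with u^0 = 0, v^0 = 0:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proj_ball(z, b, eps):
    r = z - b
    nrm = np.linalg.norm(r)
    return z if nrm <= eps else b + (eps / nrm) * r

def algorithm2(A, b, eps, alpha, beta, n_iter=500):
    """Variant of Algorithm 1 with the intermediate variable w eliminated.

    Each iteration needs one product with A.T (u-update) and one with A
    (v-update).  With u^0 = 0, v^0 = 0, w^0 = b, the required
    v^{-1} = v^0 - (A u^0 - w^0) equals b.
    """
    m, n = A.shape
    u, v, v_prev = np.zeros(n), np.zeros(m), b.copy()
    for _ in range(n_iter):
        # Step 1: soft-thresholding with the extrapolated dual 2v - v_prev
        u_new = soft(u - (beta / alpha) * A.T @ (2.0 * v - v_prev), 1.0 / alpha)
        # Step 2: (I - prox) applied to A u^{k+1} + v^k
        z = A @ u_new + v
        v_prev, v, u = v, z - proj_ball(z, b, eps), u_new
    return u
```

With these initializations the iterates coincide with those of the Algorithm 1 sketch, while avoiding the extra product with A in the u-update.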

Numerical simulations
This section is devoted to the numerical performance of the proposed algorithms for compressive sampling. We use NESTA [18] and the dual alternating direction method (DADM) [23] for comparison. In the comparisons, NESTA with continuation from the available code NESTA_v1.1 is applied, and DADM for model (BPε) is chosen. We focus on sparse signals with various dynamic ranges and various measurement matrices, including random partial discrete cosine transforms (DCTs), random partial discrete Walsh-Hadamard transforms (DWHTs), and random Gaussian matrices, and evaluate the performance of the algorithms in terms of various error metrics, speed, and robustness to noise. All experiments are performed in MATLAB 7.11 on a DELL XPS 14 with an Intel Core i5 and 4 GB RAM running the Windows 8 operating system.
We begin with a description of how the m × n sensing matrices A and the length-n, s-sparse signals are generated. The sensing matrices are divided into two categories: in the first category, the sensing matrices satisfy AA^⊤ = I, while in the second, the condition AA^⊤ = I is not satisfied. In the first category, when the m × n sensing matrix A is a partial DCT or DWHT, it is generated by randomly picking m rows from the n × n DCT or DWHT matrix; when A is random Gaussian, its elements are independently drawn from the standard normal distribution and then its rows are orthonormalized. In the second category, the elements of A are independently drawn from a Gaussian distribution. Sparse signals u used in our experiments are generated according to [18]. In each experimental trial, a length-n, s-sparse signal (a signal having exactly s nonzero components) is generated in such a way that the nonzero components are given by

η1 · 10^{θη2},

where η1 = ±1 with probability 1/2 and η2 is uniformly distributed in [0, 1]. The locations of the nonzero components are randomly permuted. Clearly, the magnitudes of the nonzero components of such an s-sparse signal lie in [1, 10^θ], with the parameter θ controlling this dynamic range. An observed signal (data) is collected by b = Au + z, where z represents Gaussian noise. The accuracy of a solution u* obtained from a specific algorithm is quantified by the relative ℓ2-error, the relative ℓ1-error, and the absolute ℓ∞-error defined, respectively, as

‖u* − u‖_2 / ‖u‖_2,    (17)
‖u* − u‖_1 / ‖u‖_1,    (18)
‖u* − u‖_∞,

where u is the true data and u* is the restored data. All results reported in this section are means of these errors and of the CPU time consumed over 50 trials.
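The signal-generation recipe and the three error metrics can be sketched as follows (hypothetical helpers, not the authors' MATLAB code; rng is a NumPy random generator):

```python
import numpy as np

def make_sparse_signal(n, s, theta, rng):
    """s-sparse test signal: nonzeros eta1 * 10**(theta*eta2) at random
    positions, eta1 = +-1 with prob 1/2, eta2 uniform on [0, 1], so the
    nonzero magnitudes lie in [1, 10**theta]."""
    u = np.zeros(n)
    idx = rng.choice(n, size=s, replace=False)
    eta1 = rng.choice([-1.0, 1.0], size=s)
    eta2 = rng.uniform(0.0, 1.0, size=s)
    u[idx] = eta1 * 10.0 ** (theta * eta2)
    return u

def errors(u_true, u_hat):
    """Relative l2-, relative l1-, and absolute l-infinity errors."""
    d = u_hat - u_true
    return (np.linalg.norm(d) / np.linalg.norm(u_true),
            np.linalg.norm(d, 1) / np.linalg.norm(u_true, 1),
            np.abs(d).max())
```

Increasing theta stretches the dynamic range of the nonzero magnitudes, which is exactly the regime the later experiments vary.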
To use Algorithm 2, one needs to fix the parameters α and β such that β/α < 1/‖A‖_2² (see Theorem 3.2).
From step 1 of the algorithm, the ratio β/α plays the role of a step size for updating u^k. We now investigate the performance of Algorithm 2 with the ratios β/α = 0.999/‖A‖_2², 0.999/(2‖A‖_2²), and 0.999/(4‖A‖_2²), with α fixed. We consider the configuration n = 2^15, m = n/2, s = 0.05n, dynamic range parameter θ = 1, and a partial DCT sensing matrix A. The observed data is noise-free. The performance of Algorithm 2 in terms of the relative ℓ2-error against iterations for the various values of β/α is shown in Fig. 1. As can be seen, the performance with the largest ratio β/α = 0.999/‖A‖_2² is the best. We therefore set

β/α = 0.999/‖A‖_2²    (19)

in our numerical experiments. In this way, α is essentially the only parameter that needs to be determined. We now investigate the impact of the parameter α on the performance of Algorithm 2.
To investigate the impact of varying α on the performance of Algorithm 2, we consider the configuration n = 2^15, m = n/2, s = 0.05n, dynamic range parameter θ = 5, and a partial DCT sensing matrix A. The observed data is noise-free. Six values of α, namely, 0.0025, 0.005, 0.01, 0.02, 0.04, and 0.08, are tested. Figure 2a depicts the traces of the relative ℓ1-error (see (18)) against the number of iterations for each α. As can be seen from this figure, for α = 0.0025, the smallest value in our test, the relative ℓ1-error drops rapidly from 1 to 10^−4, stabilizes with insignificant changes for about 1200 iterations, and then quickly drops again to the level of 10^−16. When α increases from 0.0025 to 0.08, the number of iterations required for the relative ℓ1-error to drop from 1 to 10^−4 increases. Meanwhile, the number of iterations spent in the transition between the first sharp drop and the second decreases; for example, it is about 700 for α = 0.005 and only a few iterations for α = 0.08. These observations motivate us to extend Algorithm 2 to a scenario in which the parameter α is updated during the iteration, with the goal of reducing the total number of iterations. The proposed approach is rather simple: it begins with a relatively small α and then increases it after every fixed number of iterations. A detailed flow of this new approach is given in Algorithm 3.

Algorithm 3 A variant of Algorithm 1 for model (BPε)
Given: integers p > 0, τ > 1, and T > 0; ε ≥ 0
Initialization: v^0 ∈ R^m, u^0 ∈ R^n, α > 0, and β > 0 with β/α < 1/‖A‖_2²
repeat
Step 1: Compute u^{k+1} using step 1 of Algorithm 2
Step 2: Compute v^{k+1} using step 2 of Algorithm 2
Step 3: If k is a multiple of p and the number of updates of the parameters α and β does not exceed T, update α ← τα and β ← τβ
until a given stopping criterion is met

The three new parameters introduced in Algorithm 3 are the integers p > 0 and T > 0 and the factor τ > 1. The parameter T is the allowable maximum number of updates of the parameters α and β. At each update, the pair (α, β) changes to (τα, τβ), which keeps the ratio β/α unchanged. The parameter p indicates that the algorithm with a pair (α, β) iterates p times before the algorithm with the pair (τα, τβ) runs another p times. We now demonstrate the efficiency of varying the parameters α and β by applying Algorithm 3 to the same data used in Fig. 2a. We set T = 6, τ = 4, and p = 20 and initialize α = m‖A^⊤b‖_∞/(20n‖A‖_2²). Again, we choose β by using (19). The corresponding result is shown in Fig. 2b. It is clear that it takes only about 200 iterations to drop the relative ℓ1-error below 10^−14. Hence, the strategy of updating the parameters α and β as described in Algorithm 3 is effective.
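Putting the pieces together, a sketch of Algorithm 3: Algorithm 2's updates plus the continuation step. The initialization of α below is one reasonable reading, not necessarily the paper's exact formula; the ratio β/α is fixed by (19):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proj_ball(z, b, eps):
    r = z - b
    nrm = np.linalg.norm(r)
    return z if nrm <= eps else b + (eps / nrm) * r

def algorithm3(A, b, eps, n_iter=400, p=20, tau=4.0, T=6):
    """Algorithm 2 updates plus continuation on (alpha, beta).

    Every p iterations, and at most T times, (alpha, beta) is replaced by
    (tau*alpha, tau*beta), so the ratio beta/alpha = 0.999/||A||_2^2 from
    (19) never changes.  The initial alpha is a guess on our part.
    """
    m, n = A.shape
    A_norm_sq = np.linalg.norm(A, 2) ** 2
    alpha = m * np.abs(A.T @ b).max() / (20.0 * n * A_norm_sq)
    beta = (0.999 / A_norm_sq) * alpha
    u, v, v_prev = np.zeros(n), np.zeros(m), b.copy()
    updates = 0
    for k in range(n_iter):
        # Algorithm 2, step 1: soft-thresholding with extrapolated dual
        u = soft(u - (beta / alpha) * A.T @ (2.0 * v - v_prev), 1.0 / alpha)
        # Algorithm 2, step 2: dual update via (I - prox)
        z = A @ u + v
        v_prev, v = v, z - proj_ball(z, b, eps)
        # Continuation: enlarge (alpha, beta), keeping beta/alpha fixed
        if (k + 1) % p == 0 and updates < T:
            alpha, beta, updates = tau * alpha, tau * beta, updates + 1
    return u
```

Starting with a small α keeps the soft-threshold 1/α large early on (fast coarse progress), while each update shrinks the threshold so the iterates can resolve small components, mirroring the two-phase error traces of Fig. 2.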
The rest of this section consists of two subsections. The first focuses on comparisons of the proposed algorithm with NESTA and DADM for sensing matrices A with AA^⊤ = I, while the second focuses on the numerical performance of the proposed algorithm for random Gaussian sensing matrices.

Numerical comparisons
This subsection consists of three parts. Part one contains the comparisons of Algorithm 3, DADM, and NESTA on data generated with partial DCT measurement matrices, part two contains the corresponding comparisons for partial DWHT measurement matrices, and part three contains results for random matrices with orthonormalized rows.

Numerical comparison with partial DCT sensing matrices
First, we compare the performance of Algorithm 3 with that of NESTA and DADM [23] for noise-free data. The algorithm NESTA was developed by applying a smoothing technique to the nonsmooth ℓ1-norm together with an accelerated first-order scheme, both from Nesterov's work [19]. A parameter denoted by μ controls how close the smoothed ℓ1-norm is to the ℓ1-norm. To obtain high accuracy of the restored signal with NESTA, μ = 10^−10 is used for partial DCT sensing matrices and the various dynamic range parameters. A tolerance parameter Tol in NESTA varies with the smoothing parameter μ and the setting of the generated data and needs to be determined; we choose it to obtain reasonable results. We finally choose Tol = 10^−12, 10^−14, and 10^−15, respectively, for data generated with dynamic range parameters θ = 1, 3, 5. For DADM, two parameters γ and β have to be predetermined. We choose γ = 1.618 in all settings, while β varies across settings to obtain reasonable results; we choose β = ‖b‖_1/(2^1 m), ‖b‖_1/(2^3 m), and ‖b‖_1/(2^6 m) for dynamic range parameters θ = 1, 3, 5, respectively. For Algorithm 3, we set p = 20 and T to be the smallest integer greater than log10((n/m)‖A^⊤b‖_∞). In our experiments, we notice that T is θ or θ + 1. The stopping criterion of Algorithm 3 and DADM is that the relative error between successive iterates of the reconstructed signal satisfies ‖u^{k+1} − u^k‖_2/‖u^k‖_2 < Tol. We choose Tol = 10^−15 for data generated by partial DCT for both Algorithm 3 and DADM.
For noise-free data, problem (BP) is used to recover the underlying signals. The dimensions n are chosen from {2^13, 2^15} for data generated with partial DCT. The number of nonzero entries s is set to 0.02n and 0.01n, respectively, for the numbers of measurements m = n/4 and n/8. The performance of the different algorithms is reported in Tables 1 and 2.

Table 1 Numerical results with partial DCT sensing matrices for noise-free data. The number of measurements m is m = n/4, and the test signals are s-sparse with s = 0.02n. Each value in a cell represents the mean over 50 trials

Based on these two tables, the performance of Algorithm 3 and DADM is comparable in terms of accuracy of the recovered data for various values of the dynamic range parameter θ and the measurement ratio m/n. However, Algorithm 3 outperforms DADM in terms of computational cost (CPU time or iterations) for data with a high dynamic range parameter (e.g., θ = 5). The performance of NESTA is inferior to that of Algorithm 3 and DADM in terms of accuracy and computational cost for various values of θ and m/n. We also observe that the relative ℓ2-error and ℓ1-error of the results recovered by Algorithm 3, along with the iterations consumed, are quite robust with respect to the dynamic range of the unknown signals.

Next, we discuss the comparison of the algorithms for noisy data. The underlying data is recovered by solving problem (BPε). The settings of dimension, sparsity, and dynamic range of the unknown signals for problem (BPε) are the same as those for problem (BP); the only difference is that the measurements are contaminated with noise. In our experiments, the noise levels in the measurements vary with the dynamic ranges of the unknown signals. More precisely, the noise levels σ are set to 0.05, 1.0, and 5.0 corresponding to the choices 1, 3, and 5 of the dynamic range parameter θ, respectively. It follows that the noise power is ε² = mσ².
The setting of parameters for Algorithm 3 remains the same. For DADM, we choose β = ‖b‖_1/(2^1 m), ‖b‖_1/(2^3 m), and ‖b‖_1/(2^3 m) for dynamic range parameters θ = 1, 3, 5, respectively, for better performance in terms of accuracy and computational cost. For the smoothing parameter μ in NESTA, we choose the default setting μ = max{0.1σ, 0.01}. The stopping criterion for our algorithm and DADM is that the relative error between successive iterates of the reconstructed signal satisfies ‖u^{k+1} − u^k‖_2/‖u^k‖_2 < 10^−5, and the stopping criterion for NESTA is Tol < 10^−5. The performance of the different algorithms is reported in Tables 3 and 4.

Table 3 Numerical results with partial DCT sensing matrices for noisy data. The number of measurements m is m = n/4, and the test signals are s-sparse with s = 0.02n. Each value in a cell represents the mean over 50 trials

The accuracy of the data recovered by the three algorithms is comparable in each setting. The computational cost of DADM is comparable to or slightly better than that of Algorithm 3 for data with dynamic range parameters θ = 1, 3. Both Algorithm 3 and DADM outperform NESTA for θ = 1, 3 in terms of computational cost. For data with a high dynamic range (e.g., θ = 5), Algorithm 3 performs the best in terms of computational cost while DADM performs the worst.

Numerical comparison with partial DWHT sensing matrices
This part discusses the performance of the three algorithms on partial DWHT sensing matrices; the results are presented in a different manner from the previous part. In all three algorithms, the computational cost is mainly attributed to the matrix-vector multiplications involving A or A^⊤. Under the assumption AA^⊤ = I, each of the three algorithms performs only two such multiplications per iteration, one involving A and the other involving A^⊤. Hence, we use the number of iterations to represent the computational cost, and for accuracy we report only the relative ℓ2-error. The settings μ = 10^−8 and Tol = 10^−13 are used in NESTA for noise-free data with dynamic range parameter θ = 5.

Figure 3 shows the results of Algorithm 3, DADM, and NESTA when the dimension n of the tested signals is 2048 and the number of measurements m is n/4. The symbols "♦" and "∇" denote the results produced by DADM and NESTA, respectively, with a third symbol for Algorithm 3; the colors red, blue, and yellow represent the dynamic ranges of the tested signals with θ being 1, 3, and 5, respectively. The relative ℓ2-error (vertical axis, base-10 logarithmic scale) is plotted against the iterations consumed (horizontal axis) for (a) the noise-free case and (b) the noisy case. We see clearly that the performance of the three algorithms follows the same pattern observed in the numerical results with partial DCT measurements. The same conclusions can be drawn for the results with m = n/8 as well.

Figure 4 Numerical results with Gaussian sensing matrices whose rows are orthonormalized. The relative ℓ2-errors (vertical axis, base-10 logarithmic scale) versus the iterations consumed (horizontal axis) for (a) the noise-free case and (b) the noisy case. The colors red, blue, and yellow represent the dynamic ranges of the tested signals with θ being 1, 3, and 5, respectively

Numerical comparison with orthonormal Gaussian sensing matrices
In this part, comparisons of numerical results with orthonormal Gaussian sensing matrices are shown. Due to the unavailability of NESTA source code for such sensing matrices, only the comparison between Algorithm 3 and DADM is provided. The parameter settings of Algorithm 3 and DADM are the same as above, except that the stopping tolerance Tol = 10^−14 is used for noise-free data. The numerical results are reported in Fig. 4 in the same manner as in the previous part with DWHT sensing matrices. For both noise-free and noisy data, the performance of Algorithm 3 and DADM is comparable in terms of the relative ℓ2-error. For noise-free data, the computational cost of Algorithm 3 and DADM is comparable only for data with dynamic range parameters θ = 1, 3, while Algorithm 3 outperforms DADM in terms of computational cost (i.e., iterations) for data with θ = 5. For noisy data, in terms of computational cost, the two algorithms are comparable for θ = 1; DADM performs slightly better than Algorithm 3 for θ = 3; and Algorithm 3 outperforms DADM for θ = 5. The same conclusions can be drawn for the results with m = n/8 as well.

Simulation with Gaussian sensing matrices
In this subsection, we focus on the simulation of Algorithm 3 for data generated by general Gaussian sensing matrices (whose rows are not orthonormal), that is, AA^⊤ ≠ I. In this scenario, we do not compare Algorithm 3 with NESTA and DADM, since the available source code of NESTA does not apply and DADM needs an inner loop in each iteration. The parameter setting for Algorithm 3 is the same as that for data with orthonormal Gaussian sensing matrices. The results for noise-free data and noisy data are reported in Tables 5 and 6, respectively. It can be seen that the underlying signal can be recovered with high accuracy for noise-free data and with reasonably high accuracy for noisy data.

Conclusions
We reformulated the ℓ1-norm minimization problems (BP) and (BPε) via indicator functions as unconstrained minimization problems. The objective function of each unconstrained problem is the sum of the ℓ1-norm of the underlying signal u and the indicator function of a set in R^m, which is {0} for (BP) or the ε-ball for (BPε), composed with the affine transformation Au − b. Due to the structure of this objective function and the availability of explicit forms of the proximity operators of both the ℓ1-norm and the indicator function, an accurate and efficient algorithm was developed for recovering sparse signals based on fixed-point equations. The algorithm outperforms NESTA in terms of the relative ℓ2-, relative ℓ1-, and absolute ℓ∞-error measures as well as the computational cost for tested signals ranging from low to high dynamic ranges and of different sizes. For signals with a high dynamic range, the proposed algorithm also outperforms DADM in terms of computational cost while yielding comparable accuracy. Furthermore, the proposed algorithm also solves general problems without the condition AA^⊤ = I efficiently and accurately.