Noisy Sparse Recovery Based on Parameterized Quadratic Programming by Thresholding

. Parameterized quadratic programming (Lasso) is a powerful tool for the recovery of sparse signals based on underdetermined observations contaminated by noise. In this paper, we study the problem of simultaneous sparsity pattern recovery and approximation recovery based on the Lasso. An extended Lasso method is proposed with the following main contributions: (1) we analyze the recovery accuracy of Lasso under the condition of guaranteeing the recovery of nonzero entries positions. Speciﬁcally, an upper bound of the tuning parameter h of Lasso is derived. If h exceeds this bound, the recovery error will increase with h ; (2) an extended Lasso algorithm is developed by choosing the tuning parameter according to the bound and at the same time deriving a threshold to recover zero entries from the output of the Lasso. The simulation results validate that our method produces higher probability of sparsity pattern recovery and better approximation recovery compared to two state-of-the-art Lasso methods.


Introduction
The problem of recovering unknown sparse vector S ∈ R m based on the limited noisy observations Y = AS + e arises in many applications, including compressed sensing [1,2], pattern recognition [3,4], blind source separation [5,6], signal reconstruction [7], and machine learning [8], where A ∈ R n×m is referred to a measurement matrix with n < m and e ∈ R n is an unknown vector of noise.In this paper, we suppose the positions and the signs of nonzero components of S are distributed uniformly at random, and their amplitudes follow an arbitrary distribution.We also assume that e follows zero-mean, independent, and identically distributed sub-Gaussian with parameter σ 2 .Recently, many studies have advocated the use of the parameterized quadratic programming (Lasso [9,10], also called basis pursuit [11]) to deal with the noisy sparse recovery problem through minimizing the following objective function which simultaneously executes approximation and stable recovery of the ideal sparse solution: min S Y − AS 2  2 /2 + h S 1 . ( Here and throughout, • p denotes the L p -norm (0 ≤ p ≤ ∞).Specially, S 0 = |supp(S)|, where supp(S): = { j | S j / = 0}, |Ω| denotes the cardinality of a finite set Ω and S j denotes the jth component in the vector S. In the optimization problem (1), the tuning parameter h is critical for deriving a satisfactory solution.
Up to date, many theoretical results have been obtained on Lasso to recover a sparse signal.The following two scenarios are usually of interest: (1) Sparsity pattern recovery: given noisy observations Y of sparse signal S, how to recover the positions and signs of S's nonzero entries.(2) Stable recovery: analyzing the error bound between Lasso solution S and true sparse vector S.
About scenario (1), based on deterministic framework, Fuchs [12,13] has provided a sufficient condition in mutual incoherence form.Tropp [14] and Donoho et al. [15,16] have both discussed the sufficient conditions for the support of the solution to the Lasso method to be contained within the true support of S.However, the sufficient condition with the mutual incoherence form can easily be violated in applications due to the presence of highly correlated columns in measurement matrix A. In addition, the sufficient conditions derived in the deterministic framework are sometimes too strict to reflect the fact that the Lasso often finds the sparsity pattern of true signal S. Thereby, Wainwright [17] has investigated the sufficient conditions in the probabilistic framework.At the same time, in line with scenario (2), Donoho et al. [15], Tropp [14], Wainwright [17], Tseng [18], and Candès and Plan [19] have derived the error bounds under a noise tolerance constraint.
In fact, the sparsity pattern recovery requires simultaneous recovery of nonzero entries as well as zero entries.For this purpose, the Lasso utilizes the regulation of tuning parameter h to get a tradeoff between nonzero entry recovery and zero entry recovery [13].To guarantee the zero entry recovery, a large h is always required.However, it will shrink the nonzero entries at the same time.Fuchs [12] has investigated the behavior of the solution of Lasso along with the h being increased: when the h becomes large enough, the solution of Lasso will shrink to zero.Hence, on the one hand, it is not easy to appropriately find the optimal tuning parameter h.On the other hand, to arrive at the sparsity pattern recovery, the Lasso requires high signal-tonoise ratio (SNR), and it may have poor performance in recovery accuracy which is also a basic goal in the recovery problem.In this paper, we use "approximation recovery" to reflect the performance in recovery accuracy, and its definition will be given in Section 2. It is worth noting that the achieved results of the stable recovery cannot guarantee the approximation recovery.Therefore, a significant problem in the sparse recovery is to achieve both the sparsity pattern recovery and the approximation recovery.This problem has not been previously discussed as far as we know.
To cope with this problem, we propose an extended Lasso method by utilizing thresholding.At first, we analyze the error of Lasso solution based on the L 2 norm criterion and get an upper bound of h associated with the power of noise.We also prove that, under the condition of guaranteeing the recovery of nonzero entries positions, the error of Lasso solution will increase when h exceeds the bound.Hence, for the purpose of both approximation recovery and sparsity pattern recovery, h has to be selected below this bound.Furthermore, we present a threshold estimation method and an algorithm utilizing the threshold to recover all the zero entries.The simulation results validate that this method not only recovers the sparsity pattern of S with high probability but also has a better approximation recovery than both the classical Lasso with any value of h and the basis pursuit denoising (BPDN) [11].
The rest of the paper is organized as follows.Section 2 describes the sparsity pattern recovery and the approximation recovery.In Section 3, the upper bound on h to achieve a "good" approximation recovery is found, and the threshold estimation method is presented.At the same time, an extended Lasso algorithm is proposed.Section 4 gives some simulation results to validate our algorithms.Section 5 concludes this paper.

Problem Formulation and Performance
Analysis of Lasso 2.1.Sparsity Pattern Recovery.The support and sign pattern (named sparsity pattern) of a sparse signal S are defined as where S j is the jth entry of vector S. Furthermore, we denote S min = min j∈Isupp |S j | and S supp = (S j ) j∈Isupp .The vector S supp is composed of the nonzero entries of S. Suppose S be a recovery of S based on Y; we also denote I supp = {j | S j / = 0}, Based on the above notations, a definition of sparsity pattern recovery is given as follows.

Definition 1 (sparsity pattern recovery). A signal S is a sparsity pattern recovery of S if and only if
where Remark 1. N m and N f denote the number of nonzero entries and zero entries of S that are failed in recovery, respectively.Therefore, for exact sparsity pattern recovery, the support and sign of the nonzero entries as well as the zero entries have to be recovered.

Approximation Recovery.
In noisy case, it is generally impossible to seek exact recovery of a sparse signal, and estimation error is inevitable in the process of recovery.In this paper, the squared error (SE) between the S and S is defined as follows: Using squared error as criterion, the problem of approximation recovery is defined as follows.
Definition 2 (approximation recovery).A recovery S is a "good" approximation recovery to S if the SE is as small as possible.

Performance Analysis of Lasso.
Lasso utilizes the regulation of parameter h to reach both N m = 0 and N f = 0. Generally, as h increases, the probability that the solution of Lasso reaches N f = 0 increases, but the probability that the solution of Lasso reaches N m = 0 decreases.A brief analysis to support this conclusion is given as follows.Wainwright [17] established a sufficient condition for Lasso to reach sparsity pattern recovery: to guarantee N f = 0, h must satisfy where γ ∈ (0, 1] is called incoherence parameter of the matrix A such that where I c supp denotes the supplementary set of I supp , A β is a submatrix composed of the columns of A with their indices being in β, and, for a In a necessary condition for Lasso to reach sparsity pattern recovery, Wainwright defined a quantity where I i is a unit vector whose entries are zeros except that the ith entry equals one.Suppose the existence of inclusion S i ∈ (0, g i (h)) or S i ∈ ( g i (h), 0) for some i ∈ I supp , then the probability of sparsity pattern recovery is bounded away from one According to the analysis in [17], the quantity g i (h) corresponds to the amount that is "shrunken" by the Lasso in position i ∈ I supp .As the quantity g i (h) increases with h, the assumption of the existence of inclusion S i ∈ (0, g i (h)) or S i ∈ ( g i (h), 0) for some i ∈ I supp holds with higher probability.Combined with ( 5) and ( 8), it indicates that as h increases, the probability of N m = 0 decreases.From the above analysis, we know that the probabilities of N m = 0 and N f = 0 have contradicting trends with the changing of h.Therefore, the appropriate selection of parameter h is an important and open problem [16].In practice, a large h is always used to reach N f = 0.In such case, not only the nonzero entries caused by the noise but also the nonzero entries in positions i ∈ I supp are shrunk by Lasso with a large h.This operation increases the probability of N m / = 0 and results in poor performance in approximation recovery.To mitigate this problem, in the following context, we propose a method to reach high probability on sparsity pattern recovery and high accuracy on approximation.

Extended Lasso Method Combining Parameter Selection and Thresholding
In this section, an extended Lasso method is proposed for simultaneous sparsity pattern recovery and approximation recovery.The overall algorithm is presented in Section 3.1.
Two key techniques of the proposed method, including threshold estimation and parameter selection, are given in Sections 3.2 and 3.3.

3.1.
Proposed Extended Lasso Algorithm.The algorithm of the proposed extended Lasso is summarized as follows.
Step 1. Solve (1) through appropriate selection of parameter h based on Section 3.3.
Step 2. Estimate the threshold by following (18) in Section 3.2.
Step 3. Apply pruning to make the entries below the estimated threshold of the solution obtained in Step 1. to be zero.
Remark 2. We do not try to reach the sparsity pattern recovery at Step 1 but to recover the nonzero entries of S and achieve the approximation recovery.The sparsity pattern recovery is realized by thresholding in Step 3.
In the proposed algorithm, we have to answer the following two questions.The first one is how to select an appropriate tuning parameter h that guarantees a "good" approximation recovery at the nonzero entries of S. The second is how to estimate the threshold.The answers of these questions are presented in the following context.1) is considered as follows.We denote ∂ S 1 as the subdifferential of S 1

Threshold Estimation. The Kuhn-Tucker condition of the optimization problem (
The recovery S is the optimum of (1), if and only if Since the optimization problem (1) is convex, the Kuhn-Tucker condition is a sufficient condition.To distinguish the nonzero and zero components of S, we denote S supp as the reduced dimensional vector built upon the nonzero components of S. Similarly, A denotes the associated columns in A, and a j is the jth column in A. In terms of nonzero and zero entries of S, the necessary and sufficient condition (10) can be expressed as Since A is always full rank, (11) can be given as where A + is the Moore-Penrose inverse of A. When N m = 0, (13) becomes If the maximum amplitudes of the second and third terms in the right-hand side of ( 14) are bounded, we can reach N f = 0 through pruning by setting a threshold.Furthermore, the sparsity pattern recovery can be achieved.
In fact, the third term in the right-hand side of ( 14) is a deterministic quantity.We only need to bound the second term.Suppose that the singular value decomposition A + = VΣ −1 U T and λ min is the minimal singular value of A + .Since the elements of e are zero mean and i.i.d sub-Gaussian with parameter σ 2 , it follows from property of sub-Gaussian [17] that A + e is zero mean and sub-Gaussian with a variance satisfying Consequently, using the sub-Gaussian tail bound and the union bound, we have where T represents confidence probability.With a given T, the bound of the second term in ( 14) can be derived as Hence, according to ( 14), the threshold can be set as sign S supp . (18)

Selection of h for Approximation Recovery. Denoting
u * ¸sign( S supp ) and combining ( 4) and ( 14), we have u * ] > 0. Fuchs [13] has checked the solution S of (1) as h varies from 0 to +∞.As h increases, S supp is a continuous function of h, and S supp 1 is monotonically decreasing.Until h > A T Y ∞ , S = 0. Thereby, the interval ]0+, A T Y ∞ [ can be divided into subintervals characterized by the fact that, within each such subinterval, the number of nonzero components of the optimum S of Lasso is constant.According to the above discussion, in each of the subintervals, u * is a fixed vector, SE(h) is a quadratic function about h.Thereby, the following theorem holds.The proof of this theorem is given in the appendix.Remark 3.Under the aim of sparsity pattern recovery, the selection of h has been discussed by Wainwright [17].When h > (2/γ) 2n log mσ and , where C min denotes the minimal eigenvalue of matrix ((A Isupp ) T A Isupp ), Lasso can recover the sparsity pattern of S with high probability.However, from (A.12), parameter h must satisfy h < e 2 ≤ δ to achieve approximation recovery.Hence, the selection of the parameter h is a tradeoff when both sparsity pattern recovery and approximation recovery are simultaneously under consideration.This indicates that the selection of h may be a dilemma.

Simulation Results
In this section, we carried out two experiments to evaluate the performance of the proposed method.Firstly, we studied the distribution of the selected tuning parameter h against two aims, that is, sparsity pattern recovery and approximation recovery.Secondly, we compared the performance of the proposed method with BPDN and standard Lasso.The probabilities of sparsity pattern recovery and the recovery accuracies of these methods were compared numerically.In the standard Lasso, we search its optimal tuning parameter h in its feasible range with size 1e − 2.

Histogram of Optimal h for Standard Lasso.
In this experiment, we study the histograms of optimal h for the purpose of sparsity pattern recovery and approximation recovery respectively.The parameters used in this experiment are n = 10, m = 30, S 0 = 2, 3, 4, and e 2 ≤ 0.1.The simulation results are obtained by 10000 Monte Carlo experiments with randomly generated sparse signal S and matrix A. Firstly, we applied the Lasso to achieve optimal approximation of true sparse signal S. In this experiment, we searched for the optimal tuning parameter h in its feasible range with step size 1e − 2 to achieve minimal mean-square error.The histograms of optimal h for approximation recovery with signals of different sparsity are shown in Figure 1.Secondly, we simulate the histograms of optimal h for sparsity pattern recovery.The results are also shown in Figure 1.From the histograms shown in Figure 1, it is clear that the optimal tuning parameter h for the purpose of sparsity pattern recovery and the h for approximation recovery cannot coincide generally.That is to say, we cannot achieve both purposes simply by tuning the parameter h.This motivates our research on new method to achieve sparsity pattern recovery and approximation recovery simultaneously.error between the true sparse vector S and the solutions S obtained from the proposed method, Lasso, and BPDN [11].The parameters used in this experiment are n = 10, m = 30, S 0 = 1, 2, 3 . . ., 6, and e 2 ≤ 0.1.The results are also obtained from 10000 Monte Carlo experiments with randomly generated sparse signals S and matrices A. In the sparsity pattern recovery experiments, supposing that n p experiments can recover the sparsity pattern of S, we use the ratio n p /10000 to approximate the probability of sparsity pattern recovery.Based on this approach, we obtain the probabilities of sparsity pattern recovery of the Lasso, BPDN (h = σ 2 log(n)) [11], and the proposed method as shown in Figure 2.For the Lasso, it is worth noting that the optimal tuning parameter h is searched with step size 1e−2 to achieve sparsity pattern recovery.Under the condition of sparsity pattern recovery, we further compared the mean-square error between the proposed method and Lasso.The results in Figures 2 and 3 reveal that the proposed method not only has higher probability of sparsity pattern recovery but also has better approximation performance.Since the performance of BPDN with h = σ 2 log (n) is lower than that of the Lasso with optimal h, the approximation performance of BPDN is not shown in Figure 3.

Conclusion
In this paper, motivated by the fact that Lasso can hardly achieve simultaneous optimal sparsity pattern recovery and approximation recovery, we propose an extended Lasso method to achieve satisfactory performance in both aspects.In the proposed method, firstly, we select a tuning parameter h based on Theorem 1 and solve the Lasso for approximation recovery.Then, a threshold estimation method is applied to prune the entries of the obtained solution to achieve sparsity pattern recovery.The simulation results demonstrate the good performance of the proposed method.

Appendix Proof of Theorem 1
Proof.We first denote h optimal the h value of vertex at each quadratic function SE(h).From the analysis of Section 3.3 and ( 19), the following inequality certainly holds: One can further check that S is a continuous function of h.Then, with the above proof, the continuity of squared error function, and the property of the quadratic function, the theorem is proven.

Theorem 1 .
In the general noisy case, let Y = AS + e with e 2 ≤ δ; one solves the optimization problem (1) with h.Then, if N m = 0 and h > δ, SE(h) increases with h.

Figure 1 :Figure 2 :Figure 3 :
Figure 1: The histograms of h for the purposes of sparsity pattern recovery and approximation recovery (square-solid curves: the histogram of tuning parameter h for approximation recovery (AR), circinate-dash curves: the histogram of tuning parameter h for sparsity pattern recovery (SPR)).