Sparse signal recovery with unknown signal sparsity

In this paper, we propose a detection-based orthogonal matching pursuit (DOMP) algorithm for compressive sensing. Unlike conventional greedy algorithms, the proposed algorithm does not rely on prior knowledge of the signal sparsity, which may not be available in some applications, e.g., sparse multipath channel estimation. DOMP runs a binary hypothesis test on the residual vector of OMP at each iteration and stops iterating when no signal component remains in the residual vector. Numerical experiments show the effectiveness of the proposed algorithm in estimating the signal sparsity as well as in recovering the signal.


Introduction
Compressive sensing (CS) [1,2], a framework for solving under-determined linear systems, has drawn great research attention in recent years. The CS problem can be modeled as finding the sparse solution h of the equation

y = Xh + n, (1)

where the observation y ∈ R^{m×1} is obtained by using the sensing matrix X ∈ R^{m×n} to measure the k-sparse signal h ∈ R^{n×1}, and n is the measurement noise. In the CS framework, the sensing matrix X in (1) is a 'fat' matrix, i.e., m < n.
To find the sparse solution h, i.e., to recover the sparse signal, one can adopt either convex relaxation based methods, e.g., basis pursuit (BP) [3], or greedy algorithms, e.g., orthogonal matching pursuit (OMP) [4], regularized OMP (ROMP) [5], StOMP [6], etc. Greedy algorithms are often used for their low computational complexity and ease of implementation. To implement a greedy algorithm, one needs prior information on the signal sparsity k. For example, in OMP and its variants, e.g., ROMP, the signal sparsity k must be specified so that the computation stops after k iterations. Other greedy algorithms such as subspace pursuit (SP) [7] also need the value of k so that exactly k candidate atoms can be selected at each iteration. Multipath channels, e.g., the underwater acoustic (UWA) channel in sonar systems [8] and the Rayleigh fading channel in wireless communication [9], can be modeled as FIR filters. These channels can be viewed as sparse signals according to experimental data [10,11]. Thus, CS can be applied to channel estimation. In [12], the authors showed that the CS approach achieves better estimation performance than conventional methods.
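As a reference point for the discussion that follows, classical OMP with known sparsity k can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's code; function and variable names are illustrative:

```python
import numpy as np

def omp(X, y, k):
    """Classical OMP [4]: greedily select k columns of X to explain y."""
    m, n = X.shape
    r = y.copy()              # residual vector
    support = []              # indices of selected columns (atoms)
    for _ in range(k):
        # select the column most correlated with the current residual
        i = int(np.argmax(np.abs(X.T @ r)))
        support.append(i)
        # least-squares fit on the selected columns, then update the residual
        Xs = X[:, support]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        r = y - Xs @ coef
    h_hat = np.zeros(n)
    h_hat[support] = coef
    return h_hat, sorted(support)
```

Each pass selects the strongest remaining atom, refits by least squares on the selected columns, and updates the residual, so exactly k iterations are run; this is precisely where the prior knowledge of k enters.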
In reality, the number of channel taps, i.e., the signal sparsity, is usually unknown. Therefore, the greedy algorithms cannot be applied directly. In [13], the authors proposed the sparsity adaptive matching pursuit (SAMP), which does not need the signal sparsity information. However, SAMP still needs a threshold to stop the iteration, and its performance is sensitive to the threshold selection. In [14], a stopping rule for OMP under noise, i.e., stopping when the residual r_t at the tth iteration satisfies ||r_t||_2 < ||n||_2, provides a theoretical guarantee for sparse signal recovery.
In this paper, we propose the detection-based orthogonal matching pursuit (DOMP) algorithm, which systematically determines the stopping threshold based on a signal detection criterion. This is a more general threshold-finding approach for stopping OMP than the threshold proposed in [14]. Since the proposed DOMP is able to recover a sparse signal without sparsity information, it can be applied to sparse channel estimation.
The rest of this paper is organized as follows. The analysis of the residual vector of OMP is given in Section 2. Section 3 discusses the hypothesis test on the residual vector. In Section 4, the threshold determined by a given false alarm probability (P_FA) is discussed. The efficiency of the proposed stopping criterion is shown by numerical experiments in Section 5.

Analysis of residual vector in OMP
In this section, we show the property of the residual vector of OMP that motivates us to apply signal detection techniques to determine the stopping criterion of OMP. In this study, we assume the sensing matrix X satisfies the RIP condition δ_{k+1} < 1/√(k+1), which guarantees perfect recovery in the absence of noise perturbation [15].
OMP can be viewed as a successive interference cancellation method, i.e., at each iteration the strongest signal component is subtracted from the residual vector. We denote the residual vector at the tth iteration by r_t, the support of the signal at the tth iteration by S_t, the sub-matrix formed by the columns of X indexed by S_t by X_{S_t}, and the rest of the matrix X by X_{S̄_t}. At the tth iteration, the column index i of X_{S̄_t} that has the highest correlation with the residual vector is added to the support set, i.e., S_t = S_{t−1} ∪ {i}. After updating the signal support, the residual vector is updated by projecting y onto the orthogonal complement of the column space of X_{S_t}, i.e.,

r_t = P_t^⊥ y, (2)

where P_t^⊥ = I − P_t is the orthogonal projector onto that complement and P_t = X_{S_t}(X_{S_t}^T X_{S_t})^{−1} X_{S_t}^T ∈ R^{m×m}. Thus, the residual vector r_t after t iterations can be expressed as

r_t = P_t^⊥ (Xh + n) = P_t^⊥ Xh + P_t^⊥ n. (3)

We denote the support of the k-sparse signal h by supp(h), i.e., supp(h) := {i ∈ {1, 2, . . . , n} | h(i) ≠ 0}, where h(i) is the ith element of the vector h. When the support obtained via the iterations is a super-set of the actual support of the signal, i.e., supp(h) ⊂ S_t, there is no signal component in the residual vector r_t.
Thus, we can adopt a signal detection method to test whether a signal component exists in the residual vector after each iteration. Since the entry of h indexed by the column of highest correlation is removed at each iteration, we can define the signal component remaining in the residual after t iterations as h_t, i.e., the vector h with its entries indexed by S_t set to zero. Then, (3) is equivalent to

r_t = P_t^⊥ (X h_t + n). (4)

According to the definition of RIP [16], for a real signal h, X obeys

(1 − δ_k) ||h_{S_t}||_2^2 ≤ ||X_{S_t} h_{S_t}||_2^2 ≤ (1 + δ_k) ||h_{S_t}||_2^2 (5)

for all subsets S_t with |S_t| ≤ k. Since we assumed that δ_{k+1} < 1/√(k+1), the sensing matrix X meets the RIP. In other words, the equation X_{S_t} h = 0 has no nonzero solution, i.e., any t columns of X are linearly independent. Hence rank(P_t) = t. Since rank(P_t) + rank(P_t^⊥) = m, P_t^⊥ is not a full row rank matrix, and the vector P_t^⊥ y has a degenerate multivariate normal distribution. To derive the distribution of the residual vector r_t, the residual is further projected onto a subspace formed by taking any m − t rows of P_t^⊥. Since any m − t rows of P_t^⊥ are linearly independent, i.e., rank(P_t^⊥) = m − t, the sub-matrix formed by these rows has full row rank.
We denote the projection matrix formed by taking any m − t rows of P_t^⊥ by P_{m−t} ∈ R^{(m−t)×m}, where m is the number of measurements and t is the iteration index.
Since any m − t rows of P_t^⊥ are linearly independent, and the other t rows can be linearly represented by these m − t rows, any such full row rank P_{m−t} projects the residual vector r_t onto the same subspace. Thus, we may take any m − t rows of P_t^⊥ for the further projection.
Define C_{m−t} := P_{m−t} P_{m−t}^T and z_t := P_{m−t} r_t. If there is only noise in the residual vector, that is, z_t = P_{m−t} n, then the projected residual z_t follows

z_t ∼ N(0, σ^2 C_{m−t}), (6)

where σ^2 is the noise variance. If the residual vector consists of both the signal component and noise, i.e., z_t = P_{m−t}(X h_t + n), then

z_t ∼ N(P_{m−t} X h_t, σ^2 C_{m−t}), (7)

where θ_t = ||h_t||_2 is an unknown parameter.
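The projection step and the noise-only distribution in (6) can be checked numerically. The sketch below (illustrative dimensions, not from the paper) builds P_t^⊥ for a random t-column sub-matrix, keeps its first m − t rows as P_{m−t}, and compares the sample covariance of z_t = P_{m−t} n against σ² C_{m−t}:

```python
import numpy as np

rng = np.random.default_rng(1)
m, t, sigma = 32, 3, 0.5
Xs = rng.standard_normal((m, t))                  # X_{S_t}: t selected columns
P = Xs @ np.linalg.inv(Xs.T @ Xs) @ Xs.T          # projector onto span(X_{S_t})
P_perp = np.eye(m) - P                            # orthogonal projector, rank m - t
P_mt = P_perp[: m - t, :]                         # any m - t rows are full row rank
C = P_mt @ P_mt.T                                 # C_{m-t}

# draw many noise vectors n ~ N(0, sigma^2 I) and project them
noise = sigma * rng.standard_normal((m, 200_000))
Z = P_mt @ noise                                  # projected residuals z_t
sample_cov = Z @ Z.T / noise.shape[1]
err = np.linalg.norm(sample_cov - sigma**2 * C) / np.linalg.norm(sigma**2 * C)
print(f"relative covariance error: {err:.3f}")    # small for a large sample size
```

The relative Frobenius error between the sample covariance and σ² C_{m−t} shrinks as the number of noise draws grows, consistent with (6).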

Hypothesis test on residual vector
With the PDF of the projected residual known, we can form a binary hypothesis test on whether signal components remain in the residual vector after t iterations:

H_0: z_t = P_{m−t} n (noise only),
H_1: z_t = P_{m−t}(X h_t + n) (signal plus noise). (8)

If H_0 is decided, the iteration stops. Since one entry of the signal h_t is set to zero at each iteration, ||h_t||_2 decreases after each iteration, and it needs to be estimated. According to (6) and (7), the PDFs of the projected residual under H_0 and H_1 are the corresponding Gaussian densities, given in (9) and (10), respectively. Then, the binary hypothesis test can be conducted using the generalized likelihood ratio test (GLRT) [17]: H_1 is decided if

p(z_t; θ̂_t, H_1) / p(z_t; H_0) > γ, (11)

where θ̂_t is the maximum likelihood estimate (MLE) of θ_t at each iteration. When θ̂_t = 0, i.e., no signal component exists, the iteration stops; otherwise, θ̂_t is plugged into (11) for further testing. After plugging in θ̂_t and simplifying, the test statistic (13) is obtained. Since the function g(x) = x − ln x − 1 in (13) is monotonically increasing for x > 1, and its inverse g^{−1} exists for x > 1, (13) can be rewritten as (14), which simplifies further. Finally, we obtain the detector

T(z_t) = z_t^T C_{m−t}^{−1} z_t / σ^2 ≷ γ_t. (16)

In other words, when T(z_t) is greater than the threshold γ_t, a signal component remains in the residual vector, and the iteration should be continued.
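The resulting detector is simply a covariance-weighted quadratic form. A minimal sketch, assuming the noise variance σ² is known (names are illustrative):

```python
import numpy as np

def detector(z, C, sigma2):
    """DOMP test statistic T(z) = z^T C^{-1} z / sigma^2.

    Under H0, z ~ N(0, sigma2 * C), so T follows a chi-square
    distribution with dim(z) degrees of freedom.
    """
    return float(z @ np.linalg.solve(C, z)) / sigma2

def keep_iterating(z, C, sigma2, gamma_t):
    # H1 (signal remains in the residual) is decided when T exceeds gamma_t
    return detector(z, C, sigma2) > gamma_t
```

For C = I and σ² = 1 the statistic reduces to ||z||², i.e., an energy detector on the projected residual.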

Threshold selection
The threshold selection is crucial in the binary hypothesis test. We use the constant false alarm (CFA) criterion to determine the value of the threshold γ_t. Recall that the detector is a quadratic form T = v^T B v, where B is a symmetric n × n matrix and v is an n × 1 vector following N(0, C). With B = C^{−1}, T follows the chi-square distribution with n degrees of freedom (here n = m − t). Thus, the false alarm probability is given by

P_FA = Pr(T > γ_t; H_0) = Q_{χ²_{m−t}}(γ_t), (17)

where Q_{χ²_ν}(a) = ∫_a^∞ f_{χ²_ν}(t) dt is the right-tail probability of the chi-square distribution with ν degrees of freedom and density f_{χ²_ν}; the detection probability P_D can be evaluated similarly under H_1. The stopping threshold γ_t in (17) for a given P_FA can be calculated using numerical methods [17]. Our proposed DOMP is shown in Algorithm 1.
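Numerically, the CFA threshold is the inverse right-tail of the chi-square distribution; with SciPy this is a one-liner (a sketch, assuming the detector is normalized so that it is chi-square with m − t degrees of freedom under H0):

```python
from scipy.stats import chi2

def cfa_threshold(p_fa, m, t):
    """gamma_t such that Pr(T > gamma_t; H0) = P_FA, T ~ chi2 with m - t dof."""
    return chi2.isf(p_fa, df=m - t)
```

Here `chi2.isf` is the inverse survival (right-tail) function, so `chi2.sf(cfa_threshold(p, m, t), m - t)` recovers p; a smaller P_FA yields a larger threshold, i.e., a more conservative stopping rule.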

Algorithm 1 DOMP
Input: y, X, noise variance σ^2, false alarm probability P_FA
Initialize: r_0 = y, S_0 = ∅, t = 0
Repeat:
  t ← t + 1
  i = arg max_j |⟨x_j, r_{t−1}⟩|; S_t = S_{t−1} ∪ {i}
  r_t = P_t^⊥ y
  form P_{m−t}, C_{m−t}, and z_t = P_{m−t} r_t {P_{m−t}, C_{m−t}, and γ_t need to be calculated for each iteration}
  compute T(z_t) by (16) and γ_t by (17)
Until T(z_t) ≤ γ_t
Output: ĥ = arg min_{supp(h)⊆S_t} ||y − Xh||_2, k̂ = t, Ŝ = S_t {output the recovered signal, estimated sparsity, and recovered support of the signal}
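Putting the pieces together, Algorithm 1 can be sketched in NumPy/SciPy as follows. This is an illustrative implementation assuming the noise variance `sigma2` is known; the names and the final least-squares output step are our own, not the paper's code:

```python
import numpy as np
from scipy.stats import chi2

def domp(X, y, sigma2, p_fa=0.01, max_iter=None):
    """Detection-based OMP: OMP iterations with a GLRT/CFA stopping rule."""
    m, n = X.shape
    if max_iter is None:
        max_iter = m - 1
    support, r = [], y.copy()
    for t in range(1, max_iter + 1):
        i = int(np.argmax(np.abs(X.T @ r)))        # strongest remaining column
        support.append(i)
        Xs = X[:, support]
        P = Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)  # projector onto span(X_S)
        r = y - P @ y                              # residual r_t = P_perp y
        P_mt = (np.eye(m) - P)[: m - t, :]         # any m - t rows of P_perp
        C = P_mt @ P_mt.T                          # C_{m-t}
        z = P_mt @ r                               # projected residual z_t
        T = float(z @ np.linalg.solve(C, z)) / sigma2   # detector T(z_t)
        gamma_t = chi2.isf(p_fa, df=m - t)         # CFA threshold gamma_t
        if T <= gamma_t:                           # H0: no signal remains; the
            break                                  # t-th atom captured the last
    coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    h_hat = np.zeros(n)
    h_hat[support] = coef
    return h_hat, len(support), sorted(support)
```

The number of completed iterations is returned as the sparsity estimate k̂, matching the observation in the numerical results that the iteration count of DOMP tracks the true sparsity.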

Numerical results
In this section, we present numerical results for the proposed DOMP algorithm. To evaluate the performance of DOMP, we define the mean square error (MSE) of the estimated vector as

MSE = (1/N) Σ_{i=1}^N ||ĥ_i − h||_2^2,

where ĥ_i is the recovered h in the ith experiment, and N is the number of experiments. N is set to 5,000 in all our numerical experiments. The detector T(z_t) of DOMP checks whether there is signal in the residual at each iteration. First, we show the detection performance of T(z_t) on the residuals at each iteration. In this test, the sensing matrix is a 128 × 256 Gaussian matrix whose elements follow the i.i.d. Gaussian distribution N(0, 1). A 3-sparse signal, whose nonzero elements are all ones, is sensed. For each P_FA, we perform 1,000 trials. The residual at the ith iteration is denoted by r_i, and the curves of logarithmically scaled (dB) P_FA versus P_D at each iteration for different SNRs are shown in Figure 1. The detection probabilities of signal components are high for the first two iterations (when signal components remain in the residual), and the detection probabilities are low after three iterations (when the residual has no signal component) for P_FA between −30 and −10 dB. In other words, a P_FA of about 0.001 to 0.1 provides a good tradeoff between P_FA and P_D.
We then compare the support recovery rate and the MSE of the recovered signal using 1) OMP with the sparsity k known; 2) OMP with unknown sparsity and the stopping rule ||r_t||_2 < ||n||_2 proposed in [14]; and 3) DOMP with different false alarm probabilities, i.e., P_FA = 0.05, P_FA = 0.01, and P_FA = 0.001. The sensing matrix is a Gaussian matrix whose elements follow the i.i.d. Gaussian distribution N(0, 1). The nonzero elements of the 256-dimensional signal are set to one. In Figures 2 and 3, the performance of these methods is shown as the number of measurements (dimension of y) increases, with the sparsity of the signal set to 4 and SNR = 5 dB. The results show that OMP with known sparsity k has the best performance, followed by DOMP and then OMP with the stopping rule ||r_t||_2 < ||n||_2. For DOMP with different P_FA, the successful support recovery rate increases for lower P_FA as the number of measurements increases, e.g., DOMP with P_FA = 0.01 outperforms DOMP with P_FA = 0.05 when the number of measurements is greater than 60. Note that in Figure 3, we can observe crossovers between the DOMP curves with different P_FA as the dimension of y increases. This is because the detection probability is an increasing function of both the number of measurements and P_FA. With a small number of measurements, the effect of having fewer measurements dominates the effect of a lower P_FA; thus, a higher P_FA yields better support recovery performance for DOMP when the number of measurements is small. As the number of measurements increases, the effect of having more measurements dominates, and DOMP with lower P_FA performs better.

Figure 2. The MSE of the recovered signal using DOMP and OMP with/without sparsity information as the number of measurements (dimension of y) increases, SNR = 5 dB.
Figure 3. The percentage of successfully recovered support of the signal using DOMP and OMP with/without sparsity information as the number of measurements (dimension of y) increases, SNR = 5 dB.
In Figures 4 and 5, we show the performance as the sparsity of the signal increases, while keeping the number of measurements fixed at 128. The figures again show that OMP with known sparsity outperforms the other methods. Our proposed DOMP outperforms OMP with the stopping rule ||r_t||_2 < ||n||_2, and DOMP with lower P_FA has a higher support recovery rate and lower MSE.

Figure 4. The percentage of successfully recovered support of the signal using DOMP and OMP with/without sparsity information as the sparsity increases, SNR = 5 dB.
Figure 5. The MSE of the recovered signal using DOMP and OMP with/without sparsity information as the sparsity increases, SNR = 5 dB.
It is worth noting that, in reality, the sparsity information may not be known a priori. Thus, one may not be able to apply OMP directly. The minimum description length (MDL) criterion is often used in this scenario to estimate the sparsity of the signal [18]: the eigenvalues λ_i of the sample covariance matrix R of the received signal y are used to estimate the signal sparsity as

k̂ = arg min_{k∈{1,2,...,n}} MDL(k),

where MDL(k) is the MDL cost of model order k. We now compare the accuracy of the signal sparsity estimates given by DOMP and MDL. In this experiment, the signal dimension is set to n = 256, and the sensing matrix X is a 128 × 256 Gaussian matrix whose entries are i.i.d. Gaussian with zero mean and unit variance. The k-sparse signal h is generated by randomly setting k entries of h to one and the other entries to zero. The experiment is conducted at SNR = 5 dB. The estimated signal sparsity is shown in Table 1 for DOMP and MDL. We can observe that our proposed detection method gives an accurate sparsity estimate for low-sparsity signals. In fact, the estimated sparsity equals the number of iterations of DOMP. Therefore, the average number of iterations of DOMP can also be read from Table 1, and it matches the signal sparsity in the low-sparsity case.
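For completeness, a standard form of the eigenvalue-based MDL model-order estimate (the Wax-Kailath criterion [18]) can be sketched as follows; the paper's exact MDL(k) expression is not reproduced in the text and may differ in detail:

```python
import numpy as np

def mdl_sparsity(eigvals, N):
    """Wax-Kailath MDL estimate of the number of signal components.

    eigvals: eigenvalues of the sample covariance R (any order),
    N: number of snapshots used to form R.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    n = lam.size
    costs = []
    for k in range(n - 1):
        tail = lam[k:]                           # the n - k smallest eigenvalues
        geo = np.exp(np.mean(np.log(tail)))      # their geometric mean
        ari = np.mean(tail)                      # their arithmetic mean
        # log-likelihood term + model complexity penalty
        costs.append(-N * (n - k) * np.log(geo / ari)
                     + 0.5 * k * (2 * n - k) * np.log(N))
    return int(np.argmin(costs))
```

When the k largest eigenvalues clearly dominate a flat noise floor, the geometric and arithmetic means of the tail agree and the penalty term picks out the correct order.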
Adopting the scheme shown in Figure 6, we compare the performance of DOMP with other greedy pursuit algorithms, OMP, CoSaMP, ROMP, and SP, with the signal sparsity estimated using the MDL criterion. Similar to the previous experiment, we choose the sensing matrix to be a Gaussian matrix whose entries follow the i.i.d. Gaussian distribution N(0, 1). The support S of the signal is randomly selected, and the amplitudes of the nonzero elements of the sparse signal h are drawn from the standard Gaussian distribution. The noise n is zero-mean Gaussian noise. In Figure 7, the signal recovery MSE for different numbers of measurements (dimension of y) is shown. We can observe that the estimation error of DOMP is less than that of the other greedy pursuit algorithms with MDL when the number of measurements is less than 40. One of the applications of DOMP is channel estimation [12], since the number of channel taps is usually unknown. We compare the performance of estimating a Rayleigh fading channel by MDL-based OMP and by DOMP, following the same scheme shown in Figure 6. The Rayleigh fading channel h is given by the COST 207 model [19] with the parameters shown in Table 2. Since the sensing matrix X in the wireless channel model y = Xh + n is a Toeplitz matrix constructed from the transmitted sequence x [20], we construct the sensing matrix by circular shifts of a Gaussian vector whose elements are drawn from N(0, 1), which models the correlator output of a spread spectrum signal. Figures 8 and 9 show the MSE of the estimated BUx6 and RAx4 channels using DOMP and MDL-based OMP, respectively. These two figures show that the estimation error of DOMP is less than that of MDL-based OMP when the number of measurements m is less than 80.

Conclusions
In this paper, we proposed a detection-based OMP algorithm called DOMP. This method forms a GLRT at each iteration to test whether a signal component remains in the residual vector. When no signal component remains, the algorithm stops the iteration. In this paper, we applied this detection-based method to OMP, a classical greedy algorithm. We envision that the detection-based method can be applied to other greedy algorithms as an iteration stopping rule.
The numerical results show that the proposed DOMP outperforms the classical OMP algorithm without prior sparsity information at low SNR or with a small number of measurements. We used COST 207 wireless channel estimation as an example to show the effectiveness of DOMP. DOMP can be readily applied to other sparse recovery problems, e.g., underwater channels in sonar systems and radar systems, where the signal sparsity is unknown.