Exact recovery of sparse signals with side information

Compressed sensing has captured considerable attention from researchers over the past decades. In this paper, with the aid of the powerful null space property, deterministic recovery conditions are established for the previously proposed ℓ1–ℓ1 method and ℓ1–ℓ2 method to guarantee exact sparse recovery when side information about the desired signal is available.
These results provide a useful and necessary complement to the previous investigation of the ℓ1–ℓ1 and ℓ1–ℓ2 methods, which was based on statistical analysis.
Moreover, one of our theoretical findings also shows that the sharp conditions previously established for the classical ℓ1 method remain valid for the ℓ1–ℓ1 method to guarantee exact sparse recovery. Numerical experiments on both synthetic signals and real-world images are also carried out to further test the recovery performance of the above two methods.

well as their resultant recovery error estimates, have also been obtained for (1.1) before; see, e.g., [14–16]. Unfortunately, the ℓ 1 method does not incorporate any side information about x , since the ℓ 1 norm treats all components of the variable x equally. Considering that such side information is often available in many real-world applications, it is natural to expect that the performance of model (1.1) can be further improved if the side information is well integrated. In general, there are two types of common side information in the CS field. The first takes the form of a known support estimate. To deal with this type of side information, the authors in [17] first modeled the known support as a set K, and then integrated it into the ℓ 1 norm, leading to a model in which K c denotes the complement of K in {1, 2, 3, . . . , n} . Their work also showed that the resultant recovery conditions are weaker than those without side information. In [18], the authors considered a more general weight rather than a constant weight on the known support estimate. In [19], a variant of the iterative hard thresholding (IHT) algorithm was proposed by incorporating the partially known support information, and some theoretical analysis was also established for this algorithm. In [20], the orthogonal matching pursuit (OMP), as an iterative greedy algorithm, was extended by using the partially known support. The authors of [21] also considered embedding the known support information into the iterative reweighted least squares (IRLS) algorithm at each iteration, leading to a reduction of both the number of measurements and the computational cost. Recently, some new recovery conditions were obtained by Ge et al. in [22].
Another type of side information takes the form of a reference signal similar to the original signal x . Side information of this type usually comes from applications such as magnetic resonance imaging (MRI) [23], video processing [24,25] and estimation problems [26]. For example, in video processing problems, we usually know some previous video frames before we cope with the next ones; these frames known in advance can, to some degree, be viewed as side information for the next frames. By introducing an ℓ 1 -norm and an ℓ 2 -norm proximity term into model (1.1), respectively, Mota et al. [27] proposed the ℓ 1 -ℓ 1 method and the ℓ 1 -ℓ 2 method, where β is a positive parameter and w ∈ R n is the reference signal that models the side information. For simplicity, w is assumed to obey Supp(w) ⊆ Supp(x) , where Supp(w) = {i : |w i | ≠ 0, i = 1, 2, · · · , n} . Based on some statistical tools, the authors answered the question of how many measurements are required to ensure the exact recovery of any k-sparse signal x . Some convincing experiments were also conducted to support their claims. Note that there also exist some works that embed the prior information of the desired signals into other models. For example, Zhang et al. [28] recently proposed an ℓ 1−2 model to deal with signal recovery with prior information. We refer the interested reader to [28–30] and the references therein for more details.
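For readability, the displayed formulations of the two methods were lost in extraction; based on the standard formulation of Mota et al. and the surrounding discussion, models (1.3) and (1.4) presumably take the following form (to be checked against the original):

```latex
\text{(1.3)}\qquad \min_{x\in\mathbb{R}^n}\ \|x\|_1 + \beta\|x-w\|_1
\quad\text{s.t.}\quad Ax=y,
```

```latex
\text{(1.4)}\qquad \min_{x\in\mathbb{R}^n}\ \|x\|_1 + \tfrac{\beta}{2}\|x-w\|_2^2
\quad\text{s.t.}\quad Ax=y.
```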
In this paper, we revisit the above ℓ 1 -ℓ 1 and ℓ 1 -ℓ 2 methods for exact sparse recovery with side information. Different from the pioneering work [27], this paper investigates both the ℓ 1 -ℓ 1 and ℓ 1 -ℓ 2 methods in a deterministic way. To do so, by means of the powerful null space property (NSP), we establish two kinds of deterministic sufficient and necessary conditions for these two methods. Our theoretical results not only complement the previous work [27], which was based on statistical analysis, but also reveal, somewhat surprisingly, that the sharp exact recovery conditions for model (1.1) remain valid for the ℓ 1 -ℓ 1 model (1.3). Moreover, the numerical experiments show that, by incorporating the side information, the recovery performance of the ℓ 1 -ℓ 1 method is superior to that of the other methods in terms of the number of measurements required.
The rest of this paper is organized as follows: the main theoretical results are presented in Sect. 2, and the resultant numerical experiments are provided in Sect. 3. Finally, we conclude this paper in Sect. 4.
2 The deterministic analysis of ℓ 1 -ℓ 1 and ℓ 1 -ℓ 2 methods
Our main results will be presented in this section, including the exact recovery guarantees of the ℓ 1 -ℓ 1 and ℓ 1 -ℓ 2 methods. Before moving on, we first introduce the following two key definitions.
Definition 2.1 (NSP, see, e.g., [31]) For any subset K ⊂ {1, 2, . . . , n} with |K | ≤ k and any h ∈ Ker(A)\{0} := {h : Ah = 0, h ≠ 0} , we say A ∈ R m×n satisfies the k-order NSP if it holds that ‖h K ‖ 1 < ‖h K c ‖ 1 . Furthermore, if it holds that ‖h K ‖ 1 ≤ α‖h K c ‖ 1 for certain 0 < α < 1 , then we say A satisfies the k-order stable NSP with constant α.
Definition 2.2 (Restricted isometry property, see, e.g., [3]) A matrix A is said to satisfy the k-order restricted isometry property (RIP) if there exists 0 < δ < 1 such that (1 − δ)‖h‖ 2 2 ≤ ‖Ah‖ 2 2 ≤ (1 + δ)‖h‖ 2 2 (2.7) holds for all k-sparse signals h ∈ R n . Moreover, the smallest δ obeying (2.7) is denoted by δ k , i.e., the well-known k-order restricted isometry constant (RIC).
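As an illustration of Definition 2.1, the NSP of a small matrix can be checked numerically when its null space is low-dimensional. The sketch below is for illustration only (names and the sampling strategy are our own; verifying the NSP exactly is computationally hard in general, but when Ker(A) is one-dimensional, sampling a single direction suffices):

```python
import numpy as np

def nsp_holds(A, k, n_samples=2000, tol=1e-10, seed=0):
    """Numerically test the k-order NSP: for every h in Ker(A)\\{0}
    and every K with |K| <= k, check ||h_K||_1 < ||h_{K^c}||_1.
    Random null-space directions are sampled (exhaustive only when
    the null space is 1-dimensional)."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of Ker(A) from the SVD of A.
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    N = Vt[rank:].T                       # columns span Ker(A)
    if N.shape[1] == 0:
        return True                       # trivial null space
    for _ in range(n_samples):
        h = N @ rng.standard_normal(N.shape[1])
        if np.linalg.norm(h) < tol:
            continue
        abs_h = np.abs(h)
        # Worst-case K: the k largest-magnitude entries of h.
        top_k = np.sum(np.sort(abs_h)[::-1][:k])
        if not top_k < np.sum(abs_h) - top_k:
            return False
    return True

# A difference matrix whose null space is span{(1,1,1,1)}:
# the largest entry of any null vector is 1/3 of the rest.
A = np.array([[1., -1., 0., 0.],
              [0., 1., -1., 0.],
              [0., 0., 1., -1.]])
print(nsp_holds(A, k=1))   # → True  (1 < 3)
print(nsp_holds(A, k=2))   # → False (2 < 2 fails)
```

The k=2 failure shows why the NSP order matters: the same matrix that guarantees 1-sparse recovery cannot guarantee 2-sparse recovery.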

Remark 2.4
The k-order NSP has been demonstrated to be a necessary and sufficient condition for the classical ℓ 1 method to ensure exact k-sparse signal recovery. According to our Theorem 2.3, this condition also applies to the ℓ 1 -ℓ 1 model (1.3). On the other hand, it has also been shown in [32,33] that if A obeys the k-order stable NSP with constant α , then α can be expressed in terms of the tk-order RIC δ tk with t > 1 . If one further restricts α < 1 , then a corresponding bound on δ tk is obtained. Note that condition (2.8) has been proved to be sharp for the classical ℓ 1 method to exactly recover any k-sparse signal. Again, condition (2.8) is also applicable to the ℓ 1 -ℓ 1 model (1.3). As far as we know, such RIC-based sufficient conditions had not been established for the ℓ 1 -ℓ 1 method before.
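The displayed expressions in this remark were lost in extraction. For reference, the sharp RIC-based condition for exact k-sparse recovery via ℓ1 minimization established in the literature reads as follows, and condition (2.8) is presumably of this form (this should be checked against [32,33]):

```latex
\text{(2.8, presumed)}\qquad \delta_{tk} < \sqrt{\frac{t-1}{t}},
\qquad t \ge \tfrac{4}{3},
```

which is known to be sharp in the sense that exact recovery can fail whenever the bound is violated.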

Proof of Theorem 2.3
First, we prove the sufficiency. Pick any feasible k-sparse vector, where we have used the triangle inequality in the first inequality.
Recalling that ‖h K ‖ 1 < ‖h K c ‖ 1 , that β > 0 , and that we have assumed A obeys the k-order NSP, we obtain the desired strict inequality. Hence, the sufficiency is proved. Now, we prove the necessity. To do so, we first assume that the i-th component of x obeys the stated condition. Then, we can obtain ‖x‖ 0 ≤ k . Replacing h with τ h , and noting that ‖x + τ h‖ 1 + β‖x + τ h − w‖ 1 > ‖x‖ 1 + β‖x − w‖ 1 since x is the exact solution, we can easily deduce that this holds for any 0 < τ ≤ 1 and β > 0 , which requires (2.5) to hold.
In what follows, we establish the stable NSP condition of order k for the ℓ 1 -ℓ 2 model (1.4).

Proof of Theorem 2.5
Our proof is partially inspired by [34]. We start by proving the sufficiency. Pick any feasible k-sparse vector. Hence, we prove that x is the unique minimizer of (1.4).
As for the necessity, it suffices to show that for any given nonzero h ∈ Ker(A) and any K with |K | ≤ k , the stable NSP of order k with α given by (2.10) holds. Arguing similarly, assume furthermore that the signals x and w have fixed values of ‖x‖ ∞ and −‖w‖ ∞ , respectively; then we get ⟨x K − w, τ h K ⟩ = −(‖x‖ ∞ + ‖w‖ ∞ )‖h K ‖ 1 for any 0 < τ ≤ 1 . Now, we replace τ h with h and observe that both inequalities of (2.11) now hold with equality. Since x is the exact recovery, it requires ‖x + τ h‖ 1 + (β/2)‖x + τ h − w‖ 2 2 ≥ ‖x‖ 1 + (β/2)‖x − w‖ 2 2 , so we get the stated inequality for any 0 < τ ≤ 1 , which proves the necessity.

3 Numerical simulations
As can be seen in the previous sections, our goal in this paper is to provide deterministic recovery conditions for the ℓ 1 -ℓ 1 model and the ℓ 1 -ℓ 2 model to guarantee exact sparse recovery, and the obtained theoretical results for these models can be found in Theorem 2.3 and Theorem 2.5, respectively. On the other hand, it is still a difficult problem to find measurement matrices satisfying the conditions stated in the two theorems. Nevertheless, we still hope to provide some numerical simulations to verify the efficiency of the two models with side information. Since the ℓ 1 -ℓ 1 , ℓ 1 -ℓ 2 , and ℓ 1 models are all convex, in this paper, we resort to the popular and easily implemented CVX 1 (with the SeDuMi solver) to solve them.

3.1 Experiments on the synthetic signals
We start with the experiments on synthetic signals. For simplicity, in all experiments, we assume that the length of the desired signal x is n = 256 , and its sparsity is set to be k. We generate such a k-sparse signal x as follows. The locations of the nonzero components in x are generated at random, and their corresponding values are chosen from a standard normal distribution. We assume that the reference signal w is k w -sparse with k w ≤ k , and its nonzero entries are chosen at random from the nonzero entries of x with Supp(w) ⊆ Supp(x) . Obviously, when k w < k , w contains part of the side information of x , and when k w = k , w contains all the information of x , i.e., w = x . Besides, we generate the measurement matrix A ∈ R m×n by drawing it from a standard Gaussian distribution. To judge the recovery performance of the competing methods, we adopt the signal-to-noise ratio (SNR), see, e.g., [35], where x and x̂ denote the original signal and the signal recovered by a certain model, respectively. Unless otherwise specified, the average SNR results over 50 independent trials are used as the final results. We first conduct a simple experiment for the ℓ 1 -ℓ 1 model and the ℓ 1 -ℓ 2 model to test their recovery performance on signals with side information. In this sort of experiments, we set m = 70 , k = 20 , k w = 10 , and β = 10 4 for the ℓ 1 -ℓ 1 model and β = 10 −4 for the ℓ 1 -ℓ 2 model. Figure 1 plots the resultant recovery performance. One can easily observe that both the ℓ 1 -ℓ 1 model and the ℓ 1 -ℓ 2 model accomplish the recovery task well, with the two recovered signals being almost identical to the original signal. In the above experiments, we only set β = 10 4 for the ℓ 1 -ℓ 1 model and β = 10 −4 for the ℓ 1 -ℓ 2 model for simplicity. Obviously, if these parameters are further optimized, the resultant SNR performance will be correspondingly improved. To select a proper β for models (1.3) and (1.4), we let the parameter β be chosen from {10 −8 , 10 −7 , · · · , 10 8 } , and set the other parameters the same as before.
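The synthetic-data setup described above can be sketched as follows. This is a minimal reimplementation under the stated assumptions; the function names are our own, and the SNR formula (whose display was lost) is assumed to be the usual 20·log10 ratio in dB:

```python
import numpy as np

def make_problem(n=256, m=70, k=20, kw=10, seed=0):
    """Generate a k-sparse x, a kw-sparse reference w with
    Supp(w) ⊆ Supp(x), and a standard Gaussian matrix A."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)      # nonzeros ~ N(0, 1)
    w = np.zeros(n)
    w_support = rng.choice(support, size=kw, replace=False)
    w[w_support] = x[w_support]              # w copies part of x
    A = rng.standard_normal((m, n))
    y = A @ x                                 # noiseless measurements
    return A, y, x, w

def snr_db(x, x_rec):
    """SNR in dB between original and recovered signal (assumed form)."""
    return 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_rec))

A, y, x, w = make_problem()
print(A.shape, np.count_nonzero(x), np.count_nonzero(w))  # → (70, 256) 20 10
```

When kw = k the reference degenerates to w = x, matching the special case discussed later in the experiments.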
Furthermore, by fixing k = 20 and k w = 10 , we test the recovery performance of the two models under different numbers of measurements. In this sort of experiments, we also compare these two models with the classical ℓ 1 model (1.1) and the weighted ℓ 1 model (without noise) in [36], i.e., (3.12) min x ‖x K c ‖ 1 + ω‖x K ‖ 1 s.t. Ax = y , where ω ∈ [0, 1] and K ⊆ {1, 2, 3, · · · , n} models the known support set of the original signal x . For simplicity, we set K = Supp(w) . Since (3.12) is also convex, we can solve it easily by means of CVX. Obviously, the above weighted ℓ 1 model reduces to (1.2) when one sets ω = 0 . According to [36], it is suggested to set ω closer to 0 when the value of |K ∩ Supp(x)|/|K | is closer to 1. Considering that in our experiments K is strictly included in Supp(x) , we set ω = 0 for the weighted ℓ 1 model to achieve its best recovery performance. Figure 3 shows the obtained results. It first shows that an increasing m leads to a better recovery for all the models. Among them, the ℓ 1 -ℓ 1 model performs best, followed by the weighted ℓ 1 model; the ℓ 1 -ℓ 2 model and the classical ℓ 1 model perform worst. This indicates that a good choice of the term penalizing the error between the true signal and its reference signal plays a key role in enhancing the recovery performance. It also shows that the ℓ 1 -ℓ 1 model is better than the weighted ℓ 1 model at taking advantage of the side information. As for the (classical) ℓ 1 model itself, it is suggested to add an ℓ 1 -norm based error term, rather than an ℓ 2 -norm based one, to the objective function to boost its recovery performance when side information on the signals becomes available. This observation is also consistent with the conclusion drawn in [21].
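Although the paper solves the convex programs with CVX/SeDuMi, the ℓ 1 -ℓ 1 model can equivalently be recast as a linear program and solved with any LP solver. The sketch below uses SciPy's `linprog` under the model form assumed earlier (min ‖x‖1 + β‖x − w‖1 s.t. Ax = y); the splitting variables u ≥ |x| and v ≥ |x − w| are a standard trick, not part of the original text:

```python
import numpy as np
from scipy.optimize import linprog

def l1_l1_recover(A, y, w, beta=1e4):
    """Solve  min ||x||_1 + beta*||x - w||_1  s.t.  Ax = y
    as an LP over z = [x; u; v] with u >= |x|, v >= |x - w|."""
    m, n = A.shape
    I, Z = np.eye(n), np.zeros((n, n))
    c = np.concatenate([np.zeros(n), np.ones(n), beta * np.ones(n)])
    A_ub = np.block([[ I, -I,  Z],    #  x - u <= 0
                     [-I, -I,  Z],    # -x - u <= 0
                     [ I,  Z, -I],    #  x - v <= w
                     [-I,  Z, -I]])   # -x - v <= -w
    b_ub = np.concatenate([np.zeros(2 * n), w, -w])
    A_eq = np.hstack([A, np.zeros((m, 2 * n))])
    bounds = [(None, None)] * n + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return res.x[:n]

# Tiny sanity check: with perfect side information (w = x) and
# beta > 1, the true signal is the unique minimizer.
rng = np.random.default_rng(1)
n, m = 8, 4
x_true = np.zeros(n); x_true[3] = 1.5
A = rng.standard_normal((m, n))
y = A @ x_true
x_hat = l1_l1_recover(A, y, w=x_true.copy(), beta=10.0)
print(np.round(x_hat, 3))
```

The w = x, β > 1 case is a degenerate but instructive check: for any nonzero h ∈ Ker(A), the objective increases by at least (β − 1)‖h‖1, so recovery succeeds regardless of the NSP.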
Moreover, to further examine how the sparsity of the signals with side information affects the performance of the ℓ 1 -ℓ 1 model and the ℓ 1 -ℓ 2 model, we use these two models to recover k-sparse signals under different kinds of k w -sparse reference signals. Figure 4 first plots the recovery results with k varying over {10, 12, 14, · · · , 28} , where k w is set to be k w = ⌈k/2⌉.
In general, if the number of measurements is fixed, a larger k always leads to a poorer recovery. Obviously, this conclusion can be easily drawn from Fig. 4. However, once the side information of the desired signals is available and is also well modeled, the recovery performance can be further improved. Since the classical ℓ 1 model does not take the side information into consideration, its performance is weaker than that of the ℓ 1 -ℓ 1 model and the weighted ℓ 1 model. In this experiment, the ℓ 1 -ℓ 2 model again performs poorly. Moreover, this reconfirms that if one can take good advantage of the side information and also model it well, the recovery performance can be further improved. In Fig. 5, we investigate the recovery performance as affected by the "quality/quantity" of the side information. To be specific, we fix the sparsity of the original signals as k = 20 , and then let k w change over {1, 3, 5, 7, · · · , 17} . Obviously, the larger the value of k w , the higher the "quality/quantity" of the side information. It is thus expected that the recovery performance will be largely improved as the value of k w increases.
It is easy to see from Fig. 5 that the recovery performance of both the ℓ 1 -ℓ 2 model and the weighted ℓ 1 model is consistent with our expectations, and is far better than that of the other two models. Note that in this sort of experiments, we also fix m = 64.
At the end of this part, we conduct a special experiment in which the reference signal w is set as w = x . Under such a setting, it becomes very important to investigate how the parameter β affects the recovery performance of both the ℓ 1 -ℓ 1 model and the ℓ 1 -ℓ 2 model. To conduct this experiment, we set m = 64 , k = k w = 20 , and let the parameter β change over {10 −8 , 10 −7 , · · · , 10 8 } . Figure 6 plots the obtained results. Obviously, the larger β , the better the recovery performance of these two models. When β is relatively small, both models perform similarly. However, when β increases, the ℓ 1 -ℓ 1 model performs much better than the ℓ 1 -ℓ 2 model. It should also be noted that the assumption w = x is usually impractical in real-world applications. Nevertheless, we can roughly conclude that a relatively large β helps the ℓ 1 -ℓ 1 model yield a better recovery performance.

3.2 Experiments on the real-world images
In this part, we apply the above-mentioned four models to the real-world image recovery problem. Figure 7 shows the ten real-world images that we will use in the following experiments. As is well known, real-world images are generally not sparse themselves, but can be transformed into nearly sparse signals by using sparse dictionaries such as the discrete cosine transform (DCT). On the other hand, almost all real-world images exhibit local smoothness, which indicates that one can use some known information about the original image to help recover unknown neighboring information. Therefore, we take the nearly sparse vectors (generated by applying the DCT to each column of the input images) as the test signals for these models. To be specific, let the original image G be denoted by G = [g 1 , g 2 , · · · , g d ] with g i ∈ R n for i = 1, 2, · · · , d , and the DCT dictionary be denoted by D ∈ R n×n ; then we can easily get the desired sparse (test) signal x i by x i = Dg i . To model the side information of x i , we first generate r i = x i + 0.01 · ‖x i ‖ 2 · ξ , where ξ ∈ R n and its elements are generated independently from the standard normal distribution. The signal r i can be viewed as a perturbed version of the signal x i . As for the support estimate K in the weighted ℓ 1 model, we set K to be the indices of the ⌈n · 1%⌉ largest absolute elements of r i . As for the ℓ 1 -ℓ 1 model and the ℓ 1 -ℓ 2 model, we set the reference signal of x i to w i with (w i ) j = (r i ) j when j ∈ K and 0 otherwise. As before, we generate the measurement matrix A ∈ R m×n with m = ⌈n/4⌉ , whose elements are generated independently from the standard normal distribution. The other parameters are set as claimed in Sect. 3.1.
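The construction of the test signals and their side information described above can be sketched like this. This is a hypothetical reimplementation: the DCT is applied columnwise via SciPy's orthonormal transform, and the exact dictionary normalization used by the authors is not specified:

```python
import numpy as np
from scipy.fft import dct, idct

def build_side_information(G, frac=0.01, noise=0.01, seed=0):
    """For each column g_i of image G: form DCT coefficients x_i,
    a perturbed copy r_i = x_i + noise*||x_i||_2*xi, a support
    estimate K_i (the ceil(frac*n) largest entries of r_i), and a
    reference w_i keeping r_i only on K_i."""
    rng = np.random.default_rng(seed)
    n, d = G.shape
    s = int(np.ceil(frac * n))
    X = dct(G, axis=0, norm="ortho")          # columnwise DCT
    W = np.zeros_like(X)
    supports = []
    for i in range(d):
        x = X[:, i]
        r = x + noise * np.linalg.norm(x) * rng.standard_normal(n)
        K = np.argsort(np.abs(r))[::-1][:s]   # largest-magnitude indices
        W[K, i] = r[K]
        supports.append(K)
    return X, W, supports

# Round trip: the orthonormal DCT is invertible, so idct recovers G.
G = np.outer(np.linspace(0, 1, 100), np.ones(4))  # toy 100x4 "image"
X, W, supports = build_side_information(G)
print(np.allclose(idct(X, axis=0, norm="ortho"), G))  # → True
```

With n = 100 and frac = 0.01, each support estimate keeps only a single coefficient, mirroring the paper's ⌈n · 1%⌉ choice.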
Once we obtain the recovered signals, with x ♯ i denoting the i-th signal recovered by any of the four models for i = 1, 2, · · · , d , we can obtain the recovered image G ♯ by applying the inverse transform to each x ♯ i , i.e., G ♯ = [D −1 x ♯ 1 , D −1 x ♯ 2 , · · · , D −1 x ♯ d ] . To eliminate the column effect on each recovered image, we recover the original images by column and by row, respectively, and then use their average values as the final output. Moreover, to evaluate the quality of the recovered images, we use two popular indices, i.e., the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM); more details on these two indices can be found in [37]. Table 1 lists the obtained PSNR|SSIM results of the ten test images recovered by the four different models, where the highest PSNR and SSIM values are marked in bold. 2 It is easy to see that the ℓ 1 -ℓ 1 model performs best among all the models, followed by the weighted ℓ 1 model. The ℓ 1 model and the ℓ 1 -ℓ 2 model perform almost the same, but both are far worse than the ℓ 1 -ℓ 1 model. These results further confirm the claims we have drawn previously. Note that, in this sort of experiments, we still set β = 10 5 for the ℓ 1 -ℓ 1
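For completeness, the PSNR index used above can be computed as follows. This is the standard definition, assuming 8-bit images with peak value 255; the paper's exact convention may differ:

```python
import numpy as np

def psnr(original, recovered, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    diff = np.asarray(original, float) - np.asarray(recovered, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100.0)
b = a.copy(); b[0, 0] = 110.0            # one perturbed pixel
print(round(psnr(a, b), 2))              # → 46.19
```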