On recovery of block-sparse signals via mixed l _{2}/l _{ q } (0 < q ≤ 1) norm minimization
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 76 (2013)
Abstract
Compressed sensing (CS) states that a sparse signal can be exactly recovered from very few linear measurements. In many applications, however, real-world signals also exhibit additional structure beyond standard sparsity. A typical example is the so-called block-sparse signals, whose nonzero coefficients occur in a few blocks. In this article, we investigate the mixed l _{2}/l _{ q }(0 < q ≤ 1) norm minimization method for the exact and robust recovery of such block-sparse signals. We mainly show that the nonconvex l _{2}/l _{ q }(0 < q < 1) minimization method has stronger sparsity-promoting ability than the commonly used l _{2}/l _{1} minimization method, both practically and theoretically. In terms of a block variant of the restricted isometry property of the measurement matrix, we present weaker sufficient conditions for exact and robust block-sparse signal recovery than those known for l _{2}/l _{1} minimization. We also propose an efficient Iteratively Reweighted Least-Squares (IRLS) algorithm for the induced nonconvex optimization problem. The obtained weaker conditions and the proposed IRLS algorithm are tested and compared with the mixed l _{2}/l _{1} minimization method and the standard l _{ q } minimization method on a series of noiseless and noisy block-sparse signals. All the comparisons demonstrate that the mixed l _{2}/l _{ q }(0 < q < 1) method outperforms the alternatives in block-sparse signal recovery applications and is meaningful for the development of new CS technology.
1 Introduction
According to the Shannon/Nyquist sampling theorem [1, 2], to avoid loss of information when capturing a signal, we must sample it at the so-called Nyquist rate, that is, twice the highest frequency of the signal. Since the theorem only exploits the bandlimitedness of a signal, whereas most real-world signals are sparse or compressible, massive data acquisition based on the Shannon/Nyquist sampling theorem usually collects too much useless information, and eventually we have to compress the samples in order to store or encode the small amount of essential information in the signal. Obviously, this process is extremely wasteful, and therefore a more effective sampling scheme that directly acquires the essential information of a signal is desirable.
Compressed sensing (CS) [3–5] was motivated by this purpose: it acquires the essential information of a signal by exploiting its compressibility. In short, the main contribution of CS is a new, efficient scheme to capture and recover compressible or sparse signals at a sampling rate far below the Nyquist rate. The basic principle of CS is that it first employs nonadaptive linear projections that preserve the structure of the signal; one can then exactly recover the signal from a surprisingly small number of random linear measurements through a nonlinear optimization procedure (such as l _{1}-minimization), provided that the measurement matrix satisfies suitable sufficient conditions in terms of the restricted isometry property (RIP, [6]). Consequently, CS implies that it is indeed possible to acquire data in already compressed form. Nowadays, driven by a wide range of applications, CS and related problems have attracted much interest in various communities, such as signal processing, machine learning, and statistics.
Unlike general sparse signals in the conventional sense, some real-world signals exhibit additional structure, namely, the nonzero coefficients appear in a few fixed blocks. We refer to such signals as block-sparse signals in this article. Block-sparse signals arise in various applications, e.g., DNA microarrays [7, 8], equalization of sparse communication channels [9], source localization [10], wideband spectrum sensing [11], and color imaging [12].
Using the standard convex relaxation ( l _{1}-minimization) of the conventional CS framework to recover a block-sparse signal does not exploit the fact that the nonzero elements of the signal appear in consecutive positions. Therefore, one natural idea is to consider the block version of l _{1}-minimization, i.e., the mixed l _{2}/l _{1}-minimization, to exploit the block-sparsity. Many previous works have shown that the mixed l _{2}/l _{1}-minimization is superior to standard l _{1}-minimization when dealing with such block-sparse signals [13–15]. Huang and Zhang [13] developed a theory for the mixed l _{2}/l _{1}-minimization by using a concept called strong group sparsity, and they demonstrated that the mixed norm minimization is very efficient for recovering strongly group-sparse signals. Stojnic et al. [15] obtained an optimal number of Gaussian measurements for uniquely recovering a block-sparse signal through the mixed l _{2}/l _{1} norm minimization. By generalizing the conventional RIP notion to the block-sparse case, Eldar and Mishali [14] showed that if the measurement matrix D has the same restricted isometry constant as that in the l _{1} case, then the mixed norm method is guaranteed to exactly recover any block-sparse signal in the noiseless case. Furthermore, they also showed that block-sparse signal recovery is robust in the noisy case under the same recovery condition. Another common approach to the block-sparsity problem is to suitably extend the standard greedy methods, such as orthogonal matching pursuit, iterative hard thresholding (IHT), and compressive sampling matching pursuit (CoSaMP), to the block-sparse case. In [16], the CoSaMP algorithm and the IHT algorithm were extended to the model-based setting, treating block-sparsity as a special case. It was also shown that the new recovery algorithms have provable performance guarantees and robustness properties. Eldar et al. [17] generalized the notion of coherence to the block-sparse setting and proved that a block version of the orthogonal matching pursuit (BOMP) algorithm can exactly recover any block-sparse signal if the block-coherence is sufficiently small. In addition, the mixed l _{2}/l _{1}-minimization approach was shown to guarantee successful recovery under the same condition on block-coherence. Ben-Haim and Eldar [18] examined the ability of greedy algorithms to estimate a block-sparse signal from noisy measurements. They derived some near-oracle results for the block-sparse version of greedy pursuit algorithms in both the adversarial noise and the Gaussian noise cases. Majumdar and Ward [19] used BOMP to deal with the block-sparse representation-based classification problem. The validity and robustness of these new methods were theoretically proved.
In recent years, several studies [20–27] have shown that the nonconvex l _{ q } (0 < q < 1) minimization allows the exact recovery of sparse signals from fewer linear measurements than l _{1}-minimization. Chartrand and Staneva [21] provided a weaker condition to guarantee perfect recovery for the nonconvex l _{ q } (0 < q < 1) minimization method using an l _{ q } variant of the RIP. They obtained the number of random Gaussian measurements necessary for the successful recovery of sparse signals via l _{ q } (0 < q < 1) minimization with high probability. Sun [27] used the conventional RIP as in the l _{1} case to prove that whenever q is chosen to be about 0.6796 × (1 − δ _{2k }), every k-sparse signal can be exactly recovered via l _{ q } minimization, where δ _{2k } is the restricted isometry constant of the measurement matrix. Xu et al. [24–26] considered a particularly important case ( q = 1/2) of l _{ q } minimization. They developed a thresholding representation theory for l _{1/2} minimization and conducted a phase diagram study to demonstrate the merits of l _{1/2} minimization.
This article presents an ongoing effort to extend the nonconvex l _{ q }(0 < q < 1) minimization methodology to the setting of block-sparsity. Specifically, we study the performance of block-sparse signal recovery via the mixed l _{2}/l _{ q } (0 < q < 1) norm minimization by means of the block RIP (block-RIP). We first show that, under RIP conditions similar to those in the standard l _{ q } case, the mixed l _{2}/l _{ q } recovery method is guaranteed to recover any block-sparse signal, irrespective of the locations of the nonzero blocks. In addition, the method is robust in the presence of noise. Our recovery conditions show that the nonconvex l _{2}/l _{ q }(0 < q < 1) minimization is superior to the convex l _{2}/l _{1} minimization within the block-RIP framework. Furthermore, we compare the sparse signal recovery ability of the nonconvex l _{2}/l _{ q }(0 < q < 1) method with that of the convex l _{2}/l _{1} method and the standard l _{ q } method through a series of simulation studies. To the best of the authors’ knowledge, although Majumdar and Ward [12] first proposed the nonconvex l _{2}/l _{ q }(0 < q < 1) method in the CS literature for color imaging and showed that the l _{2}/l _{0.4} minimization has the best performance in some imaging experiments, their work was experimental in nature and lacked a convincing theoretical assessment. In comparison, our work not only highlights the theoretical merits of the nonconvex block optimization method, but also studies more intensively the block-sparse signal recovery capabilities for several values of q via numerical experiments.
We begin our study in Sections 2, 3, and 4 by presenting the problem setting. In Section 5, we establish sufficient conditions for the mixed l _{2}/l _{ q }(0 < q < 1) optimization approach to guarantee exact and robust recovery of block-sparse signals in terms of the block-RIP. In Section 6, we develop an efficient Iteratively Reweighted Least-Squares (IRLS) algorithm to recover block-sparse signals from few measurements, which generalizes the algorithm of [28] to the unconstrained l _{2}/l _{ q }(0 < q ≤ 1) norm minimization case. In Section 7, we show through a series of simulations that the nonconvex l _{2}/l _{ q }(0 < q < 1) method has stronger block-sparsity-promoting ability than the convex l _{2}/l _{1} method and the standard l _{ q } method. Finally, we conclude the article in Section 8 with some useful remarks.
2 Block-sparsity
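Conventional CS only considers sparsity in the sense that the signal x has at most k nonzero elements, and it does not take into account any further structure. However, in many practical scenarios, the nonzero elements are aligned in blocks, meaning that they appear in contiguous regions. Such signals are referred to as block-sparse signals. Mathematically speaking, a block-sparse signal \mathbf{x}\in {\mathbb{R}}^{N} over the block index set \mathcal{I}=\{{d}_{1},\dots ,{d}_{m}\} can be modeled as follows:
\mathbf{x}={\left[\underbrace{{x}_{1}\cdots {x}_{{d}_{1}}}_{\mathbf{x}[1]}\ \underbrace{{x}_{{d}_{1}+1}\cdots {x}_{{d}_{1}+{d}_{2}}}_{\mathbf{x}[2]}\ \cdots \ \underbrace{{x}_{N-{d}_{m}+1}\cdots {x}_{N}}_{\mathbf{x}[m]}\right]}^{T},\quad N=\sum _{i=1}^{m}{d}_{i}. (1)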
Here, x[i] denotes the i th block of x and d _{ i } is the block size of the i th block. The block-sparsity we consider in this article means that there are at most k < m nonzero blocks. Obviously, if d _{1} = ⋯ = d _{ m } = 1, block-sparse signals reduce to the conventional sparse signals well studied in CS.
Definition 1.
([14]). A block k-sparse signal over index set \mathcal{I}=\{{d}_{1},\dots ,{d}_{m}\} is a signal of the form (1) in which x[i] is nonzero for at most k indices i, i ∈ {1,2,…,m}.
The main focus of this study is to recover a block-sparse signal x from random linear measurements y = Φ x (noiseless case) or y = Φ x + z (noisy case). Here, \mathbf{y}\in {\mathbb{R}}^{M} is the measurement vector, \Phi \in {\mathbb{R}}^{M\times N} is a measurement matrix, whose entries are usually randomly drawn from a Gaussian or a Bernoulli distribution, and z is an unknown bounded noise. We represent Φ as a concatenation of column-blocks Φ[i] of size M × d _{ i }, that is,
\Phi =\left[\Phi [1]\ \Phi [2]\ \cdots \ \Phi [m]\right]. (2)
We are then interested in formulating sufficient conditions on the measurement matrix Φ under which a block-sparse signal x can be assuredly and stably recovered from a small number of noiseless measurements \mathbf{y}=\sum _{i=1}^{m}\Phi \left[i\right]\mathbf{x}\left[i\right] or noisy measurements \mathbf{y}=\sum _{i=1}^{m}\Phi \left[i\right]\mathbf{x}\left[i\right]+\mathbf{z}. Denote
\parallel \mathbf{x}{\parallel}_{2,0}=\sum _{i=1}^{m}I(\parallel \mathbf{x}[i]{\parallel}_{2}>0), (3)
where I(∥x[i]∥_{2} > 0) is an indicator function. Note that a block k-sparse signal x can then be defined as a vector satisfying ∥x∥_{2,0} ≤ k. In the remainder of the article, we restrict our attention to how, and under what conditions, such block-sparse signals can be recovered exactly and stably in the noiseless and noisy scenarios, respectively.
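To make the block-sparsity count concrete, the following minimal NumPy sketch evaluates ∥x∥_{2,0} for a small vector. The helper names `split_blocks` and `block_l20` and the example data are illustrative assumptions, not part of the article.

```python
import numpy as np

def split_blocks(x, block_sizes):
    """Split x into consecutive blocks x[1], ..., x[m] of sizes d_1, ..., d_m."""
    x = np.asarray(x, dtype=float)
    if sum(block_sizes) != x.size:
        raise ValueError("block sizes must sum to len(x)")
    edges = np.cumsum(block_sizes)[:-1]  # interior split points
    return np.split(x, edges)

def block_l20(x, block_sizes, tol=0.0):
    """Number of blocks whose l2 norm exceeds tol, i.e. ||x||_{2,0}."""
    return int(sum(np.linalg.norm(b) > tol for b in split_blocks(x, block_sizes)))

# Hypothetical example: N = 9, block sizes (2, 2, 3, 2).
x = np.array([0.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0])
sizes = [2, 2, 3, 2]
print(block_l20(x, sizes))    # 2 nonzero blocks
print(np.count_nonzero(x))    # 3 nonzero entries (conventional sparsity)
```

In this example the signal is block 2-sparse although it has three nonzero entries, which is the distinction the definition above draws.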
3 Block-RIP
Candes and Tao [6] first introduced the notion of the RIP of a matrix to characterize the condition under which the sparsest solution of an underdetermined linear system exists and can be found. The RIP has since been used as a powerful tool to study CS in several previous works [4, 5, 21, 29]. Let Φ be a matrix of size M × N with M < N. We say that Φ satisfies the RIP of order k if there exists a constant δ _{ k } ∈ [0,1) such that for every \mathbf{x}\in {\mathbb{R}}^{N} with \parallel \mathbf{x}{\parallel}_{0}\le k,
(1-{\delta}_{k})\parallel \mathbf{x}{\parallel}_{2}^{2}\le \parallel \Phi \mathbf{x}{\parallel}_{2}^{2}\le (1+{\delta}_{k})\parallel \mathbf{x}{\parallel}_{2}^{2}. (4)
Obviously, δ _{ k } quantifies how close to isometries all the M × k submatrices of Φ are. Since block-sparse signals exhibit additional structure, Eldar and Mishali [14] extended the standard RIP to the block-sparse setting and showed that the new block-RIP constant is typically smaller than the standard RIP constant. We now state the definition in the block-sparse setting.
Definition 2.
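([14]). Let \Phi :{\mathbb{R}}^{N}\to {\mathbb{R}}^{M} be an M × N measurement matrix. Then Φ is said to have the block-RIP over \mathcal{I}=\{{d}_{1},\dots ,{d}_{m}\} with constant {\delta}_{k\mathcal{I}} if for every vector \mathbf{x}\in {\mathbb{R}}^{N} that is block k-sparse over \mathcal{I}, it satisfies
(1-{\delta}_{k\mathcal{I}})\parallel \mathbf{x}{\parallel}_{2}^{2}\le \parallel \Phi \mathbf{x}{\parallel}_{2}^{2}\le (1+{\delta}_{k\mathcal{I}})\parallel \mathbf{x}{\parallel}_{2}^{2}. (5)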
For convenience, in the remainder of the article, we still use δ _{ k }, instead of {\delta}_{k\mathcal{I}}, to denote the block-RIP constant whenever no confusion can arise.
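Computing the block-RIP constant exactly requires examining every set of k blocks, which is infeasible for large m. As a rough illustration only, the sketch below samples random block supports and reports the largest deviation from isometry it finds. This yields a lower bound on δ _{ k }, not its true value; the function name `block_rip_lower_bound`, the 1/√M column scaling, and all parameters are assumptions made for this example.

```python
import numpy as np

def block_rip_lower_bound(Phi, block_sizes, k, trials=200, seed=0):
    """Monte Carlo lower bound on the block-RIP constant of order k.

    For each sampled set of k blocks, the squared extreme singular values of
    the corresponding column submatrix bound ||Phi x||^2 / ||x||^2 for every
    x supported on those blocks. The worst deviation from 1 over the sampled
    supports can only underestimate the true constant.
    """
    rng = np.random.default_rng(seed)
    starts = np.concatenate(([0], np.cumsum(block_sizes)))
    m = len(block_sizes)
    delta = 0.0
    for _ in range(trials):
        support = rng.choice(m, size=k, replace=False)
        cols = np.concatenate([np.arange(starts[i], starts[i + 1]) for i in support])
        s = np.linalg.svd(Phi[:, cols], compute_uv=False)  # sorted descending
        delta = max(delta, s[0] ** 2 - 1.0, 1.0 - s[-1] ** 2)
    return delta

# Hypothetical example: Gaussian matrix scaled so that E||Phi x||^2 = ||x||^2.
rng = np.random.default_rng(1)
M, N, d = 64, 128, 4
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
print(block_rip_lower_bound(Phi, [d] * (N // d), k=2))
```

Because only sampled supports are inspected, a small reported value does not certify that the block-RIP holds; it merely gives a feel for how the constant behaves.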
With this new notion, Eldar and Mishali [29] generalized the sufficient recovery conditions to block-sparse signals in both the noiseless and noisy settings. They showed that if Φ is drawn at random as in conventional CS, it satisfies the block-RIP with overwhelming probability. These results show that one can recover a block-sparse signal exactly and stably via the convex mixed l _{2}/l _{1} minimization method whenever the measurement matrix Φ is constructed from a random ensemble (e.g., the Gaussian ensemble).
4 Nonconvex recovery method
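It is known from [14] that whenever Φ satisfies the block-RIP with δ _{2k } < 1, there is a unique block-sparse signal x, which can be recovered by solving the following problem:
\underset{\mathbf{x}}{\text{min}}\parallel \mathbf{x}{\parallel}_{2,0}\quad \text{s.t.}\quad \mathbf{y}=\Phi \mathbf{x}. (6)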
Unfortunately, problem (6) is NP-hard, and finding its optimal solution has exponential complexity. In principle, one can only solve the problem exactly by searching over all possible sets of k blocks and checking whether there exists a vector consistent with the measurements. Obviously, this approach cannot deal with high-dimensional signals.
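The exhaustive search described above can be sketched as follows for tiny problems. This is a minimal illustration under stated assumptions (noiseless data, a residual tolerance `tol`, and the hypothetical function name `l20_brute_force`); its cost grows combinatorially with the number of blocks, which is exactly why it is impractical in high dimensions.

```python
import itertools
import numpy as np

def l20_brute_force(Phi, y, block_sizes, k, tol=1e-9):
    """Try every set of at most k blocks, smallest sets first, and return the
    first least-squares fit that reproduces y up to tol."""
    starts = np.concatenate(([0], np.cumsum(block_sizes)))
    m = len(block_sizes)
    for n_blocks in range(k + 1):
        for support in itertools.combinations(range(m), n_blocks):
            cols = [j for i in support for j in range(starts[i], starts[i + 1])]
            x = np.zeros(Phi.shape[1])
            if cols:
                sol, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
                x[cols] = sol
            if np.linalg.norm(Phi @ x - y) <= tol:
                return x, support
    return None, None

# Hypothetical example: 6 blocks of size 2, two of them active.
rng = np.random.default_rng(0)
sizes = [2] * 6
x_true = np.zeros(12)
x_true[2:4] = [1.0, -2.0]   # block index 1
x_true[8:10] = [0.5, 3.0]   # block index 4
Phi = rng.standard_normal((8, 12))
x_hat, support = l20_brute_force(Phi, Phi @ x_true, sizes, k=2)
print(support)
```

Even in this toy setting the loop may visit up to 1 + 6 + 15 = 22 candidate supports; the count is a sum of binomial coefficients in m and k.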
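One natural idea to find x more efficiently is to employ a convex relaxation technique, namely, to replace the l _{2}/l _{0} norm by its closest convex surrogate, the l _{2}/l _{1} norm, resulting in the following model:
\underset{\mathbf{x}}{\text{min}}\parallel \mathbf{x}{\parallel}_{2,1}\quad \text{s.t.}\quad \mathbf{y}=\Phi \mathbf{x}, (7)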
where \parallel \mathbf{x}{\parallel}_{2,1}=\sum _{i=1}^{m}\parallel \mathbf{x}\left[i\right]{\parallel}_{2}. This model can be treated as a second-order cone program (SOCP), and many standard software packages can solve it very efficiently. In many practical cases, the measurements y are corrupted by bounded noise; we can then apply the modified SOCP or the group version of the basis pursuit denoising [30] program as follows:
where λ is a tuning parameter, which controls the tolerance of the noise term. There are also many methods to solve this optimization problem efficiently, such as the block-coordinate descent technique [31] and the Landweber iteration technique [32].
As mentioned before, recent studies on nonconvex CS have indicated that one can reduce the number of linear measurements required for successful recovery of a general sparse signal by replacing the l _{0} norm by a nonconvex surrogate, the l _{ q }(0 < q < 1) quasi-norm. This motivates us to carry over the better recovery capability of nonconvex CS to the block-sparse setting. Therefore, we suggest the following nonconvex optimization model for the recovery of block-sparse signals:
\underset{\mathbf{x}}{\text{min}}\parallel \mathbf{x}{\parallel}_{2,q}^{q}\quad \text{s.t.}\quad \parallel \mathbf{y}-\Phi \mathbf{x}{\parallel}_{2}\le \epsilon , (9)
where ϵ ≥ 0 controls the noise error term ( ϵ = 0 corresponds to the noiseless case) and \parallel \mathbf{x}{\parallel}_{2,q}={\left(\sum _{i=1}^{m}\parallel \mathbf{x}[i]{\parallel}_{2}^{q}\right)}^{1/q} is a generalization of the standard l _{ q } quasi-norm for 0 < q < 1. We will show that this new nonconvex recovery approach can achieve better block-sparse recovery performance, both practically and theoretically, than the commonly used convex l _{2}/l _{1} minimization approach. In the following sections, we provide some sufficient conditions for exact and stable recovery of block-sparse signals through the mixed l _{2}/l _{ q }(0 < q < 1) norm minimization, and further develop an IRLS algorithm similar to those in [28, 33] for solving such nonconvex optimization problems.
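The mixed quasi-norm just defined is straightforward to evaluate numerically. The sketch below (the function name `mixed_norm` and the example vector are assumptions for illustration) also shows that ∥x∥_{2,q}^{q} approaches the number of nonzero blocks as q → 0, which is the intuition behind using small q as a surrogate for the l _{2}/l _{0} norm.

```python
import numpy as np

def mixed_norm(x, block_sizes, q):
    """Mixed l2/lq (quasi-)norm: (sum_i ||x[i]||_2^q)^(1/q), for q > 0."""
    x = np.asarray(x, dtype=float)
    edges = np.cumsum(block_sizes)[:-1]
    block_norms = np.array([np.linalg.norm(b) for b in np.split(x, edges)])
    return float(np.sum(block_norms ** q) ** (1.0 / q))

# Hypothetical example: block l2 norms are (0, 5, 0, 1).
x = np.array([0.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0])
sizes = [2, 2, 3, 2]
for q in (1.0, 0.5, 0.1, 0.01):
    value = mixed_norm(x, sizes, q)
    # value ** q equals sum_i ||x[i]||_2^q, which tends to the block count.
    print(q, value, value ** q)
```

For q = 1 the function returns the l _{2}/l _{1} norm (here 5 + 1 = 6), so the same helper covers the convex case.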
5 Sufficient block-sparse recovery conditions
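In this section, we first consider the recovery problem of a high-dimensional signal \mathbf{x}\in {\mathbb{R}}^{N} in the noiseless setting. To this end, we propose the constrained mixed l _{2}/l _{ q } norm minimization with 0 < q < 1:
\underset{\mathbf{x}}{\text{min}}\parallel \mathbf{x}{\parallel}_{2,q}^{q}\quad \text{s.t.}\quad \mathbf{y}=\Phi \mathbf{x}, (10)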
where \mathbf{y}\in {\mathbb{R}}^{M} are available measurements, Φ is a known M × N measurement matrix.
To state our main results, we need more notation. We first consider the case where x is exactly block k-sparse. We use Null( Φ) to denote the null space of Φ and T to denote the block index set of the nonzero blocks of the signal x. Let x ^{∗} be a solution of the minimization problem (10). From [15], it is known that x ^{∗} is the unique sparse solution of (10) and equals x if and only if
for all nonzero vectors h in the null space of Φ. This is called the null space property (NSP). In order to characterize the NSP more accurately, one can consider the following equivalent form: there exists a smallest constant ρ satisfying 0 < ρ < 1 such that
When x is not exactly block sparse, Aldroubi et al. [34] also showed that NSP actually guarantees stability. Precisely, if we use T to denote the block index set over the k blocks with largest l _{2} norm of x, then the NSP (11) gives
here C is a constant. Indeed, from (11), it is easy to see that the following equality holds
which is denoted by ρ. In general, for h = x ^{∗} − x, we let
Therefore, the main point of our study is to show how to make γ(h,q) < 1 for all nonzero vectors h in the null space of Φ. Our first conclusion is the following theorem.
Theorem 1.
(Noiseless recovery). Let y = Φ x be measurements of a signal x. If the matrix Φ satisfies the block RIP (5) with
then there exists a number q _{0}(δ _{2k }) ∈ (0,1] such that for any q < q _{0}, the solution x ^{ ∗ } of the mixed l _{2}/l _{ q } problem (10) obeys
where C _{0}(q,δ _{2k }) and C _{1}(q,δ _{2k }) are positive constants dependent on q and δ _{2k }, T _{0} is the block index set over the k blocks with largest l _{2} norm of the original signal x. In particular, if x is block ksparse, the recovery is exact.
Remark 1.
Theorem 1 provides a sufficient condition for the recovery of a signal x via l _{2}/l _{ q } minimization with 0 < q < 1 in the noiseless setting. Focusing on the case where x is block k-sparse, it is known [14] that when the l _{2}/l _{1} minimization scheme (7) is employed to recover x, the sufficient condition on the block-RIP constant is δ _{2k } < 0.414. In comparison, Theorem 1 says that, whenever the nonconvex l _{2}/l _{ q } minimization scheme (10) is used, this bound can be relaxed to δ _{2k } < 0.5 for some q < 1. This shows that, as in standard sparse signal recovery, the nonconvex minimization method can enhance the performance of block-sparse signal recovery compared with the convex minimization method.
To prove Theorem 1, we need the following lemmas:
Lemma 1.
([14]).
for all x,x ^{′} supported on disjoint subsets T,T ^{′} ⊆ {1,2,…,N} with |T| < k and |T ^{′}| < k ^{′}.
Lemma 2.
([35]). For any fixed q ∈ (0,1) and \mathbf{x}\in {\mathbb{R}}^{N},
Proof of Theorem 1.
Let x ^{∗} = x + h be a solution of (10), where x is the original signal we need to reconstruct. Throughout the article, x _{ T } denotes the vector equal to x on an index set T and zero elsewhere. Let T _{0} be the block index set over the k blocks with largest l _{2} norm of x. We decompose h into a series of vectors {\mathbf{h}}_{{T}_{0}},{\mathbf{h}}_{{T}_{1}},{\mathbf{h}}_{{T}_{2}},\dots ,{\mathbf{h}}_{{T}_{J}}, such that
Here {\mathbf{h}}_{{T}_{i}} is the restriction of h to the set T _{ i } and each T _{ i } consists of k blocks (except possibly T _{ J }). The block indices are rearranged such that \parallel {\mathbf{h}}_{{T}_{j}}\left[1\right]{\parallel}_{2}\ge \parallel {\mathbf{h}}_{{T}_{j}}\left[2\right]{\parallel}_{2}\ge \cdots \ge \parallel {\mathbf{h}}_{{T}_{j}}\left[k\right]{\parallel}_{2}\ge \parallel {\mathbf{h}}_{{T}_{j+1}}\left[1\right]{\parallel}_{2}\ge \parallel {\mathbf{h}}_{{T}_{j+1}}\left[2\right]{\parallel}_{2}\ge \cdots for any j ≥ 1.
Note that
For any j ≥ 2, if we let {\mathbf{d}}_{j}=(\parallel {\mathbf{h}}_{{T}_{j}}[1]{\parallel}_{2},\parallel {\mathbf{h}}_{{T}_{j}}[2]{\parallel}_{2},\dots , \parallel {\mathbf{h}}_{{T}_{j}}\left[k\right]{\parallel}_{2}), then we have
From Lemma 2, it follows that
that is,
From Equation (18), we obtain
where we have used the fact that (a + b)^{q} ≤ a ^{q} + b ^{q} for nonnegative a and b. Therefore, we have
On the other hand, let
Since \parallel {\mathbf{h}}_{{T}_{2}}\left[1\right]{\parallel}_{2}\ge \parallel {\mathbf{h}}_{{T}_{2}}\left[2\right]{\parallel}_{2}\ge \cdots \ge \parallel {\mathbf{h}}_{{T}_{2}}\left[k\right]{\parallel}_{2}\ge \parallel {\mathbf{h}}_{{T}_{3}}\left[1\right]{\parallel}_{2}\ge \parallel {\mathbf{h}}_{{T}_{3}}\left[2\right]{\parallel}_{2}\ge \cdots , it is easy to see that
By the definition of the block-RIP, Lemma 1, (20) and (22), it then follows that
Since Φ x = Φ x ^{∗}, we have Φ h = 0, and thus \Phi ({\mathbf{h}}_{{T}_{0}}+{\mathbf{h}}_{{T}_{1}})=-\Phi \left(\sum _{j\ge 2}{\mathbf{h}}_{{T}_{j}}\right). Therefore,
By the definition of δ _{2k } and using Hölder’s inequality, we then further have
Through a straightforward calculation, it is easy to get that the maximum of f(t) occurs at {t}_{0}=\frac{1q/2}{2{\delta}_{2k}} and
If f(t _{0}) < 1, then we have γ(h,q) < 1. However, f(t _{0}) < 1 amounts to
or, equivalently,
Since the second term on the left-hand side of (26) goes to zero as q → 0_{+} whenever q ≤ 1, δ _{2k } < 1, and
We thus obtain that for δ _{2k } < 1/2, there exists a value q _{0} = q _{0}(δ _{2k }) ∈ (0,1] such that for all q ∈ (0,q _{0}) the above inequality (26) is true.
From the definition of γ(h,q), we have
As ∥x ^{∗}∥_{ q } = ∥x + h∥_{ q } is the minimum, using the equation {\mathbf{x}}^{\ast}=\mathbf{x}+{\mathbf{h}}_{{T}_{0}}+{\mathbf{h}}_{{T}_{0}^{c}}, we get
Since \parallel \mathbf{x}{\parallel}_{2,q}^{q}=\parallel {\mathbf{x}}_{{T}_{0}}{\parallel}_{2,q}^{q}+\parallel {\mathbf{x}}_{{T}_{0}^{c}}{\parallel}_{2,q}^{q}, this then implies
That is,
Thus, we have
This proves the first inequality of (13). In the following, we further prove the second inequality of (13).
In effect, from (24) and (25), we have
which, together with (22) and (30), then implies
where we have used the fact that
By (28), we further have
Thus, it follows from (31) and (32) that
That is, the second inequality in (13) holds. With this, the proof of Theorem 1 is completed.
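The proof above relies on the elementary fact (a + b)^{q} ≤ a ^{q} + b ^{q} for nonnegative a, b and 0 < q ≤ 1. For completeness, one standard way to verify it is sketched below; this short argument is supplied here for the reader and is not taken from the article.

```latex
% Assume a, b \ge 0 with a + b > 0 and 0 < q \le 1 (the case a = b = 0 is trivial).
% For any s \in [0,1] we have s^{q} \ge s, hence
\[
\left(\frac{a}{a+b}\right)^{q} + \left(\frac{b}{a+b}\right)^{q}
\;\ge\; \frac{a}{a+b} + \frac{b}{a+b} \;=\; 1 .
\]
% Multiplying both sides by (a+b)^{q} > 0 gives
\[
a^{q} + b^{q} \;\ge\; (a+b)^{q} .
\]
```

Applied blockwise, the same argument gives the q-triangle inequality \parallel \mathbf{u}+\mathbf{v}{\parallel}_{2,q}^{q}\le \parallel \mathbf{u}{\parallel}_{2,q}^{q}+\parallel \mathbf{v}{\parallel}_{2,q}^{q}, since \parallel \mathbf{u}[i]+\mathbf{v}[i]{\parallel}_{2}\le \parallel \mathbf{u}[i]{\parallel}_{2}+\parallel \mathbf{v}[i]{\parallel}_{2} for every block.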
We further consider the recovery problem of a high-dimensional signal in the presence of noise. In this situation, the measurements can be expressed as
where \mathbf{z}\in {\mathbb{R}}^{M} is an unknown noise term. In order to reconstruct x, we adopt the constrained mixed l _{2}/l _{ q } minimization scheme with 0 < q < 1:
where ϵ > 0 is a bound on the noise level. We show below that one can also recover x stably and robustly under the same assumption as in Theorem 1.
Theorem 2.
(Noisy recovery). Let y = Φ x+z( ∥z∥_{2} ≤ ϵ) be noisy measurements of a signal x. If the matrix Φ satisfies the block RIP (5) with
then there exists a number q _{0}(δ _{2k }) ∈ (0,1] such that for any q < q _{0}, the mixed l _{2}/l _{ q } method (34) can stably and robustly recover the original signal x. More precisely, the solution x ^{∗} of (34) obeys
where C _{1}(q,δ _{2k }) and C _{2}(q,δ _{2k }) are positive constants dependent on q and δ _{2k }, T _{0} is the block index set over the k blocks with largest l _{2} norm of the original signal x.
Remark 2.
The inequality (35) in Theorem 2 offers an upper bound on the recovery error of the mixed l _{2}/l _{ q }-minimization ( q ∈ (0,q _{0})). In particular, this estimate shows that the recovery accuracy of the mixed l _{2}/l _{ q } minimization is controlled by the degree of sparsity of the signal and the exponent q, revealing the close connection among the recovery precision the method may achieve, the sparsity of the signal, and the index q used in the recovery procedure. In addition, the estimate (35) shows that the recovery error of the method (34) is bounded in terms of the noise level. In this sense, Theorem 2 shows that under certain conditions, a block k-sparse signal can be robustly recovered by the method (34).
Proof of Theorem 2.
The proof of Theorem 2 is similar to that of Theorem 1, with minor differences.
In more detail, we again set x ^{∗} = x + h. Due to the presence of noise, h is no longer necessarily in the null space of Φ. Nevertheless, we can still prove Theorem 2 under the same assumption.
We still denote by T _{0} the block index set over the k blocks with largest l _{2} norm of x, and {\mathbf{h}}_{{T}_{0}} the restriction of h onto these blocks. We also denote {\mathbf{h}}_{{T}_{j}}(j\ge 1) similar to the proof of Theorem 1. By ∥z∥_{2}≤ ϵ and the triangle inequality, we first have
Since \Phi ({\mathbf{h}}_{{T}_{0}}+{\mathbf{h}}_{{T}_{1}})=\Phi \left(\mathbf{h}\right)-\Phi \left(\sum _{j\ge 2}{\mathbf{h}}_{{T}_{j}}\right), from the definition of the block-RIP, we get
Hence,
On the other hand,
From (38) and (39), we thus have
Let \parallel {\mathbf{h}}_{{T}_{1}}{\parallel}_{2,q}^{q}=t\sum _{i\ge 1}\parallel {\mathbf{h}}_{{T}_{i}}{\parallel}_{2,q}^{q},t\in [0,1] and we also have (23), thus
If we denote
then, by an easy calculation, we obtain that the maximum of g(t) occurs at {t}_{1}=\frac{2-q}{2} and
Therefore, we have
By (23) and the condition of f(t) in the proof of Theorem 1, we also have
Plugging (42) and (43) into (40), it is easy to see that
Consequently, we obtain
In the following, we further prove that if f(t _{0}) < 1, then one obtains the conclusion of Theorem 2. That is, Theorem 2 holds under the same condition on δ _{2k } as Theorem 1.
Note that from (27), we also have
Plugging (44) into the above inequality and by f(t _{0}) < 1, one can show that
Since ∥v∥_{ q } ≤ 2^{1/q−1}∥v∥_{1} for \mathbf{v}\in {\mathbb{R}}^{2}, we further have
and by (37), (22), and (23), we also have
Since \sqrt{{(a+\sqrt{b})}^{2}+c}\le a+\sqrt{b+c} for a,b,c>0, it gives
where we have used the fact that
Thus, it then follows from (45) and (47) that
This yields the conclusion of Theorem 2.
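Two elementary inequalities are used without proof in the argument above. The following short derivations, supplied here as a sketch and not taken from the article, verify them.

```latex
% (i) For a, b, c > 0 both sides are positive, so it suffices to compare squares:
\[
\left(a+\sqrt{b+c}\right)^{2} - \left[\left(a+\sqrt{b}\right)^{2} + c\right]
= 2a\left(\sqrt{b+c}-\sqrt{b}\right) \;\ge\; 0 ,
\]
% which is equivalent to \sqrt{(a+\sqrt{b})^{2}+c} \le a+\sqrt{b+c}.

% (ii) For v = (v_1, v_2) \in \mathbb{R}^{2} and 0 < q \le 1, the map s \mapsto s^{q}
% is concave on [0,\infty), so by Jensen's inequality
\[
\frac{|v_1|^{q} + |v_2|^{q}}{2} \;\le\; \left(\frac{|v_1| + |v_2|}{2}\right)^{q}
\quad\Longrightarrow\quad
\|v\|_{q} \;\le\; 2^{1/q-1}\,\|v\|_{1} .
\]
```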
6 An IRLS algorithm
Inspired by the ideas of [28, 33], in this section, we present an efficient IRLS algorithm for the solution of the mixed l _{2}/l _{ q } norm minimization problem (34). We first rewrite the problem as the following regularized version of the unconstrained smoothed l _{2}/l _{ q } minimization:
where τ is a regularization parameter and
Let J _{2,q }(x,ϵ,τ) be the objective function associated with (49), that is,
Then, the first-order necessary optimality condition for the solution x is
Hence, we define the diagonal weighting matrix W blockwise by W _{ i } = diag(q ^{1/2}(ϵ ^{2} + ∥x[i]∥_{2}^{2})^{q/4−1/2}) for the i th block, and after simple calculations, we obtain the following necessary optimality condition:
Due to the nonlinearity, there is no straightforward method to solve the above system of equations. But if we fix W = W ^{(t)} to be that determined already in the t th iteration step, the solution of (51), set as the (t + 1)th iterate, then can be found to be
This defines a natural iterative method for the solution of (49). We formalize this reweighted algorithm as follows:
Algorithm 1. An IRLS algorithm for the unconstrained smoothed l _{2}/l _{ q } (0 < q ≤ 1) minimization problem

Step 1. Initialize {\mathbf{x}}^{(0)}=\text{arg}\underset{\mathbf{x}}{\text{min}}\parallel \mathbf{y}-\Phi \mathbf{x}{\parallel}_{2}^{2} and let \widehat{k} be the estimated block-sparsity. Set ϵ _{0} = 1, t = 0.

Step 2. Repeat
{W}^{(t)}\leftarrow \text{diag}\left({q}^{1/2}{\left({\epsilon}_{t}^{2}+\parallel {\mathbf{x}}^{(t)}[i]{\parallel}_{2}^{2}\right)}^{q/4-1/2}\right),\quad i=1,\dots ,m
{A}^{(t)}\leftarrow \Phi {\left({W}^{(t)}\right)}^{-1}
{\mathbf{x}}^{(t+1)}\leftarrow {\left({W}^{(t)}\right)}^{-1}{\left({\left({A}^{(t)}\right)}^{T}{A}^{(t)}+\tau \mathbf{I}\right)}^{-1}{\left({A}^{(t)}\right)}^{T}\mathbf{y}
{\epsilon}_{t+1}\leftarrow \text{min}\{{\epsilon}_{t},\alpha \,r{\left({\mathbf{x}}^{(t+1)}\right)}_{\widehat{k}+1}/N\}
t\leftarrow t+1
until ϵ _{ t + 1} = 0; otherwise repeat the above computation within a reasonable time

Step 3. Output x ^{(t + 1)} as an approximate solution.
In Algorithm 1, r{\left(\mathbf{x}\right)}_{\widehat{k}+1} is the (\widehat{k}+1)th largest l _{2} norm among the blocks of x, α ∈ (0,1) is a number such that α r(x ^{(1)})/N < 1, and τ is an appropriately chosen parameter which controls the tolerance of the noise term. Although the best τ may change continuously with the noise level, we used a fixed value of τ in our numerical implementations.
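The iteration of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The function name `irls_l2q`, the default tolerances, and the stopping rule on the iterate difference are assumptions. The update for x ^{(t+1)} is computed through the algebraically equivalent M × M system A ^{T}(A A ^{T} + τ I)^{−1} y (push-through identity) instead of the N × N system written in Step 2.

```python
import numpy as np

def irls_l2q(Phi, y, block_sizes, q=0.5, k_hat=1, tau=1e-5, alpha=0.7,
             max_iter=2000, eps_tol=1e-7, step_tol=1e-8):
    """Sketch of IRLS for smoothed l2/lq minimization (0 < q <= 1).

    Requires k_hat < number of blocks. Returns the last iterate.
    """
    M, N = Phi.shape
    m = len(block_sizes)
    block_id = np.repeat(np.arange(m), block_sizes)   # block index of each entry
    x = np.linalg.lstsq(Phi, y, rcond=None)[0]        # Step 1: least-squares start
    eps = 1.0
    for _ in range(max_iter):
        sq_norms = np.bincount(block_id, weights=x ** 2, minlength=m)
        # Diagonal of W^{-1}: q^{-1/2} (eps^2 + ||x[i]||^2)^{1/2 - q/4}
        w_inv = (eps ** 2 + sq_norms[block_id]) ** (0.5 - q / 4.0) / np.sqrt(q)
        A = Phi * w_inv                                # A = Phi W^{-1}
        # (A^T A + tau I)^{-1} A^T y == A^T (A A^T + tau I)^{-1} y
        z = np.linalg.solve(A @ A.T + tau * np.eye(M), y)
        x_new = w_inv * (A.T @ z)
        # r(x)_{k_hat+1}: (k_hat+1)-th largest block l2 norm (0-based index k_hat)
        r = np.sort(np.sqrt(np.bincount(block_id, weights=x_new ** 2, minlength=m)))[::-1]
        eps_new = min(eps, alpha * r[k_hat] / N)
        done = eps_new < eps_tol or np.linalg.norm(x_new - x) < step_tol
        x, eps = x_new, eps_new
        if done:
            break
    return x

# Hypothetical usage: 16 blocks of size 4, two active blocks, 32 measurements.
rng = np.random.default_rng(1)
N, M, d, k = 64, 32, 4, 2
sizes = [d] * (N // d)
x_true = np.zeros(N)
x_true[4:8] = rng.standard_normal(d)
x_true[40:44] = rng.standard_normal(d)
Phi = rng.standard_normal((M, N))
x_hat = irls_l2q(Phi, Phi @ x_true, sizes, q=0.5, k_hat=k + 1)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Working with W ^{−1} directly avoids dividing by vanishing block norms as ϵ _{ t } shrinks, and setting all block sizes to 1 turns the same routine into an l _{ q } solver for conventional sparse signals.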
Obviously, from the update equation (52), we can see that x ^{(t)} can be understood as the minimizer of
Because the weight matrix W ^{(t)} is fixed anew at each iteration, the method is called IRLS. Majumdar and Ward [12] first adopted the IRLS methodology to solve a nonconvex l _{2}/l _{ q } block-sparse optimization problem, but their algorithm is unsuitable for noisy block-sparse recovery. Lai et al. [28] proposed an IRLS algorithm for the unconstrained l _{ q }(0 < q ≤ 1) minimization problem and gave a detailed analysis that includes convergence, local convergence rate, and error bounds of the algorithm. To some extent, our proposed algorithm can be considered a generalization of their algorithm to the setting of block-sparse recovery.
Note that {ϵ _{ t }} in Algorithm 1 is a bounded nonincreasing sequence, which must converge to some ϵ _{∗} ≥ 0. Using an argument similar to that in [28], we can then prove that x ^{(t)} must have a convergent subsequence and that the limit of the subsequence is a critical point of (50) whenever ϵ _{∗} > 0. In addition, there exists a convergent subsequence whose limit x _{∗} is a sparse vector with block-sparsity \parallel {\mathbf{x}}_{\ast}{\parallel}_{2,0}\le \widehat{k} when ϵ _{∗} = 0. Furthermore, we can also verify the superlinear local convergence rate of the proposed Algorithm 1. Due to space limitations, we leave the detailed analysis to the interested reader.
7 Numerical experiments
In this section, we conduct two numerical experiments to compare the nonconvex l _{2}/l _{ q }(0 < q < 1) minimization method with the l _{2}/l _{1} minimization method and the standard l _{ q }(0 < q ≤ 1) method in the context of block-sparse signal recovery. This comparison is possible because Algorithm 1 also applies to the standard l _{ q }(0 < q ≤ 1) minimization method, which corresponds to blocks of size one. For all compared methods, we use the same starting point x ^{(0)} = arg min_{x} ∥y − Φ x∥_{2}^{2}.
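The starting point can be computed with an ordinary least-squares solver, as in the sketch below. When M < N the minimizer is not unique; `numpy.linalg.lstsq` returns the minimum-norm one, and choosing that particular minimizer is an assumption, since the text does not say which one was used. The function name is hypothetical.

```python
import numpy as np

def least_squares_start(Phi, y):
    """Minimum-norm solution of min_x ||y - Phi x||_2^2, used as x^(0)."""
    x0, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return x0
```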
In our experiments, the measurement matrix Φ was generated as an M × N matrix with i.i.d. entries drawn from the standard Gaussian distribution N(0,1). We considered four values, q = 0.1, 0.5, 0.7, 1, for both the l _{2}/l _{ q } minimization method and the l _{ q } minimization method. The purpose of the experiments was to compare the recovery performance of the mixed l _{2}/l _{ q } method and the l _{ q } method for block-sparse signals, without noise and with noise, respectively.
7.1 Noiseless recovery
In this set of experiments, we considered the case in which the signals were measured perfectly, without noise. We first randomly generated the block-sparse signal x with nonzero values drawn from a Gaussian distribution with mean 0 and standard deviation 1, and then randomly drew a measurement matrix Φ from the Gaussian ensemble. We then observed the measurements y from the model y = Φ x. In all experiments, if ϵ _{ t+1} < 10^{−7} or ∥x ^{(t+1)} − x ^{(t)}∥ _{2} < 10^{−8}, the iteration terminates and outputs x ^{(t+1)} as an approximate solution of the original signal x; otherwise, we let the algorithms run to the maximum number of iterations, max = 2000. We set the parameters τ and α to 10^{−5} and 0.7, respectively. We tested Algorithm 1 for different initial block-sparsity estimates and found that any overestimate \widehat{k} of k yields similar results. A typical simulation result is shown in Figure 1. Therefore, for simplicity, we set \widehat{k} = k+1 in our implementation.
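The noiseless setup and the stopping rule can be sketched as follows. The code assumes equal-sized blocks for brevity (the reported experiments also use uneven block sizes) and reads the two tolerances as 10^{−7} and 10^{−8}. The function names and the random-generator argument are hypothetical.

```python
import numpy as np

def make_block_sparse_problem(m, n, block_size, k, rng):
    """Gaussian Phi (m x n), a signal with k active blocks, and y = Phi x."""
    blocks = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    active = rng.choice(len(blocks), size=k, replace=False)
    x = np.zeros(n)
    for b in active:
        x[blocks[b]] = rng.standard_normal(len(blocks[b]))  # N(0,1) entries
    Phi = rng.standard_normal((m, n))
    return Phi, x, Phi @ x, blocks

def should_stop(eps_next, x_next, x_prev, t,
                eps_tol=1e-7, step_tol=1e-8, max_iter=2000):
    """Stop on small eps, small change in x, or when the iteration cap is reached."""
    return (eps_next < eps_tol
            or np.linalg.norm(x_next - x_prev) < step_tol
            or t >= max_iter)
```

A call such as `make_block_sparse_problem(128, 256, 4, 16, np.random.default_rng(0))` would give a 128×256 problem with total sparsity 64 under these assumptions.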
Figure 2a depicts an instance of the generated block-sparse signal with signal length N = 512. There are 128 blocks of uneven size, 16 of which are active, giving a total sparsity k _{0} = ∥x ∥_{0} = 101. Figure 2b–d shows the recovery results of the standard l _{ q } method with q = 1, the standard l _{ q } method with q = 0.5, and the mixed l _{2}/l _{1} method, respectively, when M = 225. Since the sample size M is only around 2.2 times the signal sparsity k _{0}, the standard l _{1} method does not yield good recovery results, whereas the mixed l _{2}/l _{1} method and the nonconvex l _{ q }(q = 0.5) method achieve near-perfect recovery of the original signal. The results illustrate that, if one incorporates the block-sparsity structure into the recovery procedure, the block version of convex l _{1} minimization also reduces the number of measurements required, as does the standard nonconvex l _{ q } minimization with some q < 1.
We further compared the recovery performance of the standard l _{ q } method and the mixed l _{2}/l _{ q } method for different values of q. Figure 3 shows an instance similar to that of Figure 2. We generated a block-sparse signal with the same nonzero block locations as in Figure 2a and observed M = 144 measurements, which is only around 1.4 times the signal sparsity k _{0}. From Figure 3, we can see that only the nonconvex l _{2}/l _{ q } method with q = 0.5 and q = 0.1 achieves near-optimal recovery, while the other methods fail. In this instance, the mixed l _{2}/l _{ q } method exactly recovers the original signal for the tested values q ≤ 0.5. The results also demonstrate the advantage of the nonconvex mixed l _{2}/l _{ q }(0 < q < 1) method over the standard nonconvex l _{ q }(0 < q < 1) method.
Figure 4a shows the effect of sample size, where we report, on a logarithmic scale, the average root mean square error (RMSE) over 100 independent random trials for each sample size. In this case, we set the signal length to N = 256; there are 64 blocks of uneven size, and the k = 4 active blocks were chosen at random from the 64 blocks. The figure shows how the recovery error decays as a function of sample size for all the algorithms. We can observe that both the l _{ q } and the mixed l _{2}/l _{ q } methods improve in recovery performance as q decreases and that, for a fixed q, the mixed l _{2}/l _{ q } method is clearly superior to the standard l _{ q } method in this block-sparse setting. To further study the effect of the active block number k (with k _{0} fixed), we drew a matrix Φ of size 128×256 from the Gaussian ensemble and generated the signal x with even block size and total sparsity k _{0} = 64. The block size was varied while the other parameters were kept unchanged. Figure 4b shows the average RMSE over 100 independent random runs on a logarithmic scale. One can easily see that the recovery performance of the standard l _{ q } method is independent of the active block number, while the recovery errors of the mixed l _{2}/l _{ q } method are significantly smaller when the active block number k is far smaller than the total signal sparsity k _{0}. As expected, the performance of the mixed l _{2}/l _{ q } method becomes identical to that of the standard l _{ q } method when k = k _{0}. This illustrates that the mixed method favors large blocks when the total sparsity k _{0} is fixed. Moreover, like the standard l _{ q } method, the mixed l _{2}/l _{ q } method performs better as q decreases.
7.2 Noisy recovery
In this experiment, we considered the recovery of block-sparse signals in the presence of noise. We observed the measurements y from the model y = Φ x + z, where Φ and x were generated as in the previous subsection and z was zero-mean Gaussian noise with standard deviation σ. In our implementation of this experiment, we set τ = 10^{−1} max Φ ^{T} y and kept the other parameters unchanged.
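A minimal sketch of the noisy measurement model and the data-dependent choice of τ follows. Two readings are assumptions: the scale factor is taken to be 10^{−1}, and the maximum is taken over the absolute values of the entries of Φ ^{T} y. The function names are hypothetical.

```python
import numpy as np

def noisy_measurements(Phi, x, sigma, rng):
    """y = Phi x + z with z ~ N(0, sigma^2 I)."""
    z = sigma * rng.standard_normal(Phi.shape[0])
    return Phi @ x + z

def tau_from_data(Phi, y, scale=0.1):
    """tau = scale * max |Phi^T y| (the absolute value is an assumption)."""
    return scale * np.max(np.abs(Phi.T @ y))
```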
Table 1 compares the relative errors between the true solutions and the approximate solutions obtained by the mixed l _{2}/l _{ q } method and the l _{ q } method, with the active block number k in {4,12}, the sample-size ratio r = M/k _{0} in {3,4}, and the noise level σ in {0.02,0.05,0.10}. Here, the relative error is defined as ∥x − x ^{∗}∥_{2}/∥x∥_{2}. Table 1 reports the average relative errors and the standard deviations over 100 random trials. From the table, we see that, for a fixed q, the mixed l _{2}/l _{ q } method always gets better results than the standard l _{ q } method. In the low-noise cases (σ = 0.02, 0.05), the mixed l _{2}/l _{ q } method improves the recovery performance as q decreases. However, when σ = 0.10, decreasing q below 0.7 does not always improve the recovery results of the mixed l _{2}/l _{ q } method. Thus, we may reasonably infer that, when the noise level is low (σ = 0.02, 0.05), there exists a q _{0} ≤ 0.1 such that for any q < q _{0} the l _{2}/l _{ q } minimization obtains similar recovery results, whereas when σ = 0.10 there exists a q _{0} ≤ 0.7 such that decreasing q below q _{0} no longer improves the recovery results of the mixed l _{2}/l _{ q } method.
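The error measure used in Table 1 is simple to compute. The sketch below evaluates ∥x − x ^{∗}∥_{2}/∥x∥_{2} for one trial and summarizes a list of trial errors by mean and standard deviation. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import numpy as np

def relative_error(x_true, x_hat):
    """||x - x*||_2 / ||x||_2 for a single trial."""
    return np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)

def summarize(errors):
    """Mean and (population) standard deviation over trials."""
    errors = np.asarray(errors, dtype=float)
    return errors.mean(), errors.std()
```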
8 Conclusion
In this article, we have investigated the block-sparse recovery performance of the mixed l _{2}/l _{ q } minimization approach, especially for the nonconvex case 0 < q < 1. Under the assumption that the measurement matrix Φ has the RIP with δ _{2k } < 1/2, we have proved that the nonconvex l _{2}/l _{ q }(0 < q < 1) method can exactly and stably recover the original block-sparse signals in the noiseless and noisy cases, respectively. The sufficient recovery condition we obtained is weaker than that known for the l _{2}/l _{1} method (δ _{2k } < 0.414), which indicates the better block-sparse recovery ability of the mixed l _{2}/l _{ q }(0 < q < 1) method. We have conducted a series of numerical experiments to support the theory and to demonstrate the superior performance of the mixed method.
Our study so far is concerned only with block-sparse signal recovery without overlapping blocks. In many real applications, however, such as gene expression data in bioinformatics, the blocks of elements could overlap. Rao et al. [36] derived tight bounds on the number of measurements required for exact and stable recovery of block-sparse signals with overlapping blocks by the mixed l _{2}/l _{1} minimization method. Their analysis can naturally be extended to the nonconvex l _{2}/l _{ q }(0 < q < 1) method. These extensions will be part of our future research.
Although our simulation studies in this article clearly demonstrate that the recovery ability of the mixed l _{2}/l _{ q } method improves as q decreases, there is still a lack of theoretical analysis to support this observation. The works of [37, 38] addressed the block-sparse recovery ability through an accurate analysis of the breakdown behavior of the mixed l _{2}/l _{1} method. It would be of interest to extend these results to the mixed l _{2}/l _{ q }(0 < q < 1) method as well. Since the resulting mixed minimization problem is nonconvex, such a theoretical analysis seems very difficult, and more powerful tools from geometric functional analysis may be needed for the extension.
Abbreviations
CS: Compressed sensing
IHT: Iterative hard thresholding
IRLS: Iteratively reweighted least squares
NSP: Null space property
OMP: Orthogonal matching pursuit
References
Shannon C: Communication in the presence of noise. Proc. IRE 1949, 37: 10–21.
Jerri A: The Shannon sampling theorem—its various extensions and applications: a tutorial review. Proc. IEEE 1977, 65(11):1565–1596.
Donoho D: Compressed sensing. IEEE Trans. Inf. Theory 2006, 52(4):1289–1306.
Candes E, Romberg J, Tao T: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 2006, 52(2):489–509.
Candes E, Tao T: Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans. Inf. Theory 2006, 52(12):5406–5425.
Candes E, Tao T: Decoding by linear programming. IEEE Trans. Inf. Theory 2005, 51(12):4203–4215. 10.1109/TIT.2005.858979
Parvaresh F, Vikalo H, Misra S, Hassibi B: Recovering sparse signals using sparse measurement matrices in compressed DNA microarrays. IEEE J. Sel. Topics Signal Process 2008, 2(3):275–285.
Erickson S, Sabatti C: Empirical Bayes estimation of a sparse vector of gene expression changes. Stat. Appl. Genetics Mol. Biol 2005, 4: 22.