
On recovery of block-sparse signals via mixed l 2 /l q (0 < q ≤ 1) norm minimization

Abstract

Compressed sensing (CS) states that a sparse signal can be exactly recovered from very few linear measurements. In many applications, however, real-world signals also exhibit additional structure beyond standard sparsity. A typical example is the class of so-called block-sparse signals, whose non-zero coefficients occur in a few blocks. In this article, we investigate the mixed l 2/l q (0 < q ≤ 1) norm minimization method for the exact and robust recovery of such block-sparse signals. We show that the non-convex l 2/l q (0 < q < 1) minimization method has stronger sparsity-promoting ability than the commonly used l 2/l 1 minimization method, both practically and theoretically. In terms of a block variant of the restricted isometry property of the measurement matrix, we present weaker sufficient conditions for exact and robust block-sparse signal recovery than those known for l 2/l 1 minimization. We also propose an efficient Iteratively Reweighted Least-Squares (IRLS) algorithm for the induced non-convex optimization problem. The obtained weaker conditions and the proposed IRLS algorithm are tested and compared with the mixed l 2/l 1 minimization method and the standard l q minimization method on a series of noiseless and noisy block-sparse signals. All the comparisons demonstrate that the mixed l 2/l q (0 < q < 1) method outperforms the alternatives in block-sparse signal recovery applications and is meaningful for the development of new CS techniques.

1 Introduction

According to the Shannon/Nyquist sampling theorem [1, 2], to avoid loss of information when capturing a signal, we must sample it at the so-called Nyquist rate, i.e., twice the highest frequency of the signal. Since the theorem exploits only the bandlimitedness of a signal, and most real-world signals are sparse or compressible, massive data acquisition based on the Shannon/Nyquist sampling theorem usually captures a great deal of useless information, and we eventually have to compress the samples in order to store or encode only the small amount of essential information in the signal. This process is clearly wasteful, and a more effective sampling scheme that directly acquires the essential information of a signal has therefore long been desired.

Compressed sensing (CS) [3–5] was motivated by this purpose: it can fully acquire the essential information of a signal by exploiting its compressibility. In a word, the main contribution of CS is a new efficient scheme that captures and recovers compressible or sparse signals at a sampling rate far below the Nyquist rate. The basic principle of CS is to first employ non-adaptive linear projections that preserve the structure of the signal; one can then exactly recover the signal from a surprisingly small number of random linear measurements through a nonlinear optimization procedure (such as l 1-minimization), provided the measurement matrix satisfies suitable sufficient conditions in terms of the restricted isometry property (RIP, [6]). Consequently, CS implies that it is indeed possible to acquire data in already compressed form. Nowadays, driven by a wide range of applications, CS and related problems have attracted much interest in various communities, such as signal processing, machine learning, and statistics.

Different from general sparse signals in the conventional sense, some real-world signals exhibit additional structure: the non-zero coefficients appear in a few fixed blocks. We refer to these signals as block-sparse signals in this article. Such block-sparse signals arise in various applications, e.g., DNA microarrays [7, 8], equalization of sparse communication channels [9], source localization [10], wideband spectrum sensing [11], and color imaging [12].

Using the standard convex relaxation (l 1-minimization) of the conventional CS framework to recover a block-sparse signal does not exploit the fact that the non-zero elements of the signal appear in consecutive positions. One natural idea is therefore to consider the block version of l 1-minimization, i.e., the mixed l 2/l 1-minimization, to exploit the block-sparsity. Many previous works have shown that the mixed l 2/l 1-minimization is superior to standard l 1-minimization when dealing with such block-sparse signals [13–15]. Huang and Zhang [13] developed a theory for the mixed l 2/l 1-minimization using a concept called strong group sparsity, and demonstrated that mixed norm minimization is very efficient for recovering strongly group-sparse signals. Stojnic et al. [15] obtained an optimal number of Gaussian measurements for uniquely recovering a block-sparse signal through mixed l 2/l 1 norm minimization. By generalizing the conventional RIP notion to the block-sparse case, Eldar and Mishali [14] showed that if the measurement matrix D has the same restricted isometry constant as that in the l 1 case, then the mixed norm method is guaranteed to exactly recover any block-sparse signal in the noiseless case. Furthermore, they showed that block-sparse signal recovery is also robust in the noisy case under the same recovery condition. Another common approach to the block-sparsity problem is to suitably extend standard greedy methods, such as orthogonal matching pursuit, iterative hard thresholding (IHT), and compressive sampling matching pursuit (CoSaMP), to the block-sparse case. In [16], the CoSaMP and IHT algorithms were extended to the model-based setting, treating block-sparsity as a special case; the new recovery algorithms were shown to enjoy provable performance guarantees and robustness properties. Eldar et al. [17] generalized the notion of coherence to the block-sparse setting and proved that a block version of the orthogonal matching pursuit (BOMP) algorithm can exactly recover any block-sparse signal if the block-coherence is sufficiently small. In addition, the mixed l 2/l 1-minimization approach was certified to guarantee successful recovery under the same condition on block-coherence. Ben-Haim and Eldar [18] examined the ability of greedy algorithms to estimate a block-sparse signal from noisy measurements, deriving near-oracle results for block-sparse versions of greedy pursuit algorithms in both the adversarial noise and Gaussian noise cases. Majumdar and Ward [19] used BOMP for the block-sparse representation-based classification problem. The validity and robustness of these new methods were theoretically proved.

In recent years, several studies [20–27] have shown that the non-convex l q (0 < q < 1) minimization allows exact recovery of sparse signals from fewer linear measurements than l 1-minimization. Chartrand and Staneva [21] provided a weaker condition guaranteeing perfect recovery for the non-convex l q (0 < q < 1) minimization method using an l q variant of the RIP, and obtained the number of random Gaussian measurements needed for successful recovery of sparse signals via l q (0 < q < 1) minimization with high probability. Sun [27] used the conventional RIP as in the l 1 case to prove that whenever q is chosen to be about 0.6796 × (1 - δ 2k ), every k-sparse signal can be exactly recovered via l q minimization, where δ 2k is the restricted isometry constant of the measurement matrix. Xu et al. [24–26] considered the especially important case q = 1/2 of l q minimization; they developed a thresholding representation theory for l 1/2 minimization and conducted a phase diagram study to demonstrate its merits.

This article presents an ongoing effort to extend the non-convex l q (0 < q < 1) minimization methodology to the block-sparse setting. Specifically, we study the performance of block-sparse signal recovery via mixed l 2/l q (0 < q < 1) norm minimization by means of the block RIP (block-RIP). We first show that, under RIP conditions similar to those in the standard l q case, the mixed l 2/l q recovery method can assuredly recover any block-sparse signal, irrespective of the locations of the non-zero blocks. In addition, the method is robust in the presence of noise. Our recovery conditions show that the non-convex l 2/l q (0 < q < 1) minimization is superior to the convex l 2/l 1 minimization within the block-RIP framework. Furthermore, we compare the sparse signal recovery ability of the non-convex l 2/l q (0 < q < 1) method with the convex l 2/l 1 method and the standard l q method through a series of simulation studies. To the best of the authors' knowledge, although Majumdar and Ward [12] first proposed the non-convex l 2/l q (0 < q < 1) method in the CS literature for color imaging and showed that l 2/l 0.4 minimization performs best in some imaging experiments, their work was experimental in nature and lacked a convincing theoretical assessment. In comparison, our work not only highlights the theoretical merits of the non-convex block optimization method, but also studies in greater depth the block-sparse signal recovery capabilities for several different values of q via numerical experiments.

We begin our study in Sections 2, 3, and 4 by presenting the problem setting. In Section 5, we establish sufficient conditions, in terms of the block-RIP, for the mixed l 2/l q (0 < q < 1) optimization approach to guarantee exact and robust recovery of block-sparse signals. In Section 6, we develop an efficient Iteratively Reweighted Least-Squares (IRLS) algorithm to recover block-sparse signals from few measurements, which generalizes the algorithm of [28] to the unconstrained l 2/l q (0 < q ≤ 1) norm minimization case. In Section 7, we show through a series of simulations that the non-convex l 2/l q (0 < q < 1) method has stronger block-sparsity-promoting ability than the convex l 2/l 1 method and the standard l q method. Finally, we conclude the article in Section 8 with some useful remarks.

2 Block-sparsity

Conventional CS considers only the sparsity of the signal x, i.e., that it has at most k non-zero elements, and does not take into account any further structure. However, in many practical scenarios, the non-zero elements are aligned in blocks, meaning that they appear in contiguous regions. These signals are referred to as block-sparse signals. Mathematically, a block-sparse signal $x \in \mathbb{R}^N$ over a block index set $I = \{d_1, \ldots, d_m\}$ can be modeled as follows:

$$x = \big[\underbrace{x_1 \cdots x_{d_1}}_{x[1]}\;\underbrace{x_{d_1+1} \cdots x_{d_1+d_2}}_{x[2]}\;\cdots\;\underbrace{x_{N-d_m+1} \cdots x_N}_{x[m]}\big]^T.$$
(1)

Here, x[i] denotes the i-th block of x, and d i is the size of the i-th block. The block-sparsity we consider in this article means that there are at most k < m non-zero blocks. Obviously, if d 1 = ⋯ = d m = 1, block-sparse signals degenerate to the conventional sparse signals well studied in CS.
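As a concrete illustration of the model (1), the following NumPy sketch builds a block-sparse signal from a list of block sizes; the helper `make_block_sparse` and its Gaussian fill are illustrative assumptions of ours, not constructions from the paper.

```python
import numpy as np

def make_block_sparse(block_sizes, k, rng=None):
    """Generate a block-sparse signal: k of the m blocks are non-zero.

    `block_sizes` plays the role of I = {d_1, ..., d_m} in (1).
    """
    rng = np.random.default_rng(rng)
    m = len(block_sizes)
    N = sum(block_sizes)
    x = np.zeros(N)
    starts = np.cumsum([0] + list(block_sizes[:-1]))
    active = rng.choice(m, size=k, replace=False)   # which blocks are non-zero
    for i in active:
        x[starts[i]:starts[i] + block_sizes[i]] = rng.standard_normal(block_sizes[i])
    return x, starts

# A signal with m = 5 blocks of size 4, only k = 2 of them non-zero.
x, starts = make_block_sparse([4, 4, 4, 4, 4], k=2, rng=0)
```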

Definition 1.

 ([14]). A block k-sparse signal over index set $I = \{d_1, \ldots, d_m\}$ is a signal of the form (1) in which x[i] is non-zero for at most k indices i, i ∈ {1, 2, …, m}.

The main focus of this study is to recover a block-sparse signal x from linear measurements y = Φx (noiseless case) or y = Φx + z (noisy case). Here, $y \in \mathbb{R}^M$ is the measurement vector, $\Phi \in \mathbb{R}^{M \times N}$ is a measurement matrix whose entries are usually drawn at random from a Gaussian or Bernoulli distribution, and z is an unknown bounded noise. We represent Φ as a concatenation of column blocks Φ[i] of size M × d i, that is,

$$\Phi = \big[\underbrace{\phi_1 \cdots \phi_{d_1}}_{\Phi[1]}\;\underbrace{\phi_{d_1+1} \cdots \phi_{d_1+d_2}}_{\Phi[2]}\;\cdots\;\underbrace{\phi_{N-d_m+1} \cdots \phi_N}_{\Phi[m]}\big].$$
(2)

We are then interested in formulating sufficient conditions on the measurement matrix Φ under which a block-sparse signal x can assuredly and stably be recovered from its few noiseless measurements $y = \sum_{i=1}^m \Phi[i]x[i]$ or noisy measurements $y = \sum_{i=1}^m \Phi[i]x[i] + z$. Denote

$$\|x\|_{2,0} = \sum_{i=1}^m I\big(\|x[i]\|_2 > 0\big),$$
(3)

where $I(\|x[i]\|_2 > 0)$ is an indicator function; a block k-sparse signal x can thus be defined as a vector satisfying $\|x\|_{2,0} \le k$. In the remainder of the article, we restrict our attention to how, and under what conditions, such block-sparse signals can be recovered exactly and stably in the noiseless and noisy scenarios, respectively.
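The mixed l 2/l 0 norm (3) is straightforward to compute once the block layout is fixed; a minimal NumPy sketch (the helper names are ours):

```python
import numpy as np

def block_norms(x, block_sizes):
    """l2 norm of each block x[i]."""
    ends = np.cumsum(block_sizes)
    return np.array([np.linalg.norm(x[e - d:e]) for d, e in zip(block_sizes, ends)])

def norm_2_0(x, block_sizes):
    """Mixed l2/l0 'norm' (3): the number of non-zero blocks."""
    return int(np.count_nonzero(block_norms(x, block_sizes) > 0))

x = np.array([1.0, 2.0, 0.0, 0.0, 0.0, 3.0])
assert norm_2_0(x, [2, 2, 2]) == 2   # blocks 1 and 3 are non-zero, block 2 is zero
```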

3 Block-RIP

Candes and Tao [6] first introduced the notion of the RIP of a matrix to characterize the conditions under which the sparsest solution of an underdetermined linear system exists and can be found. The RIP has since been used as a powerful tool for studying CS in several previous works [4, 5, 21, 29]. Let Φ be a matrix of size M × N with M < N. We say that Φ satisfies the RIP of order k if there exists a constant $\delta_k \in [0,1)$ such that for every $x \in \mathbb{R}^N$ with $\|x\|_0 \le k$,

$$(1 - \delta_k)\,\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_k)\,\|x\|_2^2.$$
(4)

Obviously, δ k quantifies how close to isometries all M × k submatrices of Φ must be. Since block-sparse signals exhibit additional structure, Eldar and Mishali [14] extended the standard RIP to the block-sparse setting and showed that the new block-RIP constant is typically smaller than the standard RIP constant. We now state the definition in the block-sparse setting.

Definition 2.

 ([14]). Let $\Phi: \mathbb{R}^N \to \mathbb{R}^M$ be an M × N measurement matrix. Then Φ is said to have the block-RIP over $I = \{d_1, \ldots, d_m\}$ with constant $\delta_{k|I}$ if for every vector $x \in \mathbb{R}^N$ that is block k-sparse over I, it satisfies

$$(1 - \delta_{k|I})\,\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_{k|I})\,\|x\|_2^2.$$
(5)

For convenience, in the remainder of the article we write δ k instead of $\delta_{k|I}$ for the block-RIP constant whenever no confusion can arise.
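The block-RIP constant is a supremum over all block k-sparse vectors and cannot be computed exactly for a general Φ, but sampling random block-sparse unit vectors gives an empirical lower bound on it; the Monte-Carlo scheme below is an illustrative sketch of ours, not a procedure from the paper.

```python
import numpy as np

def estimate_block_rip(Phi, block_sizes, k, trials=2000, rng=None):
    """Monte-Carlo lower bound on the block-RIP constant delta_k of Phi.

    Samples random block k-sparse unit vectors and records the worst
    deviation of ||Phi x||_2^2 from 1; the true delta_k can only be larger,
    since it is a supremum over all such x.
    """
    rng = np.random.default_rng(rng)
    m = len(block_sizes)
    ends = np.cumsum(block_sizes)
    starts = ends - np.array(block_sizes)
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(Phi.shape[1])
        for i in rng.choice(m, size=k, replace=False):
            x[starts[i]:ends[i]] = rng.standard_normal(block_sizes[i])
        x /= np.linalg.norm(x)                       # unit-norm block-sparse vector
        worst = max(worst, abs(np.linalg.norm(Phi @ x) ** 2 - 1))
    return worst
```

For an orthogonal Φ the estimate is (numerically) zero, as expected, since an orthogonal matrix is an exact isometry.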

With this new notion, Eldar and Mishali [29] generalized the sufficient recovery conditions to block-sparse signals in both the noiseless and noisy settings. They showed that if Φ is drawn at random as in conventional CS, it satisfies the block-RIP with overwhelming probability. These results illustrate that one can recover a block-sparse signal exactly and stably via the convex mixed l 2/l 1 minimization method whenever the measurement matrix Φ is constructed from a random ensemble (e.g., the Gaussian ensemble).

4 Non-convex recovery method

It is known from [14] that whenever Φ satisfies the block-RIP with δ 2k  < 1, there is a unique block-sparse signal x which can be recovered by solving the following problem:

$$\min_x \|x\|_{2,0} \quad \text{s.t.} \quad y = \Phi x.$$
(6)

Unfortunately, problem (6) is NP-hard, and finding its optimal solution has exponential complexity. In principle, one can solve the problem exactly only by checking, for every possible set of k blocks, whether there exists a vector consistent with the measurements. Obviously, this approach cannot handle high-dimensional signals.
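For tiny instances, this exhaustive search can actually be carried out, which also makes the combinatorial (m choose k) cost visible; a minimal sketch (the helper name and the residual tolerance are our illustrative choices):

```python
import itertools
import numpy as np

def exhaustive_block_recovery(y, Phi, block_sizes, k):
    """Solve (6) by brute force: least squares on every set of k blocks.

    Illustrates the (m choose k) cost; only viable for very small m.
    """
    m = len(block_sizes)
    ends = np.cumsum(block_sizes)
    cols = [np.arange(e - d, e) for d, e in zip(block_sizes, ends)]
    for support in itertools.combinations(range(m), k):
        idx = np.concatenate([cols[i] for i in support])
        coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        if np.linalg.norm(Phi[:, idx] @ coef - y) < 1e-10:   # consistent support
            x = np.zeros(Phi.shape[1])
            x[idx] = coef
            return x
    return None
```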

One natural idea to find x more efficiently is to employ a convex relaxation technique, namely, to replace the l 2/l 0 norm by its closest convex surrogate l 2/l 1 norm, thus resulting in the following model:

$$\min_x \|x\|_{2,1} \quad \text{s.t.} \quad y = \Phi x,$$
(7)

where $\|x\|_{2,1} = \sum_{i=1}^m \|x[i]\|_2$. This model can be cast as a second-order cone program (SOCP), which many standard software packages can solve very efficiently. In many practical cases, the measurements y are corrupted by bounded noise; we can then apply a modified SOCP, or the group version of the basis pursuit denoising [30] program:

$$\min_x \|y - \Phi x\|_2^2 + \lambda\,\|x\|_{2,1},$$
(8)

where λ is a tuning parameter controlling the tolerance of the noise term. There are also many methods to solve this optimization problem efficiently, such as the block-coordinate descent technique [31] and the Landweber iteration technique [32].
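As one concrete instance of such solvers, the following is a minimal proximal-gradient (ISTA-style) sketch for (8) with block soft-thresholding; the step-size rule and iteration count are illustrative choices of ours, not the algorithms of [31, 32].

```python
import numpy as np

def block_soft_threshold(v, tau):
    """Prox of tau*||.||_2: shrink the whole block toward zero."""
    nv = np.linalg.norm(v)
    return np.zeros_like(v) if nv <= tau else (1 - tau / nv) * v

def group_lasso_ista(y, Phi, block_sizes, lam, n_iter=500):
    """Proximal gradient for min ||y - Phi x||_2^2 + lam * ||x||_{2,1}."""
    L = 2 * np.linalg.norm(Phi, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    ends = np.cumsum(block_sizes)
    for _ in range(n_iter):
        grad = 2 * Phi.T @ (Phi @ x - y)       # gradient of the quadratic term
        z = x - grad / L
        for d, e in zip(block_sizes, ends):    # prox applied block by block
            x[e - d:e] = block_soft_threshold(z[e - d:e], lam / L)
    return x
```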

As mentioned before, recent studies on non-convex CS have indicated that one can reduce the number of linear measurements required for successful recovery of a general sparse signal by replacing the l 0 norm with a non-convex surrogate l q (0 < q < 1) quasi-norm. This motivates us to carry the better recovery capability of non-convex CS over to the block-sparse setting. We therefore suggest the following non-convex optimization model for the recovery of block-sparse signals:

$$\min_x \|x\|_{2,q}^q \quad \text{s.t.} \quad \|y - \Phi x\|_2^2 \le \epsilon,$$
(9)

where ϵ ≥ 0 controls the noise error term (ϵ = 0 corresponds to the noiseless case) and $\|x\|_{2,q} = \big(\sum_{i=1}^m \|x[i]\|_2^q\big)^{1/q}$ generalizes the standard l q quasi-norm for 0 < q < 1. We will show that this non-convex recovery approach achieves better block-sparse recovery performance, both practically and theoretically, than the commonly used convex l 2/l 1 minimization approach. In the following section, we provide sufficient conditions for exact and stable recovery of block-sparse signals through mixed l 2/l q (0 < q < 1) norm minimization, and further develop an IRLS algorithm similar to those in [28, 33] for solving this non-convex optimization problem.
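The IRLS solver itself is developed in Section 6 (outside this excerpt); to fix ideas, here is a rough noiseless block-IRLS sketch in the spirit of [28], in which each step solves a weighted least-squares problem in closed form. The smoothing schedule for `eps` and the iteration count are arbitrary illustrative choices of ours.

```python
import numpy as np

def block_irls(y, Phi, block_sizes, q, n_iter=50, eps=1.0):
    """IRLS sketch for min ||x||_{2,q}^q  s.t.  y = Phi x (noiseless case).

    Each step solves the weighted least-squares problem
        x = W Phi^T (Phi W Phi^T)^{-1} y,
    where W is diagonal with per-block weights (||x[i]||_2^2 + eps)^(1 - q/2),
    a smoothed linearization of the l2/lq objective.
    """
    N = Phi.shape[1]
    ends = np.cumsum(block_sizes)
    starts = ends - np.array(block_sizes)
    x = np.linalg.pinv(Phi) @ y                    # least-norm initialization
    for _ in range(n_iter):
        w = np.empty(N)
        for s, e in zip(starts, ends):
            w[s:e] = (np.linalg.norm(x[s:e]) ** 2 + eps) ** (1 - q / 2)
        WPt = w[:, None] * Phi.T
        x = WPt @ np.linalg.solve(Phi @ WPt, y)    # feasible by construction
        eps = max(eps / 10, 1e-12)                 # gradually shrink the smoothing
    return x
```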

5 Sufficient block-sparse recovery conditions

In this section, we first consider the recovery of a high-dimensional signal $x \in \mathbb{R}^N$ in the noiseless setting. To this end, we propose the constrained mixed l 2/l q norm minimization with 0 < q < 1:

$$\min_x \|x\|_{2,q}^q \quad \text{s.t.} \quad y = \Phi x,$$
(10)

where $y \in \mathbb{R}^M$ is the vector of available measurements and Φ is a known M × N measurement matrix.

To state our main results, we need more notation. We first consider the case where x is exactly block k-sparse. We use Null(Φ) to denote the null space of Φ and T to denote the block index set of the non-zero blocks of the signal x. Let x* be a solution of the minimization problem (10). From [15], it is known that x* is the unique sparse solution of (10), equal to x, if and only if

$$\|h_T\|_{2,q} < \|h_{T^c}\|_{2,q}$$

for every non-zero vector h in the null space of Φ. This is called the null space property (NSP). To characterize the NSP more precisely, one can consider the following equivalent form: there exists a smallest constant ρ with 0 < ρ < 1 such that

$$\|h_T\|_{2,q} \le \rho\,\|h_{T^c}\|_{2,q}.$$
(11)

When x is not exactly block sparse, Aldroubi et al. [34] showed that the NSP still guarantees stability. Precisely, if T denotes the block index set of the k blocks of x with largest l 2 norm, then the NSP (11) gives

$$\|x - x^*\|_2 \le C\,\frac{\|x_{T^c}\|_{2,q}}{k^{1/q - 1/2}},$$
(12)

where C is a constant. Indeed, from (11), it is easy to see that the following equality holds:

$$\sup_{h \in \mathrm{Null}(\Phi),\, h \ne 0} \frac{\big(\sum_{i \in T} \|h[i]\|_2^q\big)^{1/q}}{\big(\sum_{i \in T^c} \|h[i]\|_2^q\big)^{1/q}} \;=\; \max_{h \in \mathrm{Null}(\Phi),\, \|h\|_2 = 1} \frac{\big(\sum_{i \in T} \|h[i]\|_2^q\big)^{1/q}}{\big(\sum_{i \in T^c} \|h[i]\|_2^q\big)^{1/q}},$$

which we denote by ρ. In general, for h = x* − x, we let

$$\|h_T\|_{2,q} = \gamma(h, q)\,\|h_{T^c}\|_{2,q}.$$

The main point of our study is therefore to show how to ensure γ(h, q) < 1 for every non-zero vector h in the null space of Φ. Our first conclusion is the following theorem.

Theorem 1.

 (Noiseless recovery). Let y = Φx be measurements of a signal x. If the matrix Φ satisfies the block RIP (5) with

$$\delta_{2k} < 1/2,$$

then there exists a number $q_0(\delta_{2k}) \in (0,1]$ such that for any q < q 0, the solution x* of the mixed l 2/l q problem (10) obeys

$$\|x - x^*\|_{2,q} \le C_0(q, \delta_{2k})\,\|x - x_{T_0}\|_{2,q}, \qquad \|x - x^*\|_2 \le C_1(q, \delta_{2k})\,k^{1/2 - 1/q}\,\|x - x_{T_0}\|_{2,q},$$
(13)

where C 0(q,δ 2k ) and C 1(q,δ 2k ) are positive constants depending on q and δ 2k, and T 0 is the block index set of the k blocks of the original signal x with largest l 2 norm. In particular, if x is block k-sparse, the recovery is exact.

Remark 1.

 Theorem 1 provides a sufficient condition for the recovery of a signal x via l 2/l q minimization with 0 < q < 1 in the noiseless setting. Focusing on the case where x is block k-sparse, it is known [14] that when the l 2/l 1 minimization scheme (7) is employed to recover x, the sufficient condition on the block-RIP is δ 2k < 0.414. In comparison, Theorem 1 says that whenever the non-convex l 2/l q minimization scheme (10) is used, this constant can be relaxed to δ 2k < 0.5 for some q < 1. This shows that, just as in standard sparse signal recovery, the non-convex minimization method can enhance block-sparse signal recovery performance compared with the convex minimization method.

To prove Theorem 1, we need the following lemmas.

Lemma 1.

 ([14]).

$$|\langle \Phi x, \Phi x' \rangle| \le \delta_{k + k'}\,\|x\|_2\,\|x'\|_2$$
(14)

for all x, x′ supported on disjoint subsets T, T′ ⊆ {1, 2, …, N} with |T| < k and |T′| < k′.

Lemma 2.

 ([35]). For any fixed $q \in (0,1)$ and $x \in \mathbb{R}^N$,

$$\|x\|_2 \le \frac{\|x\|_q}{N^{1/q - 1/2}} + \sqrt{N}\left(\max_{1 \le i \le N} |x_i| - \min_{1 \le i \le N} |x_i|\right).$$
(15)

Proof of Theorem 1.

 Let x* = x + h be a solution of (10), where x is the original signal to be reconstructed. Throughout the article, x T denotes the vector equal to x on an index set T and zero elsewhere. Let T 0 be the block index set of the k blocks of x with largest l 2 norm, and decompose h into a series of vectors $h_{T_0}, h_{T_1}, h_{T_2}, \ldots, h_{T_J}$ such that

$$h = \sum_{i=0}^{J} h_{T_i}.$$

Here $h_{T_i}$ is the restriction of h to the set T i, and each T i consists of k blocks (except possibly T J). The block indices are rearranged so that $\|h_{T_j}[1]\|_2 \ge \|h_{T_j}[2]\|_2 \ge \cdots \ge \|h_{T_j}[k]\|_2 \ge \|h_{T_{j+1}}[1]\|_2 \ge \|h_{T_{j+1}}[2]\|_2 \ge \cdots$ for any j ≥ 1.
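The grouping of blocks into sets of k ordered by decreasing l 2 norm can be sketched as follows (a simplified illustration of ours: here the groups are formed directly from h, whereas the proof fixes T 0 from x first and orders the remaining blocks of h):

```python
import numpy as np

def partition_blocks_by_norm(h, block_sizes, k):
    """Order blocks by decreasing l2 norm and group them into chunks of k.

    Returns a list [T_1, T_2, ...] of block-index arrays mirroring the
    proof's decomposition h_{T_1}, h_{T_2}, ... (the last group may be short).
    """
    ends = np.cumsum(block_sizes)
    norms = np.array([np.linalg.norm(h[e - d:e]) for d, e in zip(block_sizes, ends)])
    order = np.argsort(-norms)                  # blocks sorted by decreasing norm
    return [order[i:i + k] for i in range(0, len(order), k)]
```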

Note that

$$\|h\|_2 = \|h_{T_0 \cup T_1} + h_{(T_0 \cup T_1)^c}\|_2 \le \|h_{T_0 \cup T_1}\|_2 + \|h_{(T_0 \cup T_1)^c}\|_2.$$
(16)

For any j ≥ 2, let $d_j = \big(\|h_{T_j}[1]\|_2, \|h_{T_j}[2]\|_2, \ldots, \|h_{T_j}[k]\|_2\big)$; then we have

$$\|d_j\|_2 = \Big(\sum_{i=1}^k \|h_{T_j}[i]\|_2^2\Big)^{1/2} = \|h_{T_j}\|_2, \qquad \|d_j\|_q = \Big(\sum_{i=1}^k \|h_{T_j}[i]\|_2^q\Big)^{1/q} = \|h_{T_j}\|_{2,q},$$
$$\max_{1 \le i \le k} |d_j(i)| = \max_{1 \le i \le k} \|h_{T_j}[i]\|_2 = \|h_{T_j}\|_{2,\infty}, \qquad \min_{1 \le i \le k} |d_j(i)| = \min_{1 \le i \le k} \|h_{T_j}[i]\|_2.$$
(17)

From Lemma 2, it follows that

$$\|d_j\|_2 \le \frac{\|d_j\|_q}{k^{1/q - 1/2}} + \sqrt{k}\left(\max_{1 \le i \le k} |d_j(i)| - \min_{1 \le i \le k} |d_j(i)|\right),$$

that is,

$$\|h_{T_j}\|_2 \le \frac{\|h_{T_j}\|_{2,q}}{k^{1/q - 1/2}} + \sqrt{k}\left(\|h_{T_j}\|_{2,\infty} - \min_{1 \le i \le k} \|h_{T_j}[i]\|_2\right).$$
(18)

From Equation (18), we obtain

$$\begin{aligned}
k^{1/q - 1/2} \sum_{j \ge 2} \|h_{T_j}\|_2 &\le \sum_{j \ge 2}\Big[\|h_{T_j}\|_{2,q} + k^{1/q}\Big(\|h_{T_j}\|_{2,\infty} - \min_{1 \le i \le k} \|h_{T_j}[i]\|_2\Big)\Big]\\
&\le \sum_{j \ge 2} \|h_{T_j}\|_{2,q} + k^{1/q}\,\|h_{T_2}\|_{2,\infty}\\
&\le \sum_{j \ge 2} \|h_{T_j}\|_{2,q} + k^{1/q}\,\|h_{T_1}\|_{2,q}/k^{1/q} = \sum_{j \ge 1} \|h_{T_j}\|_{2,q}\\
&\le \Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q},
\end{aligned}$$
(19)

where we have used the fact that (a + b)q ≤ a q + b q for non-negative a and b. Therefore, we have

$$\sum_{j \ge 2} \|h_{T_j}\|_2 \le \frac{1}{k^{1/q - 1/2}}\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q}.$$
(20)

On the other hand, let

$$\|h_{T_0}\|_{2,q} = \gamma(h, q)\,\|h_{T_0^c}\|_{2,q}, \qquad \|h_{T_1}\|_{2,q}^q = t \sum_{i \ge 1} \|h_{T_i}\|_{2,q}^q, \quad t \in [0, 1].$$
(21)

Since $\|h_{T_2}[1]\|_2 \ge \|h_{T_2}[2]\|_2 \ge \cdots \ge \|h_{T_2}[k]\|_2 \ge \|h_{T_3}[1]\|_2 \ge \|h_{T_3}[2]\|_2 \ge \cdots$, it is easy to see that

$$\begin{aligned}
\sum_{i \ge 2} \|h_{T_i}\|_2^2 &\le \|h_{T_2}[1]\|_2^{2-q} \sum_{j \ge 2} \|h_{T_j}\|_{2,q}^q\\
&\le \Big(\big(\|h_{T_1}[1]\|_2^q + \|h_{T_1}[2]\|_2^q + \cdots + \|h_{T_1}[k]\|_2^q\big)/k\Big)^{(2-q)/q} \sum_{j \ge 2} \|h_{T_j}\|_{2,q}^q\\
&= \big(\|h_{T_1}\|_{2,q}^q / k\big)^{\frac{2-q}{q}} \sum_{j \ge 2} \|h_{T_j}\|_{2,q}^q
= \big(\|h_{T_1}\|_{2,q}^q / k\big)^{\frac{2-q}{q}} \Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q - \|h_{T_1}\|_{2,q}^q\Big)\\
&= \big(\|h_{T_1}\|_{2,q}^q / k\big)^{\frac{2-q}{q}} \Big(\frac{1}{t}\,\|h_{T_1}\|_{2,q}^q - \|h_{T_1}\|_{2,q}^q\Big)
= \frac{1-t}{t\,k^{\frac{2-q}{q}}}\,\|h_{T_1}\|_{2,q}^2\\
&= \frac{1-t}{t^{1 - 2/q}\,k^{\frac{2-q}{q}}}\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{2/q}.
\end{aligned}$$
(22)

By the definition of the block-RIP, Lemma 1, (20), and (22), it then follows that

$$\begin{aligned}
\Big\|\Phi\Big(\sum_{j \ge 2} h_{T_j}\Big)\Big\|_2^2 &= \sum_{i, j \ge 2} \langle \Phi(h_{T_i}), \Phi(h_{T_j}) \rangle
= \sum_{j \ge 2} \langle \Phi(h_{T_j}), \Phi(h_{T_j}) \rangle + 2 \sum_{i, j \ge 2,\, i < j} \langle \Phi(h_{T_i}), \Phi(h_{T_j}) \rangle\\
&\le (1 + \delta_k) \sum_{j \ge 2} \|h_{T_j}\|_2^2 + 2 \delta_{2k} \sum_{j > i \ge 2} \|h_{T_i}\|_2 \|h_{T_j}\|_2\\
&\le \sum_{j \ge 2} \|h_{T_j}\|_2^2 + \delta_{2k} \Big(\sum_{j \ge 2} \|h_{T_j}\|_2\Big)^2\\
&\le \Big(\frac{1-t}{t^{1 - 2/q}} + \delta_{2k}\Big) k^{1 - 2/q} \Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{2/q}.
\end{aligned}$$
(23)

Since $\Phi x = \Phi x^*$, we have Φh = 0, and thus $\Phi(h_{T_0} + h_{T_1}) = -\Phi\big(\sum_{j \ge 2} h_{T_j}\big)$. Therefore,

$$\|\Phi(h_{T_0} + h_{T_1})\|_2^2 \le \Big(\frac{1-t}{t^{1 - 2/q}} + \delta_{2k}\Big) k^{1 - 2/q} \Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{2/q}.$$
(24)

By the definition of δ 2k  and using Hölder’s equality, we then further have

$$\begin{aligned}
\|\Phi(h_{T_0} + h_{T_1})\|_2^2 &\ge (1 - \delta_{2k})\,\|h_{T_0} + h_{T_1}\|_2^2 = (1 - \delta_{2k})\big(\|h_{T_0}\|_2^2 + \|h_{T_1}\|_2^2\big)\\
&\ge (1 - \delta_{2k})\,k^{1 - 2/q}\big(\|h_{T_0}\|_{2,q}^2 + \|h_{T_1}\|_{2,q}^2\big)\\
&= (1 - \delta_{2k})\,k^{1 - 2/q}\Big(\gamma(h, q)^2\,\|h_{T_0^c}\|_{2,q}^2 + t^{2/q}\Big(\sum_{i \ge 1} \|h_{T_i}\|_{2,q}^q\Big)^{2/q}\Big)\\
&= (1 - \delta_{2k})\,k^{1 - 2/q}\big(\gamma(h, q)^2 + t^{2/q}\big)\Big(\sum_{i \ge 1} \|h_{T_i}\|_{2,q}^q\Big)^{2/q}.
\end{aligned}$$
(25)

By Equations (24) and (25),

$$\gamma(h, q)^2 \le \frac{\frac{1-t}{t^{1 - 2/q}} + \delta_{2k}}{1 - \delta_{2k}} - t^{2/q} \triangleq f(t).$$

A straightforward calculation shows that the maximum of f(t) occurs at $t_0 = \frac{1 - q/2}{2 - \delta_{2k}}$ and

$$f(t_0) = \frac{\delta_{2k}\,\frac{2-q}{2-\delta_{2k}} + q\left(\frac{1 - q/2}{2 - \delta_{2k}}\right)^{2/q}}{\frac{2-q}{2-\delta_{2k}}\,(1 - \delta_{2k})}.$$

If f(t 0) < 1, then γ(h, q) < 1. Now, f(t 0) < 1 amounts to

$$\delta_{2k}\,\frac{2-q}{2-\delta_{2k}} + q\left(\frac{1 - q/2}{2 - \delta_{2k}}\right)^{2/q} < \frac{2-q}{2-\delta_{2k}}\,(1 - \delta_{2k}),$$

or, equivalently,

$$\delta_{2k} + \frac{q}{2^{2/q + 1}}\left(\frac{2-q}{2-\delta_{2k}}\right)^{2/q - 1} < \frac{1}{2}.$$
(26)

The second term on the left-hand side of (26) goes to zero as q → 0+, since for q ≤ 1 and δ 2k < 1,

$$\frac{q}{2^{2/q + 1}}\left(\frac{2-q}{2-\delta_{2k}}\right)^{2/q - 1} \le \frac{q}{2-q}\left(\frac{2-q}{2}\right)^{2/q} \le \frac{q}{e}.$$

We thus obtain that for δ 2k < 1/2, there exists a value $q_0 = q_0(\delta_{2k}) \in (0, 1]$ such that inequality (26) holds for all $q \in (0, q_0)$.
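As a numerical sanity check (not part of the proof), one can evaluate the left-hand side of (26) on a grid of q values for a given δ 2k and read off an admissible range of q; the grid resolution below is an arbitrary choice of ours.

```python
import numpy as np

def condition_26(q, delta2k):
    """Left-hand side of (26); recovery is guaranteed when this is < 1/2."""
    return delta2k + q / 2 ** (2 / q + 1) * ((2 - q) / (2 - delta2k)) ** (2 / q - 1)

def largest_admissible_q(delta2k, grid=np.linspace(1e-3, 1.0, 1000)):
    """Largest grid value of q for which (26) holds, or None if there is none."""
    ok = [q for q in grid if condition_26(q, delta2k) < 0.5]
    return max(ok) if ok else None

# For delta_2k just below 1/2, only small q are admissible; for delta_2k >= 1/2,
# the first term alone already violates (26) and no q works.
```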

From the definition of γ(h,q), we have

$$\|h_{T_0}\|_{2,q}^2 \le \frac{\delta_{2k}\,\frac{2-q}{2-\delta_{2k}} + q\left(\frac{1 - q/2}{2 - \delta_{2k}}\right)^{2/q}}{\frac{2-q}{2-\delta_{2k}}\,(1 - \delta_{2k})}\,\|h_{T_0^c}\|_{2,q}^2 = f(t_0)\,\|h_{T_0^c}\|_{2,q}^2.$$

Since $x^* = x + h$ minimizes the mixed l 2/l q norm, writing $x^* = x + h_{T_0} + h_{T_0^c}$ we get

$$\begin{aligned}
\|x_{T_0}\|_{2,q}^q + \|x_{T_0^c}\|_{2,q}^q = \|x\|_{2,q}^q &\ge \|x^*\|_{2,q}^q = \|x + h_{T_0} + h_{T_0^c}\|_{2,q}^q\\
&= \|x_{T_0} + h_{T_0}\|_{2,q}^q + \|x_{T_0^c} + h_{T_0^c}\|_{2,q}^q\\
&\ge \|x_{T_0}\|_{2,q}^q - \|h_{T_0}\|_{2,q}^q + \|h_{T_0^c}\|_{2,q}^q - \|x_{T_0^c}\|_{2,q}^q.
\end{aligned}$$
(27)

This then implies

$$\|h_{T_0^c}\|_{2,q}^q \le \|h_{T_0}\|_{2,q}^q + 2\|x_{T_0^c}\|_{2,q}^q \le \big(f(t_0)\big)^{q/2}\,\|h_{T_0^c}\|_{2,q}^q + 2\|x_{T_0^c}\|_{2,q}^q,$$

That is,

$$\|h_{T_0^c}\|_{2,q}^q \le \frac{2}{1 - \big(f(t_0)\big)^{q/2}}\,\|x_{T_0^c}\|_{2,q}^q.$$
(28)

Thus, we have

$$\|h\|_{2,q}^q = \|h_{T_0}\|_{2,q}^q + \|h_{T_0^c}\|_{2,q}^q \le \Big(\big(f(t_0)\big)^{q/2} + 1\Big)\|h_{T_0^c}\|_{2,q}^q \le \frac{2\Big(\big(f(t_0)\big)^{q/2} + 1\Big)}{1 - \big(f(t_0)\big)^{q/2}}\,\|x_{T_0^c}\|_{2,q}^q = C_0^q(q, \delta_{2k})\,\|x_{T_0^c}\|_{2,q}^q.$$
(29)

This proves the first inequality of (13). We now prove the second inequality of (13).

Indeed, from (24) and (25), we have

$$\|h_{T_0 \cup T_1}\|_2^2 \le \frac{\Big(\frac{1-t}{t^{1 - 2/q}} + \delta_{2k}\Big) k^{1 - 2/q}}{1 - \delta_{2k}} \Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{2/q},$$
(30)

which, together with (22) and (30), then implies

$$\begin{aligned}
\|h\|_2 &\le \sqrt{\|h_{T_0 \cup T_1}\|_2^2 + \|h_{(T_0 \cup T_1)^c}\|_2^2}\\
&\le \sqrt{\frac{\Big(\frac{1-t}{t^{1 - 2/q}} + \delta_{2k}\Big) k^{1 - 2/q}}{1 - \delta_{2k}} + \frac{1-t}{t^{1 - 2/q}}\,k^{1 - 2/q}}\;\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q}\\
&= k^{1/2 - 1/q}\,\sqrt{\frac{(2 - \delta_{2k})(1-t)\,t^{2/q - 1} + \delta_{2k}}{1 - \delta_{2k}}}\;\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q}\\
&\le k^{1/2 - 1/q}\,\sqrt{\frac{(2 - \delta_{2k})\,q(2-q)^{2/q - 1} + 2^{2/q}\delta_{2k}}{(1 - \delta_{2k})\,2^{2/q}}}\;\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q},
\end{aligned}$$
(31)

where we have used the fact that

$$\max_{t \in [0,1]} (1-t)\,t^{2/q - 1} = (1-t)\,t^{2/q - 1}\Big|_{t = 1 - q/2} = \frac{q}{2}\left(1 - \frac{q}{2}\right)^{2/q - 1}.$$

By (28), we further have

$$\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q} \le \frac{2^{1/q}\,\|x_{T_0^c}\|_{2,q}}{\big(1 - (f(t_0))^{q/2}\big)^{1/q}} \le \frac{2^{2/q - 1}\,\|x_{T_0^c}\|_{2,q}}{\big(1 - (f(t_0))^{q/2}\big)^{1/q}}.$$
(32)

Thus, it follows from (31) and (32) that

$$\|h\|_2 \le k^{1/2 - 1/q}\,\sqrt{\frac{(2 - \delta_{2k})\,q(2-q)^{2/q - 1} + 2^{2/q}\delta_{2k}}{(1 - \delta_{2k})\,2^{2/q}}}\;\frac{2^{2/q - 1}\,\|x_{T_0^c}\|_{2,q}}{\big(1 - (f(t_0))^{q/2}\big)^{1/q}} = C_1(q, \delta_{2k})\,\frac{\|x_{T_0^c}\|_{2,q}}{k^{1/q - 1/2}}.$$
(33)

That is, the second inequality in (13) holds. This completes the proof of Theorem 1.

We now consider the recovery of arbitrary high-dimensional signals in the presence of noise. In this situation, the measurements can be expressed as

$$y = \Phi x + z,$$

where $z \in \mathbb{R}^M$ is an unknown noise term. To reconstruct x, we adopt the constrained mixed l 2/l q minimization scheme with 0 < q < 1:

$$\min_x \|x\|_{2,q}^q \quad \text{s.t.} \quad \|y - \Phi x\|_2 \le \epsilon,$$
(34)

where ϵ > 0 is a bound on the noise level. We show below that one can also recover x stably and robustly under the same assumption as in Theorem 1.

Theorem 2.

 (Noisy recovery). Let y = Φx + z with $\|z\|_2 \le \epsilon$ be noisy measurements of a signal x. If the matrix Φ satisfies the block-RIP (5) with

$$\delta_{2k} < 1/2,$$

then there exists a number $q_0(\delta_{2k}) \in (0,1]$ such that for any q < q 0, the mixed l 2/l q method (34) stably and robustly recovers the original signal x. More precisely, the solution x* of (34) obeys

$$\|x - x^*\|_2 \le C_1(q, \delta_{2k})\,\frac{\|x_{T_0^c}\|_{2,q}}{k^{1/q - 1/2}} + C_2(q, \delta_{2k})\,\epsilon,$$
(35)

where C 1(q,δ 2k ) and C 2(q,δ 2k ) are positive constants depending on q and δ 2k, and T 0 is the block index set of the k blocks of the original signal x with largest l 2 norm.

Remark 2.

 Inequality (35) in Theorem 2 offers an upper bound on the recovery error of the mixed l 2/l q minimization (q ∈ (0, q 0)). In particular, this estimate shows that the recovery accuracy of the mixed l 2/l q minimization is controlled by the degree of sparsity of the signal and the exponent q; it thus reveals the close connections among the recovery precision the mixed l 2/l q minimization method may achieve, the sparsity of the signal, and the index q used in the recovery procedure. In addition, estimate (35) shows that the recovery precision of method (34) is also bounded in terms of the noise level. In this sense, Theorem 2 shows that, under certain conditions, a block k-sparse signal can be robustly recovered by method (34).

Proof of Theorem 2.

 The proof of Theorem 2 is similar to that of Theorem 1, with minor differences.

In more detail, we again set x* = x + h. Due to the presence of noise, h no longer necessarily lies in the null space of Φ, but we can still prove Theorem 2 under the same assumption.

We still denote by T 0 the block index set of the k blocks of x with largest l 2 norm, and by $h_{T_0}$ the restriction of h to these blocks. We also define $h_{T_j}$ (j ≥ 1) as in the proof of Theorem 1. By $\|z\|_2 \le \epsilon$ and the triangle inequality, we first have

$$\|\Phi h\|_2 = \|\Phi(x - x^*)\|_2 \le \|\Phi x - y\|_2 + \|\Phi x^* - y\|_2 \le 2\epsilon.$$
(36)

Since $\Phi(h_{T_0} + h_{T_1}) = \Phi h - \Phi\big(\sum_{j \ge 2} h_{T_j}\big)$, the definition of the block-RIP gives

$$\begin{aligned}
(1 - \delta_{2k})\,\|h_{T_0 \cup T_1}\|_2^2 &= (1 - \delta_{2k})\big(\|h_{T_0}\|_2^2 + \|h_{T_1}\|_2^2\big) \le \|\Phi(h_{T_0 \cup T_1})\|_2^2 = \Big\|\Phi h - \sum_{j \ge 2} \Phi(h_{T_j})\Big\|_2^2\\
&\le \Big(\|\Phi h\|_2 + \Big\|\sum_{j \ge 2} \Phi(h_{T_j})\Big\|_2\Big)^2 \le \Big(2\epsilon + \Big\|\sum_{j \ge 2} \Phi(h_{T_j})\Big\|_2\Big)^2.
\end{aligned}$$
(37)

Hence,

$$\|h_{T_0}\|_2^2 + \|h_{T_1}\|_2^2 \le \frac{\Big(2\epsilon + \big\|\sum_{j \ge 2} \Phi(h_{T_j})\big\|_2\Big)^2}{1 - \delta_{2k}}.$$
(38)

On the other hand,

$$\|h_{T_0}\|_2^2 + \|h_{T_1}\|_2^2 \ge k^{1 - 2/q}\,\|h_{T_0}\|_{2,q}^2 + k^{1 - 2/q}\,\|h_{T_1}\|_{2,q}^2.$$
(39)

From (38) and (39), we thus have

$$\begin{aligned}
\|h_{T_0}\|_{2,q}^2 &\le \frac{k^{2/q - 1}\Big(2\epsilon + \big\|\sum_{j \ge 2} \Phi(h_{T_j})\big\|_2\Big)^2 - (1 - \delta_{2k})\,\|h_{T_1}\|_{2,q}^2}{1 - \delta_{2k}}\\
&\le \frac{\big(2\epsilon(2k)^{1/q - 1/2}\big)^2 + 4(2k)^{1/q - 1/2}\epsilon\,(k/2)^{1/q - 1/2}\big\|\sum_{j \ge 2} \Phi(h_{T_j})\big\|_2 + k^{2/q - 1}\big\|\sum_{j \ge 2} \Phi(h_{T_j})\big\|_2^2 - (1 - \delta_{2k})\,\|h_{T_1}\|_{2,q}^2}{1 - \delta_{2k}}.
\end{aligned}$$
(40)

Let $\|h_{T_1}\|_{2,q}^q = t \sum_{i \ge 1} \|h_{T_i}\|_{2,q}^q$, $t \in [0,1]$; since (23) still holds, we have

$$(k/2)^{1/q - 1/2}\,\Big\|\sum_{j \ge 2} \Phi(h_{T_j})\Big\|_2 \le \sqrt{\frac{(1-t)\,t^{2/q - 1} + \delta_{2k}}{2^{2/q - 1}}}\;\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q}.$$
(41)

If we denote

$$g(t) \triangleq \sqrt{\frac{(1-t)\,t^{2/q - 1} + \delta_{2k}}{2^{2/q - 1}}},$$

then an easy calculation shows that the maximum of g(t) occurs at $t_1 = \frac{2-q}{2}$ and

$$g(t_1) = \sqrt{\frac{\delta_{2k} + \frac{q}{2}\left(\frac{2-q}{2}\right)^{2/q - 1}}{2^{2/q - 1}}} \le \sqrt{\delta_{2k} + \frac{q}{2^{2/q}}\left(\frac{2-q}{2-\delta_{2k}}\right)^{2/q - 1}}.$$

Therefore, we have

$$(k/2)^{1/q - 1/2}\,\Big\|\sum_{j \ge 2} \Phi(h_{T_j})\Big\|_2 \le \sqrt{\delta_{2k} + \frac{q}{2^{2/q}}\left(\frac{2-q}{2-\delta_{2k}}\right)^{2/q - 1}}\;\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q}.$$
(42)

By (23) and the definition of f(t) in the proof of Theorem 1, we also have

$$\begin{aligned}
k^{2/q - 1}\,\Big\|\sum_{j \ge 2} \Phi(h_{T_j})\Big\|_2^2 - (1 - \delta_{2k})\,\|h_{T_1}\|_{2,q}^2 &\le (1 - \delta_{2k})\,f(t)\,\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{2/q}\\
&\le (1 - \delta_{2k})\,f(t_0)\,\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{2/q}\\
&= \Big[\delta_{2k} + \frac{q}{2^{2/q}}\Big(\frac{2-q}{2-\delta_{2k}}\Big)^{2/q - 1}\Big]\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{2/q}.
\end{aligned}$$
(43)

Plugging (42) and (43) into (40), it is easy to see that

$$\|h_{T_0}\|_{2,q} \le \frac{2(2k)^{1/q - 1/2}\,\epsilon}{\sqrt{1 - \delta_{2k}}} + \sqrt{f(t_0)}\,\Big(\sum_{j \ge 1} \|h_{T_j}\|_{2,q}^q\Big)^{1/q}.$$

Consequently, we obtain

$$\|h_{T_0}\|_{2,q}^q \le \frac{\big(2(2k)^{1/q-1/2}\epsilon\big)^q}{(1-\delta_{2k})^{q/2}} + (f(t_0))^{q/2}\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q.$$
(44)

In the following, we further show that $f(t_0) < 1$ implies the conclusion of Theorem 2; that is, under the same condition on $\delta_{2k}$ as in Theorem 1, we can prove Theorem 2.

Note that from (27), we also have

$$\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q \le \|h_{T_0}\|_{2,q}^q + 2\|x_{T_0^c}\|_{2,q}^q.$$

Plugging (44) into the above inequality and by f(t 0) < 1, one can show that

$$\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q \le \frac{\big(2(2k)^{1/q-1/2}\epsilon\big)^q}{(1-\delta_{2k})^{q/2}\big(1-(f(t_0))^{q/2}\big)} + \frac{2\|x_{T_0^c}\|_{2,q}^q}{1-(f(t_0))^{q/2}}.$$

Since $\|v\|_q \le 2^{1/q-1}\|v\|_1$ for $v\in\mathbb{R}^2$, we further have

$$\left(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\right)^{1/q} \le \frac{2^{2/q-1/2}k^{1/q-1/2}\epsilon}{\sqrt{1-\delta_{2k}}\,\big(1-(f(t_0))^{q/2}\big)^{1/q}} + \frac{2^{2/q-1}\|x_{T_0^c}\|_{2,q}}{\big(1-(f(t_0))^{q/2}\big)^{1/q}},$$
(45)

and by (37), (22), and (23), we also have

$$\begin{aligned}
\|h\|_2^2 &= \|h_{T_0\cup T_1}\|_2^2 + \|h_{(T_0\cup T_1)^c}\|_2^2 \le \frac{\Big(2\epsilon+\Big\|\sum_{j\ge 2}\Phi(h_{T_j})\Big\|_2\Big)^2}{1-\delta_{2k}} + \sum_{j\ge 2}\|h_{T_j}\|_2^2\\
&\le \frac{\left(2\epsilon + \Big((1-t)t^{2/q-1}k^{1-2/q} + \delta_{2k}k^{1-2/q}\Big)^{1/2}\Big(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\Big)^{1/q}\right)^2}{1-\delta_{2k}} + (1-t)t^{2/q-1}k^{1-2/q}\left(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\right)^{2/q}.
\end{aligned}$$
(46)

Since $\sqrt{(a+b)^2+c} \le a+\sqrt{b^2+c}$ for $a,b,c>0$, this gives

$$\begin{aligned}
\|h\|_2 &\le \left(\frac{\left(2\epsilon + \Big((1-t)t^{2/q-1}k^{1-2/q}+\delta_{2k}k^{1-2/q}\Big)^{1/2}\Big(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\Big)^{1/q}\right)^2}{1-\delta_{2k}} + (1-t)t^{2/q-1}k^{1-2/q}\left(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\right)^{2/q}\right)^{1/2}\\
&\le \frac{2\epsilon}{\sqrt{1-\delta_{2k}}} + \left(\frac{(1-t)t^{2/q-1}k^{1-2/q}+\delta_{2k}k^{1-2/q}}{1-\delta_{2k}} + (1-t)t^{2/q-1}k^{1-2/q}\right)^{1/2}\left(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\right)^{1/q}\\
&= \frac{2\epsilon}{\sqrt{1-\delta_{2k}}} + k^{1/2-1/q}\left(\frac{(2-\delta_{2k})(1-t)t^{2/q-1}+\delta_{2k}}{1-\delta_{2k}}\right)^{1/2}\left(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\right)^{1/q}\\
&\le \frac{2\epsilon}{\sqrt{1-\delta_{2k}}} + k^{1/2-1/q}\left(\frac{(2-\delta_{2k})q(2-q)^{2/q-1}+2^{2/q}\delta_{2k}}{(1-\delta_{2k})2^{2/q}}\right)^{1/2}\left(\sum_{j\ge 1}\|h_{T_j}\|_{2,q}^q\right)^{1/q},
\end{aligned}$$
(47)

where we have used the fact that

$$\max_{t\in[0,1]}(1-t)t^{2/q-1} = (1-t)t^{2/q-1}\Big|_{t=1-q/2} = \frac{q}{2}\left(1-\frac{q}{2}\right)^{2/q-1}.$$

Thus, it then follows from (45) and (47) that

$$\begin{aligned}
\|h\|_2 &\le \frac{2\epsilon}{\sqrt{1-\delta_{2k}}} + k^{1/2-1/q}\left(\frac{(2-\delta_{2k})q(2-q)^{2/q-1}+2^{2/q}\delta_{2k}}{(1-\delta_{2k})2^{2/q}}\right)^{1/2}\\
&\quad\times\left(\frac{2^{2/q-1/2}k^{1/q-1/2}\epsilon}{\sqrt{1-\delta_{2k}}\,\big(1-(f(t_0))^{q/2}\big)^{1/q}} + \frac{2^{2/q-1}\|x_{T_0^c}\|_{2,q}}{\big(1-(f(t_0))^{q/2}\big)^{1/q}}\right)\\
&= k^{1/2-1/q}\left(\frac{(2-\delta_{2k})q(2-q)^{2/q-1}+2^{2/q}\delta_{2k}}{1-\delta_{2k}}\right)^{1/2}\frac{2^{1/q-1}\|x_{T_0^c}\|_{2,q}}{\big(1-(f(t_0))^{q/2}\big)^{1/q}}\\
&\quad+ \frac{2\epsilon}{\sqrt{1-\delta_{2k}}}\left(1 + \left(\frac{(2-\delta_{2k})q(2-q)^{2/q-1}+2^{2/q}\delta_{2k}}{1-\delta_{2k}}\right)^{1/2}\frac{2^{1/q-3/2}}{\big(1-(f(t_0))^{q/2}\big)^{1/q}}\right)\\
&= C_1(q,\delta_{2k})\frac{\|x_{T_0^c}\|_{2,q}}{k^{1/q-1/2}} + C_2(q,\delta_{2k})\,\epsilon.
\end{aligned}$$
(48)

This establishes the conclusion of Theorem 2 and completes the proof.
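The maximization fact is used twice in the proof (in (42) and in (47)). As a quick numerical sanity check of ours (not part of the original proof), a grid search confirms that $\max_{t\in[0,1]}(1-t)t^{2/q-1} = \frac{q}{2}\left(1-\frac{q}{2}\right)^{2/q-1}$ for the values of $q$ used later in the experiments:

```python
import numpy as np

# Check that max over t in [0,1] of (1-t) * t^(2/q-1) is attained at
# t = 1 - q/2 with value (q/2) * (1 - q/2)^(2/q-1), as used in the proof.
for q in [0.1, 0.5, 0.7, 1.0]:
    t = np.linspace(1e-9, 1.0, 200001)          # dense grid over (0, 1]
    grid_max = ((1.0 - t) * t ** (2.0 / q - 1.0)).max()
    closed_form = (q / 2.0) * (1.0 - q / 2.0) ** (2.0 / q - 1.0)
    assert abs(grid_max - closed_form) < 1e-6, (q, grid_max, closed_form)
```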

6 An IRLS algorithm

Inspired by the ideas of [28, 33], in this section we present an efficient IRLS algorithm for solving the mixed l 2/l q  norm minimization problem (34). We first rewrite the problem as the following regularized version of the unconstrained smoothed l 2/l q  minimization:

$$\min_x\ \|x\|_{2,q}^{\epsilon,q} + \frac{1}{2\tau}\|y-\Phi x\|_2^2,$$
(49)

where τ is a regularization parameter and

$$\|x\|_{2,q}^{\epsilon,q} = \sum_{i=1}^m\left(\|x[i]\|_2^2+\epsilon^2\right)^{q/2}.$$

Let $J_{2,q}(x,\epsilon,\tau)$ be the objective function associated with (49), that is,

$$J_{2,q}(x,\epsilon,\tau) = \sum_{i=1}^m\left(\|x[i]\|_2^2+\epsilon^2\right)^{q/2} + \frac{1}{2\tau}\|y-\Phi x\|_2^2.$$
(50)

Then the first-order necessary optimality condition at a solution $x$ is

$$\left(\frac{q\,x[i]}{\left(\epsilon^2+\|x[i]\|_2^2\right)^{1-q/2}}\right)_{1\le i\le m} + \frac{1}{\tau}\Phi^T(\Phi x - y) = 0.$$

Hence, we define the diagonal weighting matrix $W$ block-wise by $W_i = \mathrm{diag}\big(q^{1/2}(\epsilon^2 + \|x[i]\|_2^2)^{q/4-1/2}\big)$ for the $i$-th block, and after simple calculations we obtain the following necessary optimality condition:

$$(\tau W^2 + \Phi^T\Phi)\,x = \Phi^T y.$$
(51)

Due to the nonlinearity, there is no straightforward method for solving the above system of equations. But if we fix $W = W^{(t)}$ to be the weight matrix determined in the $t$-th iteration step, then the solution of (51), taken as the $(t+1)$-th iterate, can be written as

$$x^{(t+1)} = (W^{(t)})^{-1}\big(\Phi(W^{(t)})^{-1}\big)^T\Big(\Phi(W^{(t)})^{-1}\big(\Phi(W^{(t)})^{-1}\big)^T + \tau I\Big)^{-1}y.$$
(52)
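The update (52) and the primal form used in Algorithm 1 below are algebraically equivalent ways of solving the linear system (51), via the push-through identity $A^T(AA^T+\tau I)^{-1}=(A^TA+\tau I)^{-1}A^T$ with $A=\Phi W^{-1}$. A small NumPy check of ours (problem sizes and the random weights are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 50
Phi = rng.standard_normal((M, N))
y = rng.standard_normal(M)
W = np.diag(rng.uniform(0.5, 2.0, N))   # a positive diagonal weight matrix
tau = 1e-3

Winv = np.linalg.inv(W)
A = Phi @ Winv

# Dual form, as in (52): x = W^{-1} A^T (A A^T + tau I)^{-1} y
x_dual = Winv @ A.T @ np.linalg.solve(A @ A.T + tau * np.eye(M), y)

# Primal form used in Algorithm 1: x = W^{-1} (A^T A + tau I)^{-1} A^T y
x_primal = Winv @ np.linalg.solve(A.T @ A + tau * np.eye(N), A.T @ y)

# Both satisfy the normal equations (51): (tau W^2 + Phi^T Phi) x = Phi^T y
residual = (tau * W @ W + Phi.T @ Phi) @ x_primal - Phi.T @ y
assert np.allclose(x_dual, x_primal)
assert np.abs(residual).max() < 1e-8
```

The dual form inverts an $M\times M$ matrix and is cheaper when $M\ll N$, which is the typical compressed-sensing regime.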

This defines a natural iterative method for the solution of (49). We formalize this reweighted algorithm as follows:

Algorithm 1. An IRLS algorithm for the unconstrained smoothed l 2 /l q (0 < q ≤ 1) minimization problem

  •  Step 1. Initialize $x^{(0)} = \arg\min_x \|y-\Phi x\|_2^2$ and let $\hat{k}$ be the estimated block-sparsity. Set $\epsilon_0 = 1$, $t = 0$.

  •  Step 2. While not converged, do
$$\begin{aligned}
W^{(t)} &\leftarrow \mathrm{diag}\Big(q^{1/2}\big(\epsilon_t^2 + \|x^{(t)}[i]\|_2^2\big)^{q/4-1/2}\Big),\quad i = 1,\dots,m\\
A^{(t)} &\leftarrow \Phi (W^{(t)})^{-1}\\
x^{(t+1)} &\leftarrow (W^{(t)})^{-1}\Big((A^{(t)})^T A^{(t)} + \tau I\Big)^{-1}(A^{(t)})^T y\\
\epsilon_{t+1} &\leftarrow \min\Big\{\epsilon_t,\ \alpha\, r(x^{(t+1)})_{\hat{k}+1}/N\Big\}\\
t &\leftarrow t+1
\end{aligned}$$
  until $\epsilon_{t+1} = 0$, or stop after a reasonable maximum number of iterations.

  •  Step 3. Output $x^{(t+1)}$ as an approximate solution.

In Algorithm 1, $r(x)_{\hat{k}+1}$ denotes the $(\hat{k}+1)$-th largest $l_2$ norm among the blocks of $x$ in decreasing order, $\alpha\in(0,1)$ is a number such that $\alpha\, r(x^{(1)})_{\hat{k}+1}/N < 1$, and τ is an appropriately chosen parameter which controls the tolerance of the noise term. Although the best τ may change continuously with the noise level, we used a fixed value of τ in our numerical implementations.
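To make the steps concrete, here is a minimal NumPy sketch of Algorithm 1. This is our own illustrative implementation, not the authors' code: the block structure is passed as a list of index arrays that must partition $\{0,\dots,N-1\}$, and the default parameter values follow the experimental section.

```python
import numpy as np

def irls_block_lq(Phi, y, blocks, q=0.5, tau=1e-5, alpha=0.7,
                  k_hat=None, max_iter=2000, tol=1e-8):
    """IRLS sketch for the smoothed l2/lq problem (49).

    blocks : list of index arrays partitioning {0,...,N-1}, one per block.
    k_hat  : estimated block-sparsity (should overestimate the true one).
    """
    M, N = Phi.shape
    if k_hat is None:
        k_hat = len(blocks) - 1
    # Step 1: least-squares initialization x^(0) = argmin ||y - Phi x||_2
    x = np.linalg.lstsq(Phi, y, rcond=None)[0]
    eps = 1.0
    for _ in range(max_iter):
        # Diagonal weights: W_i = q^(1/2) (eps^2 + ||x[i]||_2^2)^(q/4 - 1/2)
        w = np.empty(N)
        for idx in blocks:
            w[idx] = np.sqrt(q) * (eps**2 + np.dot(x[idx], x[idx]))**(q/4 - 0.5)
        # Reweighted least-squares update, cf. (52), with A = Phi W^{-1}
        A = Phi / w                      # scales column j by 1/w_j
        z = A.T @ np.linalg.solve(A @ A.T + tau * np.eye(M), y)
        x_new = z / w                    # x = W^{-1} z
        # Shrink smoothing parameter via the (k_hat+1)-th largest block norm
        block_norms = np.sort([np.linalg.norm(x_new[idx]) for idx in blocks])[::-1]
        eps = min(eps, alpha * block_norms[k_hat] / N)
        if eps < 1e-7 or np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

The stopping thresholds ($\epsilon_{t+1} < 10^{-7}$, $\|x^{(t+1)}-x^{(t)}\|_2 < 10^{-8}$) mirror those used in the noiseless experiments below.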

Obviously, from the update equation (52), we can see that $x^{(t+1)}$ can be understood as the minimizer of

$$\frac{1}{2\tau}\|y-\Phi x\|_2^2 + \frac{1}{2}\|W^{(t)}x\|_2^2.$$

Since $W^{(t)}$ is held fixed within each such least-squares step, the method carries the name IRLS. Majumdar and Ward [12] first adopted the IRLS methodology to solve a non-convex l 2/l q  block-sparse optimization problem, but their algorithm is unsuitable for noisy block-sparse recovery. Lai et al. [28] proposed an IRLS algorithm for the unconstrained l q (0 < q ≤ 1) minimization problem and gave a detailed analysis of its convergence, local convergence rate, and error bound. To some extent, our proposed algorithm can be considered a generalization of theirs to the setting of block-sparse recovery.

Note that {ϵ t } in Algorithm 1 is a bounded non-increasing sequence, so it must converge to some $\epsilon^*\ge 0$. Using an argument similar to that in [28], one can prove that $x^{(t)}$ must have a convergent subsequence whose limit is a critical point of (50) whenever $\epsilon^* > 0$. In addition, when $\epsilon^* = 0$ there exists a convergent subsequence whose limit $x^*$ is a sparse vector with block-sparsity $\|x^*\|_{2,0}\le\hat{k}$. Furthermore, one can also verify the super-linear local convergence rate of the proposed Algorithm 1. Due to space limitations, we leave the detailed analysis to the interested reader.

7 Numerical experiments

In this section, we conduct two numerical experiments to compare the non-convex l 2/l q (0 < q < 1) minimization method with the l 2/l 1 minimization method and the standard l q (0 < q ≤ 1) method in the context of block-sparse signal recovery. Note that this comparison is possible because Algorithm 1 applies to the standard l q (0 < q ≤ 1) minimization method as well. For all compared methods, we use the same starting point $x^{(0)} = \arg\min_x \|y-\Phi x\|_2^2$.

In our experiments, the measurement matrix Φ was generated as an M × N matrix with i.i.d. draws from the standard Gaussian distribution N(0,1). We considered four different values q = 0.1, 0.5, 0.7, 1 for both the l 2/l q  minimization method and the l q  minimization method. The purpose of the experiments was to compare the recovery performance of the mixed l 2/l q  method and the l q  method for block-sparse signals without and with noise, respectively.

7.1 Noiseless recovery

In this set of experiments, we considered the case where the signals are measured perfectly, without noise. We first randomly generated the block-sparse signal x with values drawn from a Gaussian distribution of mean 0 and standard deviation 1, and then randomly drew a measurement matrix Φ from the Gaussian ensemble. We then observed the measurements y from the model y = Φx. In all experiment cases, if $\epsilon_{t+1} < 10^{-7}$ or $\|x^{(t+1)}-x^{(t)}\|_2 < 10^{-8}$, the iteration terminates and outputs $x^{(t+1)}$ as an approximate solution of the original signal x; otherwise, the algorithms run up to the maximum number of iterations, max = 2000. We set the parameters τ and α to 10-5 and 0.7, respectively. We tested Algorithm 1 with different initial block-sparsity estimates and found that any overestimate $\hat{k}$ of k yields similar results. A typical simulation result is shown in Figure 1. Therefore, for simplicity, we set $\hat{k}=k+1$ in our implementation.
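The experimental setup described above can be sketched as follows. This is our own illustrative generator (the block-boundary construction, seed handling, and default sizes are our choices, with the defaults roughly matching the Figure 4a setting of N = 256, 64 uneven blocks, and k = 4 active blocks):

```python
import numpy as np

def make_block_sparse_problem(N=256, n_blocks=64, k=4, M=128, seed=0):
    """Generate a noiseless block-sparse test instance: uneven block sizes,
    k active blocks with standard-Gaussian entries, Gaussian Phi, y = Phi x."""
    rng = np.random.default_rng(seed)
    # Random uneven partition of {0,...,N-1} into n_blocks blocks
    cuts = np.sort(rng.choice(np.arange(1, N), size=n_blocks - 1, replace=False))
    bounds = np.concatenate(([0], cuts, [N]))
    blocks = [np.arange(bounds[i], bounds[i + 1]) for i in range(n_blocks)]
    # Place standard-Gaussian values on k randomly chosen active blocks
    x = np.zeros(N)
    for b in rng.choice(n_blocks, size=k, replace=False):
        x[blocks[b]] = rng.standard_normal(len(blocks[b]))
    # Gaussian measurement matrix and noiseless measurements
    Phi = rng.standard_normal((M, N))
    return Phi, Phi @ x, x, blocks
```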

Figure 1

A typical simulation result with different estimated block-sparsity. The randomly generated signals have length N = 256, m = 64 uneven-size blocks, and k = 9 active blocks with total sparsity k 0 = 22. We observed M = 64 measurements, and the average RMSE over 100 independent random trials is shown in logarithmic scale.

Figure 2a depicts an instance of the generated block-sparse signal with signal length N = 512. There are 128 blocks with uneven block size and 16 active blocks, with sparsity $k_0 = \|x\|_0 = 101$. Figure 2b–d shows the recovery results of the standard l q  method with q = 1, the standard l q  method with q = 0.5, and the mixed l 2/l 1 method, respectively, when M = 225. Since the sample size M is only around 2.2 times the signal sparsity k 0, the standard l 1 method does not yield good recovery results, whereas the mixed l 2/l 1 method and the non-convex l q (q = 0.5) method achieve near-perfect recovery of the original signal. The results illustrate that if one incorporates the block-sparsity structure into the recovery procedure, the block version of convex l 1 minimization reduces the number of required measurements as much as the standard non-convex l q  minimization with some q < 1.

Figure 2

Recovery results with M = 225. (a) Original signal; (b) results with l 1 minimization (RMSE = 0.1301); (c) results with l q (q = 0.5) minimization (RMSE = 0.000061); (d) results with l 2/l 1 minimization (RMSE = 0.000033).

We further compared the recovery performance of the standard l q  method and the mixed l 2/l q  method for different values of q. Figure 3 shows an instance similar to Figure 2. We generated a block-sparse signal with the same non-zero block locations as in Figure 2a and observed M = 144 measurements, which is only around 1.4 times the signal sparsity k 0. From Figure 3, we can see that only the non-convex l 2/l q  method with q = 0.5 and q = 0.1 achieves near-optimal recovery, while the other methods fail. The results illustrate that for q ≤ 0.5, the mixed l 2/l q  method can exactly recover the original signal. In addition, the results also demonstrate the outperformance of the non-convex mixed l 2/l q (0 < q < 1) method over the standard non-convex l q (0 < q < 1) method.

Figure 3

Recovery results with M = 144. (a) Original signal; (b) results with l q (q = 0.7) minimization (RMSE = 0.3589); (c) results with l q (q = 0.5) minimization (RMSE = 0.3857); (d) results with l q (q = 0.1) minimization (RMSE = 0.4124); (e) results with l 2/l 1 minimization (RMSE = 0.1688); (f) results with l 2/l q (q = 0.7) minimization (RMSE = 0.1164); (g) results with l 2/l q (q = 0.5) minimization (RMSE = 4.21 × 10-7); (h) results with l 2/l q (q = 0.1) minimization (RMSE = 8.23 × 10-8).

Figure 4a shows the effect of sample size, where we report the average root mean square error (RMSE) over 100 independent random trials in logarithmic scale for each sample size. In this case, we set the signal length to N = 256; there are 64 blocks with uneven block size, and the k = 4 active blocks were randomly chosen from the 64 blocks. The figure shows the decay in recovery error as a function of sample size for all the algorithms. We observe that both the l q  and the mixed l 2/l q  methods improve in recovery performance as q decreases, and for a fixed q, the mixed l 2/l q  method is clearly superior to the standard l q  method in this block-sparse setting. To further study the effect of the active block number k (with k 0 fixed), we drew a matrix Φ of size 128 × 256 from the Gaussian ensemble. We set the signal x to have even block size and total sparsity k 0 = 64, and varied the block size while keeping the other parameters unchanged. Figure 4b shows the average RMSE over 100 independent random runs in logarithmic scale. One can easily see that the recovery performance of the standard l q  method is independent of the active block number, while the recovery errors of the mixed l 2/l q  method are significantly better when the active block number k is far smaller than the total signal sparsity k 0. As expected, the performance of the mixed l 2/l q  method becomes identical to that of the standard l q  method when k = k 0. This illustrates that the mixed method favors large blocks when the total sparsity k 0 is fixed. Moreover, similar to the standard l q  method, the mixed l 2/l q  method performs better and better as q decreases.
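The evaluation protocol used throughout (average RMSE over independent random trials, reported in log scale) can be sketched as below; `solver` and `make_problem` are placeholders of ours for any recovery routine and problem generator:

```python
import numpy as np

def average_log_rmse(solver, make_problem, n_trials=100):
    """Average RMSE over independent random trials, in log10 scale,
    as reported in Figure 4. solver(Phi, y, blocks) -> estimate of x."""
    errors = []
    for trial in range(n_trials):
        Phi, y, x_true, blocks = make_problem(seed=trial)  # fresh random instance
        x_hat = solver(Phi, y, blocks)
        errors.append(np.sqrt(np.mean((x_hat - x_true) ** 2)))
    return np.log10(np.mean(errors))
```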

Figure 4

Recovery performance: (a) average RMSE (log-scale) versus sample size ratio  M/k 0 ; (b) average RMSE (log-scale) versus active block number k .