Enhanced block sparse signal recovery based on $q$-ratio block constrained minimal singular values

In this paper we introduce the $q$-ratio block constrained minimal singular values (BCMSV) as a new measure of the measurement matrix in compressive sensing of block sparse/compressible signals and present an algorithm for computing this new measure. Both the mixed $\ell_2/\ell_q$ and the mixed $\ell_2/\ell_1$ norms of the reconstruction errors for stable and robust recovery using block Basis Pursuit (BBP), the block Dantzig selector (BDS) and the group lasso are investigated in terms of the $q$-ratio BCMSV. We establish a sufficient condition based on the $q$-ratio block sparsity for exact recovery from the noise-free BBP and develop a convex-concave procedure to solve the corresponding non-convex problem in the condition. Furthermore, we prove that for sub-Gaussian random matrices, the $q$-ratio BCMSV is bounded away from zero with high probability when the number of measurements is reasonably large. Numerical experiments are implemented to illustrate the theoretical results. In addition, we demonstrate that the $q$-ratio BCMSV based error bounds are tighter than the block restricted isometry constant based bounds.


Introduction
Compressive sensing (CS) [3,6] aims to recover an unknown sparse signal $x \in \mathbb{R}^N$ from $m$ noisy measurements $y \in \mathbb{R}^m$:
$$y = Ax + \epsilon, \tag{1}$$
where $A \in \mathbb{R}^{m \times N}$ is a measurement matrix with $m \ll N$, and $\epsilon \in \mathbb{R}^m$ is additive noise such that $\|\epsilon\|_2 \le \zeta$ for some $\zeta \ge 0$. It has been proven that if $A$ satisfies the (stable/robust) null space property (NSP) or the restricted isometry property (RIP), (stable/robust) recovery can be achieved [8, Chapters 4 and 6]. However, it is computationally hard to verify the NSP and to compute the restricted isometry constant (RIC) for an arbitrarily chosen $A$ [1, 22]. To overcome this drawback, a new class of measures for the measurement matrix has been developed during the last decade. To be specific, [18] introduced a new measure called the $\ell_1$-constrained minimal singular value (CMSV):
$$\rho_s(A) = \min_{z \neq 0,\ \|z\|_1^2 / \|z\|_2^2 \le s} \frac{\|Az\|_2}{\|z\|_2},$$
and obtained the $\ell_2$ recovery error bounds in terms of the proposed measure for the Basis Pursuit (BP) [5], the Dantzig selector (DS) [4], and the [...]

q-ratio block sparsity and q-ratio BCMSV - definition and property
In this section, we introduce the definitions of the $q$-ratio block sparsity and the $q$-ratio BCMSV, and present their fundamental properties. A sufficient condition for block sparse signal recovery via the noise-free BBP using the $q$-ratio block sparsity and an inequality for the $q$-ratio BCMSV are established.
Throughout the paper, we denote vectors by bold lower case letters or bold numbers, and matrices by upper case letters. $x^T$ denotes the transpose of a column vector $x$. For any vector $x \in \mathbb{R}^N$, we partition it into $p$ blocks, each of length $n$, so we have $x = [x_1^T, x_2^T, \cdots, x_p^T]^T$ and $x_i \in \mathbb{R}^n$ denotes the $i$-th block of $x$. We define the mixed $\ell_2/\ell_0$ norm $\|x\|_{2,0} = \sum_{i=1}^p 1\{x_i \neq 0\}$, the mixed $\ell_2/\ell_\infty$ norm $\|x\|_{2,\infty} = \max_{1 \le i \le p} \|x_i\|_2$ and the mixed $\ell_2/\ell_q$ norm $\|x\|_{2,q} = \left(\sum_{i=1}^p \|x_i\|_2^q\right)^{1/q}$ for $0 < q < \infty$. A signal $x$ is block $k$-sparse if $\|x\|_{2,0} \le k$.
$[p]$ denotes the set $\{1, 2, \cdots, p\}$ and $|S|$ denotes the cardinality of a set $S$. Furthermore, we use $S^c$ for the complement $[p] \setminus S$ of a set $S$ in $[p]$. The block support is defined by $\mathrm{bsupp}(x) := \{i \in [p] : \|x_i\|_2 \neq 0\}$. If $S \subset [p]$, then $x_S$ is the vector that coincides with $x$ on the block indices in $S$ and is extended to zero outside $S$. For any matrix $A \in \mathbb{R}^{m \times N}$, $\ker A := \{x \in \mathbb{R}^N : Ax = 0\}$ and $A^T$ is the transpose. $\langle \cdot, \cdot \rangle$ is the inner product function.
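As a minimal sketch of the notation above, the mixed norms can be computed as follows. The function names `block_norms` and `mixed_norm` are illustrative choices, not part of the paper, and the code assumes the signal length is a multiple of the block size $n$.

```python
import math

def block_norms(x, n):
    """Return the l2 norm of each consecutive length-n block of x."""
    if n <= 0 or len(x) % n != 0:
        raise ValueError("len(x) must be a positive multiple of the block size n")
    return [math.sqrt(sum(v * v for v in x[i:i + n])) for i in range(0, len(x), n)]

def mixed_norm(x, n, q):
    """Mixed l2/lq norm of x with block size n, for q = 0, q = math.inf or q > 0."""
    norms = block_norms(x, n)
    if q == 0:
        return sum(1 for b in norms if b != 0)  # number of non-zero blocks
    if q == math.inf:
        return max(norms)
    if q < 0:
        raise ValueError("q must be non-negative")
    return sum(b ** q for b in norms) ** (1.0 / q)

# Three blocks of length 2 with l2 norms 5, 0 and 12.
x = [3.0, 4.0, 0.0, 0.0, 0.0, 12.0]
print(mixed_norm(x, 2, 0), mixed_norm(x, 2, 1), mixed_norm(x, 2, 2), mixed_norm(x, 2, math.inf))
```

In this example the signal is block 2-sparse, since only two of its three blocks are non-zero.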
We first introduce the definition of the q-ratio block sparsity and its properties.
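Definition 1 ([25]). For any non-zero $x \in \mathbb{R}^N$ and non-negative $q \notin \{0, 1, \infty\}$, the $q$-ratio block sparsity of $x$ is defined as
$$k_q(x) = \left(\frac{\|x\|_{2,1}}{\|x\|_{2,q}}\right)^{\frac{q}{q-1}}. \tag{2}$$
The cases of $q \in \{0, 1, \infty\}$ are evaluated by limits:
$$k_0(x) = \lim_{q \to 0} k_q(x) = \|x\|_{2,0}, \tag{3}$$
$$k_1(x) = \lim_{q \to 1} k_q(x) = \exp\big(H_1(\pi(x))\big), \tag{4}$$
$$k_\infty(x) = \lim_{q \to \infty} k_q(x) = \frac{\|x\|_{2,1}}{\|x\|_{2,\infty}}. \tag{5}$$
Here $\pi(x) \in \mathbb{R}^p$ with entries $\pi_i(x) = \|x_i\|_2 / \|x\|_{2,1}$ and $H_1$ is the ordinary Shannon entropy $H_1(\pi(x)) = -\sum_{i=1}^p \pi_i(x) \log \pi_i(x)$.
This is an extension of the sparsity measures proposed in [13,14], where estimation and statistical inference via the $\alpha$-stable random projection method were investigated. In fact, this kind of sparsity measure is based on entropy, which measures the energy of the blocks of $x$ via $\pi_i(x)$. Formally, we can express the $q$-ratio block sparsity by
$$k_q(x) = \exp\big(H_q(\pi(x))\big),$$
where $H_q$ is the Rényi entropy of order $q \in [0, \infty]$ [15,23]. When $q \notin \{0, 1, \infty\}$, the Rényi entropy is given by $H_q(\pi(x)) = \frac{1}{1-q} \log\left(\sum_{i=1}^p \pi_i(x)^q\right)$, and for the cases of $q \in \{0, 1, \infty\}$, the Rényi entropy is evaluated by limits and results in (3), (4) and (5), respectively. The sparsity measure $k_q(x)$ has the following basic properties (see also [13,14,25]):
• Continuity: unlike the traditional block sparsity measure using the mixed $\ell_2/\ell_0$ norm, $k_q(x)$ is continuous on $\mathbb{R}^N \setminus \{0\}$ for all $q > 0$. Thus, it is stable with respect to small perturbations of a signal.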
• Scale-invariance: for any $c \neq 0$, it holds that $k_q(cx) = k_q(x)$. This property is in line with the common sense that the measure should not depend on the absolute magnitude of a signal.
• Non-increasing with respect to $q$: for any $q' \ge q \ge 0$, we have $k_{q'}(x) \le k_q(x)$, which follows from the non-increasing property of the Rényi entropy $H_q$ with respect to $q$.
• Range equal to $[1, p]$: for all $x \in \mathbb{R}^N \setminus \{0\}$ with $p$ blocks and all $q \in [0, \infty]$, we have
$$1 \le \frac{\|x\|_{2,1}}{\|x\|_{2,\infty}} = k_\infty(x) \le k_q(x) \le k_0(x) = \|x\|_{2,0} \le p.$$
Next, we present a sufficient condition for exact recovery via the noise-free BBP in terms of the $q$-ratio block sparsity. Recall that when the true signal $x$ is block $k$-sparse, the necessary and sufficient condition for exact recovery via the noise-free BBP
$$\min_{z \in \mathbb{R}^N} \|z\|_{2,1} \quad \text{s.t.} \quad Az = Ax \tag{7}$$
in terms of the block NSP of order $k$ was given by [9,17]:
$$\|z_S\|_{2,1} < \|z_{S^c}\|_{2,1}, \quad \forall z \in \ker A \setminus \{0\},\ S \subset [p] \text{ and } |S| \le k.$$
Proposition 1. If $x$ is block $k$-sparse and there exists at least one $q \in (1, \infty]$ such that $k$ is strictly less than
$$\min_{z \in \ker A \setminus \{0\}} 2^{\frac{q}{1-q}} k_q(z), \tag{8}$$
then the unique solution to problem (7) is the true signal $x$.
Remark 1. This proposition is an extension of Proposition 1 in [27] from simple sparse signals to block sparse signals. In Section 5, we adopt a convex-concave procedure algorithm to solve (8) approximately.
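The $q$-ratio block sparsity and its basic properties can be checked numerically. The following is a minimal sketch, assuming the definition $k_q(x) = (\|x\|_{2,1}/\|x\|_{2,q})^{q/(q-1)}$ with the limiting cases $q \in \{0, 1, \infty\}$; the function names are illustrative and not taken from the paper.

```python
import math

def block_norms(x, n):
    """l2 norm of each consecutive length-n block of x."""
    return [math.sqrt(sum(v * v for v in x[i:i + n])) for i in range(0, len(x), n)]

def q_ratio_block_sparsity(x, n, q):
    """q-ratio block sparsity k_q(x) for block size n and q in [0, inf]."""
    norms = [b for b in block_norms(x, n) if b > 0]
    if not norms:
        raise ValueError("x must be non-zero")
    l1 = sum(norms)
    if q == 0:
        return float(len(norms))            # number of non-zero blocks
    if q == 1:
        # exp of the Shannon entropy of pi_i = ||x_i||_2 / ||x||_{2,1}
        return math.exp(-sum((b / l1) * math.log(b / l1) for b in norms))
    largest = max(norms)
    if q == math.inf:
        return l1 / largest
    # Normalise by the largest block norm to avoid overflow for large q.
    lq = largest * sum((b / largest) ** q for b in norms) ** (1.0 / q)
    return (l1 / lq) ** (q / (q - 1.0))

# A signal with three equal-energy non-zero blocks out of five has k_q = 3 for every q.
flat = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
print([round(q_ratio_block_sparsity(flat, 2, q), 6) for q in (0, 1, 2, 16, math.inf)])

# For unequal block energies, k_q decreases as q grows.
x = [3.0, 4.0, 0.0, 0.0, 0.0, 12.0]
print([q_ratio_block_sparsity(x, 2, q) for q in (0, 1, 2, math.inf)])
```

The first example illustrates that $k_q$ coincides with the usual block sparsity when all non-zero blocks carry the same energy; the second illustrates the non-increasing behaviour in $q$.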
Now we are ready to present the definition of the q-ratio BCMSV, which is developed based on the q-ratio block sparsity.

Definition 2. For any real number $s \in [1, p]$, $q \in (1, \infty]$ and matrix $A \in \mathbb{R}^{m \times N}$, the $q$-ratio block constrained minimal singular value (BCMSV) of $A$ is defined as
$$\beta_{q,s}(A) := \min_{z \neq 0,\ k_q(z) \le s} \frac{\|Az\|_2}{\|z\|_{2,q}}. \tag{9}$$
Remark 2. For a measurement matrix $A$ with unit norm columns, it is obvious that $\beta_{q,s}(A) \le 1$ since $\|Ae_i\|_2 = 1$, $\|e_i\|_{2,q} = 1$ and $k_q(e_i) = 1$, where $e_i$ is the $i$-th canonical basis vector of $\mathbb{R}^N$. Moreover, when $q$ and $A$ are fixed, $\beta_{q,s}(A)$ is non-increasing with respect to $s$. Besides, it is worth noticing that the $q$-ratio BCMSV also depends on the block size $n$; we choose not to show this parameter for the sake of simplicity. Another interesting finding is that for any $\alpha \in \mathbb{R}$, we have $\beta_{q,s}(\alpha A) = |\alpha| \beta_{q,s}(A)$. This fact together with Theorem 1 in Section 3 implies that in the case of adopting a measurement matrix $\alpha A$, increasing the measurement energy through $|\alpha|$ will proportionally reduce the mixed $\ell_2/\ell_q$ norm of the reconstruction errors.
Compared to the block RIP [9], there are three main advantages of using the $q$-ratio BCMSV:
• It is computable (see the algorithm in Section 5).
• The proof procedures and results of recovery error bounds are more concise (details in the next section).
• The $q$-ratio BCMSV based recovery bounds are smaller (better) than the block RIC based bounds (shown in Section 5; see also [20, 27] for two other specific examples).
As for different $q$, we have the following important inequality, which plays a crucial role in deriving the probabilistic behavior of $\beta_{q,s}(A)$ via the existing results established in [20].

Recovery error bounds
In this section, we derive the recovery error bounds in terms of the mixed $\ell_2/\ell_q$ norm and the mixed $\ell_2/\ell_1$ norm via the $q$-ratio BCMSV of the measurement matrix. We focus on three renowned convex relaxation algorithms for block sparse signal recovery from (1): the BBP, the BDS and the group lasso.
BBP: $\min_{z \in \mathbb{R}^N} \|z\|_{2,1}$ s.t. $\|y - Az\|_2 \le \zeta$.
BDS: $\min_{z \in \mathbb{R}^N} \|z\|_{2,1}$ s.t. $\|A^T(y - Az)\|_{2,\infty} \le \mu$.
Group lasso: $\min_{z \in \mathbb{R}^N} \frac{1}{2}\|y - Az\|_2^2 + \mu \|z\|_{2,1}$.
Here $\zeta$ and $\mu$ are parameters used to control the noise level. We first present the following main results of recovery error bounds for the case when the true signal $x$ is block $k$-sparse.
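Theorem 1. Suppose $x$ is block $k$-sparse. For any $q \in (1, \infty]$, we have
1) If $\|\epsilon\|_2 \le \zeta$, then the solution $\hat{x}$ to the BBP obeys
$$\|\hat{x} - x\|_{2,q} \le \frac{2\zeta}{\beta_{q, 2^{q/(q-1)}k}(A)}, \qquad \|\hat{x} - x\|_{2,1} \le \frac{4k^{1-1/q}\zeta}{\beta_{q, 2^{q/(q-1)}k}(A)}.$$
2) If the noise in the BDS satisfies $\|A^T\epsilon\|_{2,\infty} \le \mu$, then the solution $\hat{x}$ to the BDS obeys
$$\|\hat{x} - x\|_{2,q} \le \frac{4k^{1-1/q}\mu}{\beta^2_{q, 2^{q/(q-1)}k}(A)}, \qquad \|\hat{x} - x\|_{2,1} \le \frac{8k^{2-2/q}\mu}{\beta^2_{q, 2^{q/(q-1)}k}(A)}.$$
3) If the noise in the group lasso satisfies $\|A^T\epsilon\|_{2,\infty} \le \kappa\mu$ for some $\kappa \in (0, 1)$, then the solution $\hat{x}$ to the group lasso obeys
$$\|\hat{x} - x\|_{2,q} \le \frac{1+\kappa}{1-\kappa} \cdot \frac{2k^{1-1/q}\mu}{\beta^2_{q, (\frac{2}{1-\kappa})^{q/(q-1)}k}(A)}, \qquad \|\hat{x} - x\|_{2,1} \le \frac{1+\kappa}{(1-\kappa)^2} \cdot \frac{4k^{2-2/q}\mu}{\beta^2_{q, (\frac{2}{1-\kappa})^{q/(q-1)}k}(A)}.$$
Remark 4. If $\beta_{q, 2^{q/(q-1)}k}(A) > 0$ in (11) and (12), then the noise-free BBP (7) can uniquely recover any block $k$-sparse signal by letting $\zeta = 0$.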
Remark 5. The mixed $\ell_2/\ell_q$ norm error bounds are generalized from the existing results in [20] ($q = 2$ and $\infty$) to any $1 < q \le \infty$ and from [27] (simple sparse signal recovery) to block sparse signal recovery. The mixed $\ell_2/\ell_q$ norm error bounds depend on the $q$-ratio BCMSV of the measurement matrix $A$, which is bounded away from zero for sub-Gaussian random matrices and can be computed approximately by a specific algorithm; both points are discussed in the later sections.
Remark 6. As shown in the literature, the block RIC based recovery error bounds for the BBP [9], the BDS [12] and the group lasso [10] are complicated. In contrast, as presented in this theorem, the $q$-ratio BCMSV based bounds are much more concise, and the corresponding derivations, given in the Appendix, are much less complicated.
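To make the dependence of the bounds on $\beta_{q,s}(A)$, $k$ and $q$ concrete, the following sketch evaluates them for given inputs. It assumes the bounds take the form that follows from the Appendix derivation, namely $\|h\|_{2,q} \le 2\zeta/\beta$ for the BBP, $\|h\|_{2,q} \le 4k^{1-1/q}\mu/\beta^2$ for the BDS, and $\|h\|_{2,q} \le \frac{1+\kappa}{1-\kappa} \cdot 2k^{1-1/q}\mu/\beta^2$ for the group lasso, each combined with $\|h\|_{2,1} \le c\,k^{1-1/q}\|h\|_{2,q}$ where $c = 2$ or $c = 2/(1-\kappa)$. The value of `beta` must be supplied by the user (computed at the order returned by `bcmsv_order`); all function names are illustrative.

```python
import math

def _ratio_exponent(q):
    """q / (q - 1), with the limiting value 1 for q = inf."""
    return 1.0 if q == math.inf else q / (q - 1.0)

def bcmsv_order(k, q, kappa=0.0):
    """Order s at which beta_{q,s}(A) enters the bounds (kappa = 0 for BBP and BDS)."""
    return (2.0 / (1.0 - kappa)) ** _ratio_exponent(q) * k

def bbp_bounds(beta, k, q, zeta):
    """(l2/lq bound, l2/l1 bound) for block Basis Pursuit."""
    err_q = 2.0 * zeta / beta
    return err_q, 2.0 * k ** (1.0 - 1.0 / q) * err_q

def bds_bounds(beta, k, q, mu):
    """(l2/lq bound, l2/l1 bound) for the block Dantzig selector."""
    c = 2.0 * k ** (1.0 - 1.0 / q)
    err_q = 2.0 * mu * c / beta ** 2
    return err_q, c * err_q

def group_lasso_bounds(beta, k, q, mu, kappa):
    """(l2/lq bound, l2/l1 bound) for the group lasso with 0 < kappa < 1."""
    c = 2.0 / (1.0 - kappa) * k ** (1.0 - 1.0 / q)
    err_q = (1.0 + kappa) * mu * c / beta ** 2
    return err_q, c * err_q

# Hypothetical inputs: beta = 0.5, k = 4, q = 2, unit noise level.
print(bcmsv_order(4, 2))
print(bbp_bounds(0.5, 4, 2, 1.0))
print(bds_bounds(0.5, 4, 2, 1.0))
print(group_lasso_bounds(0.5, 4, 2, 1.0, 0.5))
```

Since $\beta_{q,s}(\alpha A) = |\alpha|\beta_{q,s}(A)$, doubling `beta` halves the BBP bounds and quarters the BDS and group lasso bounds for fixed noise parameters.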
Next, we extend Theorem 1 to the case when the signal is block compressible, in the sense that it can be approximated by a block $k$-sparse signal. Given a block compressible signal $x$, let the mixed $\ell_2/\ell_1$ error of the best block $k$-sparse approximation of $x$ be
$$\phi_k(x) = \inf_{z \in \mathbb{R}^N,\ \|z\|_{2,0} \le k} \|x - z\|_{2,1},$$
which measures how close $x$ is to a block $k$-sparse signal.
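The mixed $\ell_2/\ell_1$ error of the best block $k$-sparse approximation is attained by keeping the $k$ blocks with the largest $\ell_2$ norms, so it equals the sum of the remaining block norms (this is the fact used in the proof of Theorem 2). A minimal sketch, with the illustrative function name `best_block_k_term_error`:

```python
import math

def block_norms(x, n):
    """l2 norm of each consecutive length-n block of x."""
    return [math.sqrt(sum(v * v for v in x[i:i + n])) for i in range(0, len(x), n)]

def best_block_k_term_error(x, n, k):
    """Sum of the block norms of x outside its k largest blocks."""
    norms = sorted(block_norms(x, n), reverse=True)
    return sum(norms[k:])

# Block norms are 5, 0, 12 and 1; the two largest blocks carry norms 12 and 5.
x = [3.0, 4.0, 0.0, 0.0, 0.0, 12.0, 1.0, 0.0]
print(best_block_k_term_error(x, 2, 1))
print(best_block_k_term_error(x, 2, 2))
```

For an exactly block $k$-sparse signal the returned value is zero, which is the setting of Theorem 1.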

Theorem 2. Suppose that $x$ is block compressible. For any $1 < q \le \infty$, we have
1) If $\|\epsilon\|_2 \le \zeta$, then the solution $\hat{x}$ to the BBP obeys [...]
2) If the noise in the BDS satisfies $\|A^T\epsilon\|_{2,\infty} \le \mu$, then the solution $\hat{x}$ to the BDS obeys [...]
3) If the noise in the group lasso satisfies $\|A^T\epsilon\|_{2,\infty} \le \kappa\mu$ for some $\kappa \in (0, 1)$, then the solution $\hat{x}$ to the group lasso obeys [...]
Remark 7. All the error bounds consist of two components: one caused by the measurement error, and the other due to the sparsity defect.

Random matrices
In this section, we study the properties of the $q$-ratio BCMSV of sub-Gaussian random matrices. A random vector $x \in \mathbb{R}^N$ is called isotropic and sub-Gaussian with constant $L$ if it holds for all $u \in \mathbb{R}^N$ that $E|\langle x, u \rangle|^2 = \|u\|_2^2$ and $P(|\langle x, u \rangle| \ge t) \le 2\exp\left(-\frac{t^2}{L\|u\|_2}\right)$. Then as shown in Theorem 2 of [20], we have the following lemma.
Lemma 1 ([20]). Suppose the rows of the scaled measurement matrix $\sqrt{m}A$ are i.i.d. isotropic and sub-Gaussian random vectors with constant $L$. Then there exist constants $c_1$ and $c_2$ such that for any $\eta > 0$ and $m \ge 1$ satisfying [...]
Then as a direct consequence of Proposition 2 and Lemma 1, we have the following probabilistic statements for $\beta_{q,s}(A)$.
Theorem 3. Under the assumptions and notations of Lemma 1, it holds that
1) When $1 < q < 2$, there exist constants $c_1$ and $c_2$ such that for any $\eta > 0$ and $m \ge 1$ satisfying
$$m \ge c_1 \frac{L^2 (sn + s \log p)}{\eta^2}$$
we have [...]
2) When $2 \le q \le \infty$, there exist constants $c_1$ and $c_2$ such that for any $\eta > 0$ and $m \ge 1$ satisfying [...]
Remark 9. Theorem 3 shows that for sub-Gaussian random matrices, the $q$-ratio BCMSV is bounded away from zero as long as the number of measurements is large enough. Sub-Gaussian random matrices include Gaussian and Bernoulli ensembles.
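The two ensembles mentioned in Remark 9 can be generated as in the following sketch. It assumes the common scaling in which the entries of $A$ are $\pm 1/\sqrt{m}$ (Bernoulli) or $N(0, 1/m)$ (Gaussian), so that the rows of $\sqrt{m}A$ are isotropic; the function names are illustrative.

```python
import math
import random

def bernoulli_matrix(m, N, rng):
    """m x N matrix with i.i.d. entries +-1/sqrt(m); its columns have unit l2 norm."""
    s = 1.0 / math.sqrt(m)
    return [[rng.choice((-s, s)) for _ in range(N)] for _ in range(m)]

def gaussian_matrix(m, N, rng):
    """m x N matrix with i.i.d. N(0, 1/m) entries; columns have unit norm in expectation."""
    s = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, s) for _ in range(N)] for _ in range(m)]

def column_norms(A):
    """l2 norm of every column of a matrix stored as a list of rows."""
    return [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(len(A[0]))]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
B = bernoulli_matrix(64, 256, rng)
G = gaussian_matrix(64, 256, rng)
print(max(column_norms(B)), min(column_norms(B)))
print(sum(c * c for c in column_norms(G)) / 256)
```

The Bernoulli ensemble has exactly unit norm columns under this scaling, whereas a Gaussian matrix needs an explicit column normalisation if unit norm columns are required, as in the experiments on the $q$-ratio BCMSV.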

Numerical experiments
In this section, we introduce a convex-concave method to solve the sufficient condition (8) so as to achieve the maximal block sparsity k and present an algorithm to compute the q-ratio BCMSV. We also conduct comparisons between the q-ratio BCMSV based bounds and block RIC based bounds through the BBP.

Solving the optimization problem (8)
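According to Proposition 1, given a $q \in (1, \infty]$ we need to solve the optimization problem (8) to obtain the maximal block sparsity $k$ which guarantees that all block $k$-sparse signals can be uniquely recovered by (7). Solving (8) is equivalent to solving the problem:
$$\max_{z \in \mathbb{R}^N} \|z\|_{2,q} \quad \text{s.t.} \quad Az = 0 \text{ and } \|z\|_{2,1} \le 1. \tag{27}$$
However, maximizing the mixed $\ell_2/\ell_q$ norm over this convex constraint set is a non-convex problem. Here we adopt the convex-concave procedure (CCP) (see [11] for details) to solve the problem (27) for any $q \in (1, \infty]$. The algorithm is presented as follows:
Algorithm: CCP to solve (27).
Choose an initial point $z^l$ with $l = 0$.
Iterate
1. Linearization. Approximate $\|z\|_{2,q}$ using the first-order Taylor expansion
$$\|z\|_{2,q} \approx \|z^l\|_{2,q} + \nabla\big(\|z^l\|_{2,q}\big)^T (z - z^l) = \|z^l\|_{2,q} + \Big[\|z^l\|_{2,q}^{1-q}\, (z_b^l)^{q-2} \circ z^l\Big]^T (z - z^l),$$
where $z_b^l = [\underbrace{\|z_1^l\|_2, \cdots, \|z_1^l\|_2}_{n}, \underbrace{\|z_2^l\|_2, \cdots, \|z_2^l\|_2}_{n}, \cdots, \underbrace{\|z_p^l\|_2, \cdots, \|z_p^l\|_2}_{n}]^T$ with $\|z_i^l\|_2$ denoting the $\ell_2$ norm of the $i$-th block of $z^l$ for $i \in [p]$, the power is taken entrywise and $\circ$ denotes the entrywise product.
2. Maximization. Set $z^{l+1}$ to be the result of
$$\max_{z \in \mathbb{R}^N} \Big[\|z^l\|_{2,q}^{1-q}\, (z_b^l)^{q-2} \circ z^l\Big]^T z \quad \text{s.t.} \quad Az = 0 \text{ and } \|z\|_{2,1} \le 1. \tag{28}$$
3. Updating iteration. Let $l = l + 1$.
until the stopping criterion is satisfied. Then $k$ is the largest integer smaller than $2^{\frac{q}{1-q}} \|z^l\|_{2,q}^{\frac{q}{1-q}}$.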
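The linearization step and the final read-out of $k$ can be sketched as below. This is only a partial illustration under stated assumptions: it uses the gradient $\nabla\|z\|_{2,q} = \|z\|_{2,q}^{1-q}\,\|z_i\|_2^{q-2} z_i$ on block $i$ (valid for finite $q > 1$, with zero blocks contributing a zero gradient), and it omits the maximization step, which is a convex program that needs an external solver. The function names are illustrative.

```python
import math

def block_norms(z, n):
    """l2 norm of each consecutive length-n block of z."""
    return [math.sqrt(sum(v * v for v in z[i:i + n])) for i in range(0, len(z), n)]

def mixed_norm(z, n, q):
    """Mixed l2/lq norm for finite q > 0."""
    return sum(b ** q for b in block_norms(z, n)) ** (1.0 / q)

def mixed_norm_gradient(z, n, q):
    """Gradient of ||z||_{2,q} at a non-zero z, for finite q > 1."""
    norms = block_norms(z, n)
    scale = mixed_norm(z, n, q) ** (1.0 - q)
    grad = []
    for i, b in enumerate(norms):
        weight = scale * b ** (q - 2.0) if b > 0 else 0.0
        grad.extend(weight * v for v in z[i * n:(i + 1) * n])
    return grad

def max_block_sparsity(opt_value, q):
    """Largest integer strictly smaller than (2 * opt_value) ** (q / (1 - q))."""
    exponent = -1.0 if q == math.inf else q / (1.0 - q)
    bound = (2.0 * opt_value) ** exponent
    return math.ceil(bound) - 1

z = [3.0, 4.0, 0.0, 12.0, 1.0, 2.0]
g = mixed_norm_gradient(z, 2, 4.0)
# The mixed norm is positively homogeneous of degree one, so <g, z> equals ||z||_{2,q}.
print(sum(a * b for a, b in zip(g, z)), mixed_norm(z, 2, 4.0))
# Hypothetical optimal value 0.3 of the maximization problem for q = 2.
print(max_block_sparsity(0.3, 2))
```

The linear objective of the maximization step is the inner product of `mixed_norm_gradient(z_l, n, q)` with $z$; a smaller optimal value of $\|z\|_{2,q}$ over the constraint set yields a larger guaranteed block sparsity $k$.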
We implement the algorithm to solve (27) under the following settings. Let $A$ be either a Bernoulli or a Gaussian random matrix with $N = 256$, and vary $m$, the block size $n$ and $q$. Specifically, $m = 64, 128, 192$, $n = 1, 2, 4, 8$ and $q = 2, 4, 16, 128$, respectively. The results are summarized in Table 1. Note that when $n = 1$, the algorithm (28) is identical to the one in [27]. The main findings are as follows: (i) comparing the results for Bernoulli and Gaussian random matrices under the same settings shows no substantial difference, so we focus on the left part of the table, i.e. the Bernoulli random matrix part; (ii) the results are not monotone with respect to $q$ (see the row with $n = 4$, $m = 192$), which verifies the conclusion in Remark 3; (iii) when $m$ is the only variable, the maximal block sparsity increases as $m$ increases; (iv) conversely, when $n$ is the only variable, the maximal block sparsity decreases as $n$ increases, which is in line with the main result in [16, Theorem 3.1].

Computing the q-ratio BCMSVs
Computing the $q$-ratio BCMSV (9) is equivalent to solving
$$\min_{z \in \mathbb{R}^N} \|Az\|_2 \quad \text{s.t.} \quad \|z\|_{2,1} \le s^{\frac{q-1}{q}},\ \|z\|_{2,q} = 1. \tag{29}$$
Since the constraint set is not convex, this is a non-convex optimization problem. In order to solve (29), we use the Matlab function fmincon as in [27] and define $z = z^+ - z^-$ with $z^+ = \max(z, 0)$ and $z^- = \max(-z, 0)$. Consequently, (29) can be reformulated to:
$$\min_{z^+, z^- \in \mathbb{R}^N} \|A(z^+ - z^-)\|_2 \quad \text{s.t.} \quad \|z^+ - z^-\|_{2,1} \le s^{\frac{q-1}{q}},\ \|z^+ - z^-\|_{2,q} = 1,\ z^+ \ge 0,\ z^- \ge 0.$$
Due to the existence of local minima, we perform an experiment to decide a reasonable number of repetitions needed to achieve the 'global' minimum, shown in Figure 1. In the experiment, we calculate the $q$-ratio BCMSV of a fixed Bernoulli random matrix with unit norm columns of size $40 \times 64$, with $n = s = 4$ and varying $q = 2, 4, 8$, respectively. 50 repetitions are carried out for each $q$. The figure shows that after about 30 repetitions, the estimate $\hat{\beta}_{q,s}$ of $\beta_{q,s}$ stabilizes, so in the following experiments we repeat the algorithm 40 times and choose the smallest value $\hat{\beta}_{q,s}$ as the 'global' minimum. We also varied $m$, $s$ and $n$; all runs indicate that 40 is a reasonable number of repetitions (not shown).
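The idea of repeating a search and keeping the smallest value can be illustrated without a constrained solver. The sketch below is not the fmincon-based procedure described above: it swaps in a plain random search over vectors with at most $\lfloor s \rfloor$ non-zero blocks. Every such $z$ satisfies $k_q(z) \le k_0(z) \le s$ and is therefore feasible in (9), so the smallest ratio $\|Az\|_2/\|z\|_{2,q}$ found is only an upper bound on $\beta_{q,s}(A)$. All names and default parameters are illustrative.

```python
import math
import random

def block_norms(z, n):
    """l2 norm of each consecutive length-n block of z."""
    return [math.sqrt(sum(v * v for v in z[i:i + n])) for i in range(0, len(z), n)]

def mixed_norm(z, n, q):
    """Mixed l2/lq norm for q > 0 or q = math.inf."""
    norms = block_norms(z, n)
    if q == math.inf:
        return max(norms)
    return sum(b ** q for b in norms) ** (1.0 / q)

def ratio(A, z, n, q):
    """Objective ||Az||_2 / ||z||_{2,q} of the BCMSV definition."""
    Az = [sum(a * v for a, v in zip(row, z)) for row in A]
    return math.sqrt(sum(v * v for v in Az)) / mixed_norm(z, n, q)

def bcmsv_upper_bound(A, n, q, s, trials=200, seed=0):
    """Smallest objective value over random vectors with floor(s) non-zero blocks."""
    rng = random.Random(seed)
    N = len(A[0])
    p = N // n
    k = max(1, min(p, int(math.floor(s))))
    best = math.inf
    for _ in range(trials):
        z = [0.0] * N
        for b in rng.sample(range(p), k):   # random block support of size k
            for j in range(n):
                z[b * n + j] = rng.gauss(0.0, 1.0)
        best = min(best, ratio(A, z, n, q))
    return best

# Sanity check on the identity matrix with q = 2, where the ratio equals 1 for every z.
I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(bcmsv_upper_bound(I4, 2, 2, 2))
```

Because the search only samples feasible points, more trials can only lower the returned value; a local solver started from the best sampled point would be needed to approach the true minimum.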
Next, we illustrate the properties of $\beta_{q,s}$, which have been pointed out in Remarks 2 and 3, through experiments. We set $N = 64$ with three different block sizes $n = 1, 4, 8$ (i.e. number of blocks $p = 64, 16, 8$), three different $m = 40, 50, 60$, three different $q = 2, 4, 8$ and three different $s = 2, 4, 8$. Bernoulli random matrices with unit norm columns are used. Results are listed in Table 2. They are in line with the theoretical results:
(i) $\beta_{q,s}$ increases as $m$ increases in all cases, given that the other parameters are fixed.
(ii) $\beta_{q,s}$ decreases as $s$ increases in most cases, given that the other parameters are fixed. There are exceptions when $m = 40$, $n = 8$ with $s = 4$ and $s = 8$ under $q = 4, 8$, respectively. However, the difference is about 0.0002, which is possibly caused by numerical approximation.
(iii) Monotonicity of β q,s does not hold with respect to q even given that other parameters are fixed.

Comparing error bounds
Here we compare the $q$-ratio BCMSV based bounds against the block RIC based bounds for the BBP under different settings. The block RIC based bound is [...] if $A$ satisfies the block RIP of order $2k$, i.e. the block RIC $\delta_{2k}(A) < \sqrt{2} - 1$ [7,20]. By using Hölder's inequality, one can obtain the corresponding bound in the mixed $\ell_2/\ell_q$ norm for $0 < q \le 2$: [...] We compare the two bounds (32) and (12). Without loss of generality, let $\zeta = 1$. $\delta_{2k}(A)$ is approximated using Monte Carlo simulations. Specifically, we randomly choose 1000 sub-matrices of $A \in \mathbb{R}^{m \times N}$ of size $m \times 2nk$ and compute $\delta_{2k}(A)$ as the maximum of $\max(\sigma_{\max}^2 - 1, 1 - \sigma_{\min}^2)$ among all sampled sub-matrices. This approximated block RIC is always smaller than or equal to the exact block RIC, thus the error bounds based on the exact block RIC are always larger than those based on the approximated block RIC. Therefore, it is enough to show that the $q$-ratio BCMSV gives a sharper error bound than the approximated block RIC.
We use sub-matrices with unit norm columns of a row-randomly-permuted Hadamard matrix (an orthogonal Bernoulli matrix) with $N = 64$, $k = 1, 2, 4$, $n = 1, 2$, $q = 1.8$ and a variety of $m \le 64$ to approximate the $q$-ratio BCMSV and the block RIC. Besides the Hadamard matrix, we also test Bernoulli random matrices and Gaussian random matrices with different configurations, which return only very few qualified block RICs. In the simulation results of [20], the authors showed that under all considered cases for Gaussian random matrices, $\delta_{2k}(A) > \sqrt{2} - 1$, which coincides with our finding. Figure 2 shows that the $q$-ratio BCMSV based bounds are smaller than those based on the approximated block RIC. Note that when $m$ approaches $N$, $\beta_{q,s}(A) \to 1$ and $\delta_{2k}(A) \to 0$; as a result, the $q$-ratio BCMSV based bounds are smaller than 2.2, while the block RIC based bounds are larger than or equal to 4.

Conclusion
In this study, we introduce the $q$-ratio block sparsity measure and the $q$-ratio BCMSV. Theoretically, through the $q$-ratio block sparsity measure and the $q$-ratio BCMSV, we (i) establish a sufficient condition for unique noise-free BBP recovery; (ii) derive both the mixed $\ell_2/\ell_q$ norm and the mixed $\ell_2/\ell_1$ norm bounds of recovery errors for the BBP, the BDS and the group lasso estimator; (iii) prove that the $q$-ratio BCMSV is bounded away from zero for sub-Gaussian random matrices if the number of measurements is relatively large. Afterwards, we use numerical experiments via two algorithms to illustrate the theoretical results. In addition, we demonstrate through simulations that the $q$-ratio BCMSV based error bounds are much tighter than those based on the block RIP.
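There are still some issues left for future work. For example, analogous to the case of the $q$-ratio CMSV, the geometrical property of the $q$-ratio BCMSV can be investigated to derive sufficient conditions and error bounds for block sparse signal recovery.

Figure 2 (legend): error bound versus $m$; $q$-ratio BCMSV based bounds for $k = 1, 2, 4$ with $n = 1$ and $n = 2$, and block RIC based bounds for $k = 1, 2, 4$ with $n = 2$.

(Throughout the proof, $h = \hat{x} - x$ denotes the recovery error and $S$ the block support of $x$, so $|S| \le k$.)
[...] which can be simplified to $\|h_{S^c}\|_{2,1} \le \|h_S\|_{2,1}$. Thereby, we can obtain the following inequality:
$$\|h\|_{2,1} = \|h_S\|_{2,1} + \|h_{S^c}\|_{2,1} \le 2\|h_S\|_{2,1} \le 2k^{1-1/q}\|h_S\|_{2,q} \le 2k^{1-1/q}\|h\|_{2,q},$$
which is equivalent to
$$k_q(h) = \left(\frac{\|h\|_{2,1}}{\|h\|_{2,q}}\right)^{\frac{q}{q-1}} \le 2^{\frac{q}{q-1}} k.$$
For the group lasso, since the noise satisfies $\|A^T\epsilon\|_{2,\infty} \le \kappa\mu$ for $\kappa \in (0, 1)$ and $\hat{x}$ is a solution of the group lasso, we have
$$\frac{1}{2}\|A\hat{x} - y\|_2^2 + \mu\|\hat{x}\|_{2,1} \le \frac{1}{2}\|Ax - y\|_2^2 + \mu\|x\|_{2,1}.$$
Substituting $y$ by $Ax + \epsilon$ leads to
$$\mu\|\hat{x}\|_{2,1} \le \frac{1}{2}\|\epsilon\|_2^2 - \frac{1}{2}\|Ah - \epsilon\|_2^2 + \mu\|x\|_{2,1} = \langle Ah, \epsilon \rangle - \frac{1}{2}\|Ah\|_2^2 + \mu\|x\|_{2,1} \le \langle h, A^T\epsilon \rangle + \mu\|x\|_{2,1} \le \|h\|_{2,1}\|A^T\epsilon\|_{2,\infty} + \mu\|x\|_{2,1} \le \kappa\mu\|h\|_{2,1} + \mu\|x\|_{2,1}.$$
The second to last inequality follows by applying the Cauchy-Schwarz inequality block-wise, and the last inequality can be written as
$$\|\hat{x}\|_{2,1} \le \kappa\|h\|_{2,1} + \|x\|_{2,1}.$$
Therefore, it holds that
$$\|x\|_{2,1} + \kappa\|h\|_{2,1} \ge \|\hat{x}\|_{2,1} = \|x + h\|_{2,1} \ge \|x\|_{2,1} - \|h_S\|_{2,1} + \|h_{S^c}\|_{2,1},$$
which can be simplified to
$$\|h_{S^c}\|_{2,1} \le \frac{1+\kappa}{1-\kappa}\|h_S\|_{2,1}.$$
Thus we can obtain
$$\|h\|_{2,1} \le \frac{2}{1-\kappa}\|h_S\|_{2,1} \le \frac{2}{1-\kappa}k^{1-1/q}\|h\|_{2,q},$$
which can be reformulated as
$$k_q(h) \le \left(\frac{2}{1-\kappa}\right)^{\frac{q}{q-1}} k.$$
Step 2: Obtain an upper bound of $\|Ah\|_2$ and then construct the mixed $\ell_2/\ell_q$ norm and the mixed $\ell_2/\ell_1$ norm bounds of the recovery error vector $h$ via the $q$-ratio BCMSV for each algorithm.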
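(i) For the BBP, since both $x$ and $\hat{x}$ satisfy the constraint $\|y - Az\|_2 \le \zeta$, by using the triangle inequality we can get
$$\|Ah\|_2 = \|A(\hat{x} - x)\|_2 \le \|A\hat{x} - y\|_2 + \|y - Ax\|_2 \le 2\zeta.$$
Following from the definition of the $q$-ratio BCMSV and $k_q(h) \le 2^{\frac{q}{q-1}} k$, we have
$$\beta_{q, 2^{q/(q-1)}k}(A)\,\|h\|_{2,q} \le \|Ah\|_2 \le 2\zeta \ \Rightarrow\ \|h\|_{2,q} \le \frac{2\zeta}{\beta_{q, 2^{q/(q-1)}k}(A)},$$
and, with $\|h\|_{2,1} \le 2k^{1-1/q}\|h\|_{2,q}$,
$$\|h\|_{2,1} \le \frac{4k^{1-1/q}\zeta}{\beta_{q, 2^{q/(q-1)}k}(A)}.$$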
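(ii) Similarly for the BDS, since both $x$ and $\hat{x}$ satisfy the constraint $\|A^T(y - Az)\|_{2,\infty} \le \mu$, we have
$$\|A^TAh\|_{2,\infty} \le \|A^T(y - A\hat{x})\|_{2,\infty} + \|A^T(y - Ax)\|_{2,\infty} \le 2\mu.$$
By applying the Cauchy-Schwarz inequality again as in Step 1, we obtain
$$\|Ah\|_2^2 = \langle Ah, Ah \rangle = \langle h, A^TAh \rangle \le \|h\|_{2,1}\|A^TAh\|_{2,\infty} \le 2\mu\|h\|_{2,1}.$$
At last, with the definition of the $q$-ratio BCMSV, $k_q(h) \le 2^{\frac{q}{q-1}} k$ and $\|h\|_{2,1} \le 2k^{1-1/q}\|h\|_{2,q}$, we get the upper bounds of the mixed $\ell_2/\ell_q$ norm and the mixed $\ell_2/\ell_1$ norm for $h$:
$$\beta^2_{q, 2^{q/(q-1)}k}(A)\,\|h\|_{2,q}^2 \le \|Ah\|_2^2 \le 4\mu k^{1-1/q}\|h\|_{2,q} \ \Rightarrow\ \|h\|_{2,q} \le \frac{4k^{1-1/q}\mu}{\beta^2_{q, 2^{q/(q-1)}k}(A)}, \qquad \|h\|_{2,1} \le \frac{8k^{2-2/q}\mu}{\beta^2_{q, 2^{q/(q-1)}k}(A)}.$$
(iii) For the group lasso, with $\|A^T\epsilon\|_{2,\infty} \le \kappa\mu$, we have
$$\|A^TAh\|_{2,\infty} \le \|A^T(y - Ax)\|_{2,\infty} + \|A^T(y - A\hat{x})\|_{2,\infty} \le \kappa\mu + \|A^T(y - A\hat{x})\|_{2,\infty}.$$
Moreover, since $\hat{x}$ is the solution of the group lasso, the optimality condition yields that
$$A^T(y - A\hat{x}) \in \mu\,\partial\|\hat{x}\|_{2,1},$$
where the sub-gradient in $\partial\|\hat{x}\|_{2,1}$ for the $i$-th block is $\hat{x}_i / \|\hat{x}_i\|_2$ if $\hat{x}_i \neq 0$, and is some vector $g$ satisfying $\|g\|_2 \le 1$ if $\hat{x}_i = 0$ (which follows from the definition of the sub-gradient). Thus, we have $\|A^T(y - A\hat{x})\|_{2,\infty} \le \mu$, which leads to $\|A^TAh\|_{2,\infty} \le (\kappa + 1)\mu$.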
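As a result, since $k_q(h) \le \left(\frac{2}{1-\kappa}\right)^{\frac{q}{q-1}} k$ and $\|h\|_{2,1} \le \frac{2}{1-\kappa}k^{1-1/q}\|h\|_{2,q}$, we can obtain
$$\beta^2_{q, (\frac{2}{1-\kappa})^{q/(q-1)}k}(A)\,\|h\|_{2,q}^2 \le \|Ah\|_2^2 \le \|h\|_{2,1}\|A^TAh\|_{2,\infty} \le \frac{2(1+\kappa)}{1-\kappa}\mu k^{1-1/q}\|h\|_{2,q},$$
which is equivalent to
$$\|h\|_{2,q} \le \frac{1+\kappa}{1-\kappa} \cdot \frac{2k^{1-1/q}\mu}{\beta^2_{q, (\frac{2}{1-\kappa})^{q/(q-1)}k}(A)}.$$
Proof of Theorem 2. The infimum in $\phi_k(x)$ is achieved by a block $k$-sparse signal $z$ whose non-zero blocks equal the $k$ largest blocks of $x$, indexed by $S$, so $\phi_k(x) = \|x_{S^c}\|_{2,1}$. Let $h = \hat{x} - x$. As in the proof of Theorem 1, the derivation has two steps.
[...] Therefore, we have [...]
Step 2: Verify that the $q$-ratio block sparsity of $h$ has a lower bound in the form of $\|h\|_{2,q}$ for each algorithm, when $\|h\|_{2,q}$ is larger than the part of the recovery bounds caused by the measurement error.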