Enhanced multi-task compressive sensing using Laplace priors and MDL-based task classification

In multi-task compressive sensing (MCS), the original signals of multiple compressive sensing (CS) tasks are assumed to be correlated. This is explored to recover signals in a joint manner to improve signal reconstruction performance. In this paper, we first develop an improved version of MCS that imposes sparseness over the original signals using Laplace priors. The newly proposed technique, termed as the Laplace prior-based MCS (LMCS), adopts a hierarchical prior model, and the MCS is shown analytically to be a special case of LMCS. This paper next considers the scenario where the CS tasks belong to different groups. In this case, the original signals from different task groups are not well correlated, which would degrade the signal recovery performance of both MCS and LMCS. We propose the use of the minimum description length (MDL) principle to enhance the MCS and LMCS techniques. New algorithms, referred to as MDL-MCS and MDL-LMCS, are developed. They first classify tasks into different groups and then reconstruct signals from each cluster jointly. Simulations demonstrate that the proposed algorithms have better performance over several state-of-art benchmark techniques.


Introduction
If a signal is compressible in the sense that its representation in a certain linear canonical basis is sparse, it can then be recovered from measurements obtained at a rate much lower than the Nyquist frequency using the technique of compressive sensing (CS) [1][2][3]. Mathematically, in CS, the signal is measured via y = 0 θ + n = θ + n (1) where θ is the N × 1 original signal vector, 0 denotes the M × N measurement matrix, denotes the N × N linear basis, = 0 , y is the M × 1 compressive measurement vector, and n is the additive noise. Since M is far smaller than N, the original signal is now compressively represented, but the inverse problem, namely recovering θ from y, is in general ill-posed. If θ is sparse (i.e., most of where · 2 and · 1 denote the l 2 -norm and the l 1norm, respectively, and the scalar ε is a small constant. Equation 2 has been the starting point for the development of many signal recovery methods in the literature. Among them, the recovery algorithms under the Bayesian framework provide some advantages over other formulations. These include providing probabilistic predictions, automatic estimation of model parameters, and the evaluation of the uncertainty of reconstruction. The existing Bayesian approaches include the Bayesian compressive sensing (BCS) [4] that stems from the relevance vector machine [5] and the Laplace prior-based BCS [6]. In [7], multi-task compressive sensing (MCS) was introduced within the Bayesian framework. In this work, a CS task refers to the union of an original signal vector, http://asp.eurasipjournals.com/content/2013/1/160 the measurement matrix, and the associated compressive measurement vector obtained using Equation 1. In contrast to the CS aim of recovering a single signal from its compressive measurements, MCS exploits the statistical correlation among the original signals of multiple CS tasks and recovers them jointly to improve the signal reconstruction performance. It has been shown in [7] that MCS allows recovering in a robust manner the signals whose compressive measurements are insufficient when they are reconstructed separately. The MCS technique has been investigated extensively in machine learning literature, where it was referred to as simultaneous sparse approximation (SSA) [8][9][10][11][12] as well as distributed compressed sensing [13]. In [14], an empirical Bayesian strategy for SSA was developed.
The contribution of this paper is twofold. We shall first extend the work of [6] on the Laplace prior-based BCS to the MCS scenario. A new MCS algorithm for signal recovery, termed as the Laplace prior-based MCS (LMCS), is developed. We impose Laplace priors on the original signals in a hierarchical manner and show that the MCS is indeed a special case of LMCS. The incorporation of Laplace priors enforces signal sparsity to a higher extent [15] and offers posterior distributions rather than point estimates as in MCS. Another advantage comes from the log-concavity of the Laplace distribution, which leads to unimodal posterior distribution and eliminates the presence of local minima as a result.
The second part of this work comes from the following observation. Specifically, in order to provide satisfactory signal reconstruction performance, the MCS technique from [7], together with the newly proposed LMCS method, requires that the original signals of the multiple CS tasks are well correlated statistically. This assumption may not be fulfilled in many practical applications. For instance, some original signals may be realizations of different signal templates that differ in their supports. In other words, they could belong to different signal groups, and the statistical correlation among them is weak, which would degrade the signal recovery performance. A possible approach to address this problem is to group the CS tasks before the signal reconstruction stage, as in the MCS with Dirichlet process priors (DP-MCS) [16].
The second contribution of this paper is the use of the minimum description length (MDL) principle to augment the MCS and LMCS methods. The obtained techniques are referred to as the MDL-MCS and MDL-LMCS algorithms. The MDL principle has been adopted to solve the model selection problem [17][18][19] and can also be used in other aspects, such as sparse coding and dictionary learning [20] and radar emitter classification [21][22][23]. In MDL, the best model for a given data y is the solution to the minimization problem ω = arg min ω∈ DL y, ω . Here, represents the set of possible models and DL y, ω is a codelength assignment function which defines the theoretical codelength required to describe y uniquely, which is the key component in any MDL-based classification technique. Common practice in MDL uses the ideal Shannon codelength assignment [24] to define DL y, ω in terms of a probability assignment p y, ω as DL y, ω = − log 2 p y, ω . Applying p y, ω = p y |ω p (ω), we have ω = arg min ω∈ − log 2 p y |ω − log 2 p (ω), where − log 2 p (ω) represents the model complexity. Note that the MCS and the new LMCS methods are both under the Bayesian framework, which enables their integration with the statistical MDL technique. Compared with the DP-MCS technique that utilizes variational Bayes (VB) inference and could suffer from local convergence, the newly proposed MDL-MCS and MDL-LMCS methods offer improved correct signal classification rate and better signal reconstruction performance. This is also illustrated via computer simulations in Section 5.
The remainder of this paper is structured as follows. In Section 2, we review the prior sharing concept in MCS and present the prior sharing framework in LMCS. Section 3 develops the proposed LMCS algorithm. We describe in Section 4 the MDL-based MCS and LMCS techniques, namely, the MDL-MCS and MDL-LMCS algorithms. Simulations are given in Section 5 to illustrate the performance of the proposed algorithms. Section 6 concludes the paper.

Prior sharing in MCS and LMCS
In the area of machine learning, information sharing among tasks is a well-known technique [25]. Typical approaches, to name a few, include sharing hidden nodes in neural networks [26,27], assigning a common prior in hierarchical Bayesian models [28][29][30], placing a common structure on the predictor space [31], and the structured regularization in kernel methods [32]. Among them, the use of hierarchical Bayesian models with shared priors is one of the most important methods for multi-task learning [33][34][35][36][37], which is also essential for the development of MCS in [7] and the LMCS algorithm in this paper. For the sake of clarity, in the rest of this section, we shall first review the prior sharing in the MCS algorithm and then proceed to present the hierarchical Bayesian framework of LMCS.
To facilitate the presentation, suppose there are L CS tasks where i = 1, 2, . . . , L, y i is the M i × 1 compressive measurement vector and i is the M i × N matrix (M i N) whose columns are i,j , j = 1, 2, . . . , N such that i = http://asp.eurasipjournals.com/content/2013/1/160 T is the original signal for task i and the measurement noise n i is assumed to follow an i.i.d. Gaussian distribution with zero mean vector and covariance matrix β −1 I. The conditional likelihood function of y i is where N y i | i θ i , β −1 I represents a Gaussian distribution with mean vector i θ i and covariance matrix β −1 I. The noise precision β follows a Gamma distribution where a and b are the shape and scale parameters of the Gamma distribution and (a) is the Gamma function.

Prior sharing in MCS
In MCS [7], the elements in θ i are statistically independent, and they follow a joint Gaussian distribution: Here, α = [α 1 , α 2 , . . . , α N ] T is the information vector shared by the original signals θ i of all the L tasks. Its distribution function is given by Ga α j |c, d .
In [7], the general strategy of setting the hyper-parameters a, c to ones and b, d to zeros in Equations 5 and 7 was adopted so that the prior of α and β are both uniformly distributed. As a result, they can be found via maximizing the following likelihood function: This is equivalent to maximizing the posterior distribution of α and β. The original signals θ i are then reconstructed using the estimated values of α and β.

Prior sharing in LMCS
Within the LMCS framework, the original signals are assigned Laplace priors. A possible approach to achieve this is to impose Laplace priors directly on the original signal, or mathematically, let p (θ i |λ) = λ 2 exp − λ 2 θ i 1 as in [6]. However, this formulation is not conjugate to the conditional distribution in Equation 4, which would render the Bayesian analysis intractable. Therefore, we adopt the hierarchical prior given by where γ = [γ 1 , . . . , γ N ] T , p γ j |λ , and p (λ|ν) are the prior distributions of γ j and λ, respectively. Compared with the MCS model given in Equation 6, Equation 7 reveals that in LMCS, information sharing is realized via the vector γ and the hyper-parameter λ. We have from Equations 9 to 11 This verifies that the used hierarchical prior model results in Laplace priors for the original signals θ i . As in MCS, LMCS recovers the original signals in a twostep manner. In particular, it first estimates γ , λ, β, and ν via maximizing the posterior distribution Taking the logarithm on both sides of the above equation yields ln p γ , λ, β, ν|y i . (14) http://asp.eurasipjournals.com/content/2013/1/160 It is straightforward to verify that p θ i |γ , λ, β, . Furthermore, from Equations 4 and 9, we have that it also has a Gaussian distribution N θ i |μ i , i with the mean vector and covariance matrix equal to With the estimated γ , λ, and ν, LMCS then proceeds to reconstruct the original signals from all the L CS tasks.
We illustrate the hierarchical prior model adopted in LMCS in Figure 1. It can be observed that, as in MCS, the distribution of the measurement noise n i is dependent on the noise precision β while the prior distribution functions of the original signals θ i depend on the information sharing vector γ . The difference here is that LMCS has one more layer of prior information, which is embedded in λ. The introduction of λ makes the prior distribution of the original signal Laplace, which is already shown in Equation 12. As a result, the proposed LMCS would promote the sparsity of the recovered signal, as pointed out in [15].

Multi-task compressive sensing using Laplace priors
We shall present the proposed LMCS algorithm in this section. The LMCS method differs from the MCS technique only in the step of identifying the information sharing vector γ and the parameters λ and ν while their signal recovery steps are the same. As a result, we shall focus on the estimation of γ , λ, and ν. Interested readers are directed to [7] for details on the signal recovery process.
As shown in previous works [7,38,39], the signal reconstruction performance would be degraded if the noise precision β is not properly initialized. Therefore, in this work, we consider β as a nuisance parameter and integrate it out to reduce the number of unknowns and improve the robustness of the algorithm. For this purpose, the prior distributions of the original signals θ i are rewritten as in [7]: where β has a Gamma prior distribution Note that in this case, p θ i |γ , λ, β, y i given above Equation 15 is still Gaussian with the mean vector and the covariance matrix given in Equation 15 and 16. After taking integration with respect to β, we have where det (·) is the determinant operator and Note that p θ i |γ , λ, y i has the functional form of a Student's t distribution, which is heavy tailed and as a result makes the LMCS algorithm more robust to the presence of outliers in the measurement noise in y i if any, as pointed out in [40]. Taking integration with respect to β on both sides of Equation 13, using Equation 19, and applying the logarithm yields the posterior distribution ln p γ , λ, ν|y i . We shall maximize it to estimate the information sharing vector γ and the parameter λ. We begin with integrating θ i out and applying the relation- where The matrices 0 and i are defined under Equation 16 and in Equation 21, respectively.
In the rest of this section, we shall present two methods for identifying γ and λ. The first technique, described in Section 3.1 iteratively maximizes L (γ , λ, ν) to find the accurate solution. It has high computational complexity, which motivates the development of an alternative method with much lower complexity in Section 3.2.

Iterative solution
Differentiating L (γ , λ, ν) with respect to γ j , j = 1, 2, . . . , N and setting the result to zero yield After some straightforward manipulations, we obtain where μ i,j is the jth element of μ i and i,jj is the jth diagonal element of i . Following a similar approach, λ can be found to be As in [6], we evaluate ν by solving where ψ (ν/2) denotes the derivative of ln (ν/2) with respect to ν/2. The iterative algorithm starts with an initial solution guess on γ , λ and ν. We next update the estimates of γ i using Equation 24 first, then proceed to evaluate λ and ν using Equations 25 and 26. The above process would be repeated until convergence. The iterative algorithm is based on alternating optimization and is computationally intensive. One of the computational burdens lies in the evaluation of Equations 20 and 21 required in the evaluation of Equation 24, where inverting matrices of size N × N is needed. This motivates the development of the following alternative algorithm.

Fast alternative solution
We start with decomposing B i defined under Equation 22 It can be verified via applying the matrix inversion lemma that the inverse of B i is equal to With the above notations, we are able to introduce L 0 (γ ) that collects the terms relating to γ in L (γ , λ, ν) in Equation 22, which is defined as http://asp.eurasipjournals.com/content/2013/1/160 Differentiating L 0 (γ ) with respect to γ j and setting the result to zero, we obtain Dividing both sides with γ 2 i , we can transform Equation 28 into Applying the approximation s i,j 1/γ j , which is generally valid numerically (e.g., typically we have s i,j > 20/γ j [7]), simplifies the denominator of and C 0 Lλ, and as a result, Equation 29 becomes The approximate solution of γ −1 i from Equation 30 has the form where 0 = B 2 0 − 4A 0 C 0 and C 0 ≥ 0. As shown in Appendix 1, on the basis of the fact that γ i ≥ 0, the estimate from Equation 31 can only take two possible values, i.e., When γ −1 j = ∞, it is equivalent to setting θ i,j to zero (see Equation 17). This indicates that i,j can be deleted from the matrix i . As a result, in contrast to the iterative approach for estimating γ i and λ (see Section 3.1), the alternative algorithm would have a complexity depending on the number of retained columns in the matrix i . Moreover, the evaluation of Equations 32 and 33 is relatively easy since computing s i,j and q i,j required in A 0 and B 0 can be achieved via [7]: where We summarize the procedure of the fast algorithm in Algorithm 1. The convergence criterion there is where L 0 γ k is the increment of L 0 (γ ) in the kth iteration and thresh denotes a pre-specified threshold value. To improve the convergence speed, in step 5 of Algorithm 1, we select the γ k j that leads to the largest increase in L 0 (γ ). Other steps in the algorithm, including updating μ i , i , s i,j , q i,j , and g i,j in steps 10 to 11 and changing the model as in steps 6 to 8, are the same as those detailed in 6 of [7]. Before the end of this section, we shall illustrate the relationship between the MCS algorithm and the newly proposed LMCS technique, in order to gain more insights. Within the MCS framework, the elements γ j in the information sharing vector γ are found via [7]:

Algorithm 1 FAST LMCS
On the other hand, from Equation 27, we have LMCS that obtains the estimate of γ j through Clearly, LMCS would reduce to MCS if λ = 0. This is somewhat expected from the comparison presented at the end of Section 2, where we show that, compared with MCS, LMCS introduces another layer of prior information embedded in the parameter λ. When λ = 0, we can As a result, Equations 32 and 33 would become which are identical to the approximate solutions to Equation 39 established in [7] (see Equations 39 and 40 in [7]). This corroborates the validity of the Bayesian derivations that lead to LMCS.

MDL-based task classification and signal reconstruction
The MCS algorithm and the newly proposed LMCS method both assume that the original signals of the L CS tasks are statistically correlated. In other words, the original signals belong to the same cluster or group, from the viewpoint of signal classification. When the above assumption is not fulfilled, the signal reconstruction performance of MCS and LMCS would be degraded. We shall develop in this section novel signal classification and recovery algorithms on the basis of the MDL principle. The new methods are referred to as MDL-MCS or MDL-LMCS so as to reflect the fact that we augment the MCS and LMCS techniques with MDL. We start this http://asp.eurasipjournals.com/content/2013/1/160 section with the theoretical derivation of the MDL-based classification for MCS and LMCS.

MDL-based classification
This subsection presents the basic MDL-based task classification framework. With MDL, the best model out of a family of competing statistical models for a given data is the one that yields the minimum description length for the data. Let Y= y 1 , . . . , y L be the set collecting the compressive measurements of the L CS tasks in consideration and ι = [ι 1 , . . . , ι L ] be the partition of Y into K clusters, where ι i = k means that y i belongs to the kth cluster, i = 1, . . . , L, and k = 1, . . . , K. Assuming statistical independence among signals from two different clusters, we can express the likelihood function of Y as  [20]. Here, p (D) and p (ι) are the priors of D and ι. S Y , S D , and S ι denote the numbers of elements in Y, D, and ι. As a result, the description length of Y can be rewritten as We proceed to evaluate Equation 45 for the cases of LMCS and MCS sequentially. In particular, as shown in Appendix 2, we have that for LMCS, N ), and L k is the number of tasks in the kth cluster. Other variables are the same as those in Equation 22.
For MCS, according to Equation 30 in [7], we have MCS is distributed uniformly, so − log 2 p D MCS would be a constant (see Section 2.1).
We now compute − log 2 p (ι) to complete the evaluation of Equation 45 for LMCS and MCS. Let n (L, ι) be the number of different ways to partition L tasks into K groups with each group having L k CS tasks and The numerator represents the number of different partitions if we generate them by taking sequentially L k tasks out of the L CS tasks while the denominator removes the partitions produced by simply swapping the tasks within a cluster without changing the clustering structure. Assuming that the ι has the prior of a uniform distribution, we have (50)

MDL-LMCS and MDL-MCS
Solving Equation 50 directly may be computationally prohibitive since it requires calculating the description length of Y for all possible clustering structures. To address this difficulty, we shall propose the new MDL-LMCS and MDL-MCS algorithms for classifying the CS tasks and recovering all original signals in a joint and iterative manner. The algorithm flow is summarized in Algorithm 2. It takes as its input the sets Y and that collect the compressive measurement vectors and the measurement matrices in the L CS tasks, respectively.
Since the tasks have not been classified at the beginning, the algorithm considers that they belong to a single cluster clust{1} = {Y, }, and as a result, it sets K, the number of obtained clusters, to be 1, and num, the number of unclassified tasks, to be L. The algorithm also initializes Y and , the sets that collect the compressive measurements and the measurement matrices of the unclassified tasks, as Y = Y and = . Signal reconstruction via LMCS or MCS for MDL-LMCS or MDL-MCS is then performed using Y and to obtain the reconstructed signal set 1 and the sharing parameter set D 1 . We plug D 1 into Equation 46 or 47 to calculate the total description length (TDL) mdl 1 for all the compressive measurements in Y. This completes the initialization stage of the algorithm.
The proposed algorithm proceeds to classify the L tasks as follows. In the first iteration, it first applies the operation CLASSIFY(·) to form a new cluster Y min , min from the unclassified task set Y. Y min hasL min tasks and their measurement matrices are collected in min . We remove Y min and min from Y and to update them, while the number of remaining unclassified task becomes num − L min . Now, we have K = 2 clusters, clust{1} = {Ŷ,ˆ }, and clust{2} = {Ŷ min ,ˆ min } a . LMCS or MCS is then applied to both clusters to identify their original signals and sharing parameters. The results are kept in 2 and D 2 , the latter of which is substituted into (46) or (47) for MDL-LMCS or MDL-MCS to compute again the TDL of Y, denoted by mdl 2 . This completes the processing of iteration 1. We then compare mdl 1 with mdl 2 and if mdl 2 < mdl 1 , the algorithm would start its second iteration to continue the task classification, where CLASSIFY(·) will be applied toŶ and yield clust{3}. The above process is repeated until mdl K > mdl K−1 occurs, which implies the appearance of over-fitting. The algorithm finally outputs the clusters available in the (K − 2)th iteration.
The function CLASSIFY (·) runs as follows. Each time when CLASSIFY (·) is executed, it first selects randomly a task out of the unclassified task set Y. With slight abuse of notation, we denote it as y i . It is paired with every of the remaining tasks in Y, and this yields num−1 two-task clusters. In the case of MDL-LMCS, we then apply LMCS to estimate the sharing parameters γ (t) , λ (t) , ν (t) of the two-task cluster t, t = 1, 2, . . . , num − 1 and compute the corresponding description length for y i via http://asp.eurasipjournals.com/content/2013/1/160 We next perform a grouping operation on the obtained num − 1 description length DL (t) LMCS y i to identify those tasks inŶ that are likely to correlate well with y i and should be grouped with y i in a new clusterŶ min (see Algorithm 2). Recall that each description length indeed corresponds to a task inŶ other than y i . The grouping procedure is based on the well-known K-mean technique. The difference here is that before the application of the K-mean, we first compute the algorithmic mean of DL (t) LMCS y i and set those above the mean value to be equal to the mean. This is equivalent to excluding the tasks that lead to large value of DL (t) LMCS y i when being paired with y i because they are unlikely to be well correlated with y i . We next apply K-mean to the remaining description length to obtain two groups. The mean description length for both groups are found. The tasks belonging to the group with a smaller mean description length are combined with y i to produce the output of CLASSIFY(·),Ŷ min .
In the case of MDL-MCS, CLASSIFY(·) is realized in the same manner as described above, except that the description length for y i is evaluated over every two-task cluster using

Implementation aspect
The development of MDL-LMCS and MDL-MCS presented in the previous subsection implicitly assumes that the quantization precision δ is known a priori. Nevertheless, in an ideal case, δ should be determined jointly with the optimal number of clusters K through minimizing the right-hand side of Equation 50 with respect to δ and K.
We shall follow the approach similar to the one adopted in [20] to determine the quantization precision. First, it can be verified that the value of δ would have no impact on locating the unclassified tasks that are correlated with a randomly selected one if the compressive measurement vectors of all the tasks have the same dimension. This is because, in this case, the term depending on δ in Equations 51 and 52 would be the same for any value of t. As a result, δ will affect the task classification performance via Equations 46 and 47 only, from which it can be seen that a very fine quantization would lead to a smaller number of clusters. This may degrade the signal reconstruction performance as weakly correlated signals may be recovered jointly. A large value of δ would not necessarily improve performance, as in this case, the original signals may tend to be recovered separately. Our experiments suggest that δ be within the range of 0.01 to 0.1, depending on the type of data to be processed. Throughout the experiments in Section 5, we shall fix δ to be 0.1, instead of attempting to optimize it for different experiments.

Simulations
Monte Carlo (MC) simulations using synthetic data and images are performed to illustrate the performance of the LMCS algorithm developed in Section 3 and the MDLaugmented MCS algorithms, namely, the MDL-LMCS and MDL-MCS techniques presented in Section 4.

Synthetic signals
In each simulation of this subsection, the length of the original signals of all the CS tasks is fixed at N = 512, and we generate two sets of results. One set of results is produced when the non-zero elements of the original signals take binary values ±1 in a random manner. The other set is generated with the non-zero elements of the original signals being independently drawn from zero-mean Gaussian distribution with unit variance. The elements of the measurement matrix of any CS task, on the other hand, can only be drawn from a Gaussian distribution with zero mean and variance one. Each column of any measurement matrix is normalized to have a unit norm.
For the purpose of comparison, we implement the BCS and MCS techniques developed in [4] and [7]. We shall denote them as ST-BCS and MCS in the figures. Here, ST stands for single task, and it is introduced to highlight that ST-BCS and MCS recover the original signals separately and jointly. We also implement the Laplace prior-based BCS proposed in [6] and denote it as LST-BCS. When implementing the three benchmark algorithms (ST-BCS, MCS, and LST-BCS) and the three proposed methods (LMCS, MDL-LMCS, and MDL-MCS), we always initialize a = 10 3 and b = 1 so that the noise precision β has the same prior distribution for all the algorithms considered (see Equation 5).
We shall follow the previous works [4,6,7] that proposed the three benchmark methods and use the average normalized signal reconstruction error as the primary performance metric. It is defined as 1 where θ i andθ i are the true and the estimated original signal vectors of the ith CS task. Note that the average normalized signal reconstruction error measures the Euclidean distance between the waveforms of the recovered and the original signals. It is not very informative regarding the quality of the recovered signal supports. Therefore, we shall also include in some experiments performance results of different algorithms in recovering the signal supports, which are quantified by the average incorrect support recovery ratio 1 Here, || · || 0 denotes the l 0 -norm and S(x) sets all the non-zero elements in x to be 1.

LMCS
We consider the case of L = 2 CS tasks as in [7], in order to illustrate the performance of the proposed LMCS technique and the existing methods under a simulation setup already used in the literature. The original signal of each task contains 64 non-zero elements at random locations. Zero-mean Gaussian noise with a standard deviation of 0.01 is added to the two obtained compressive measurement vectors b .
In the first simulation, we illustrate in Figure 2 the impact of different choices of the parameters λ and ν on the performance of LMCS. The two signals are assumed to have 75% of their non-zero elements overlapped. We realize LMCS with λ = 0, λ = 1, λ = 2, and λ estimated using Equation 25. The results shown are averaged over 200 runs. In particular, Figure 2a,b plots the average signal reconstruction error as a function of the number of compressive measurements for the two cases where the original signals are random binary numbers ±1 and zeromean Gaussian random variables with unit variance. The results show that in both cases, the reconstruction error of LMCS gradually improves as the number of compressive measurements increases, and the best performance is obtained when λ is estimated using Equation 25. Moreover, we can see that the LMCS with ν = 0 and ν estimated using Equation 26 yields similar signal reconstruction performance. The underlying reason is that the value of λ estimated jointly with ν is nearly identical to that obtained with ν = 0. This can be better explained http://asp.eurasipjournals.com/content/2013/1/160 as follows. The value of ν, when it is identified together with λ, is generally non-zero but less than one in this simulation. Careful examination of Equation 25 that gives the estimate of λ reveals that the impact of a small nonzero ν on λ is negligible, when the original signal length N is large (in this section, N = 512) and the measurement noise level is low, which implies a large value of the noise precision β, and as a result, large values of the hyper-parameters γ j for original signals having a unit variance (see Equation 17). Therefore, in the remaining simulations, we fix ν at zero when realizing LMCS and MDL-LMCS.
It is worthwhile to point out that rigorously, ν = 0 is a boundary value for the Gamma distribution. As ν approaches 0, the prior distribution of λ would provide vague information on λ as p(λ) ∝ 1/λ (also see Equation 19 in [6]). However, this would not change the fact that Laplace prior is imposed on the original signals, as shown in Equation 12. In other words, LMCS would still outperform MCS because it enhances the sparsity constraints on the non-zero elements of the original signals. This is also supported by the following simulation results (see Figures 3 and 4). Figure 3 demonstrates the impact of the correlation between the two original signals on the performance of LMCS. It considers the cases when the two original signals have binary non-zero elements, and they have 75% and 50% of their non-zero elements overlapped. Figure 3a of their non-zeros overlapped. The performance enhancement mainly comes from the use of Laplace priors on the original signals in LMCS. Compared with MCS, LMCS imposes another layer of prior information on the hyperparameters of the original signals, which makes MCS a special case of LMCS as shown in Equations 39 and 40 at the end of Section 4. As a result, LMCS offers more flexibility in modeling the sparsity of the original signals. This is also corroborated by Figure 3b, where it shows that in the case where the two original signals have 75% of their non-zero elements colocated, LMCS can provide a lower incorrect support recovery ratio and can better recover the sparse signal support. Figure 4 repeats the simulation experiment in Figure 3, but it considers the case where the two original signals have the non-zero elements drawn from zero-mean Gaussian distribution with unit variance. The obtained observations are similar to those in Figure 3.

MDL-based task classification and signal reconstruction
In this subsection, we present simulation results to illustrate the performance of MDL-MCS and MDL-LMCS developed in Section 4. For the purpose of comparison, we also show the results of the ST-BCS, LST-BCS, MCS, and LMCS methods as well as the DP-MCS technique. The simulated algorithms are used to recover the original signals of L = 40 CS tasks that belong to eight clusters with five tasks each. Every cluster has its own signal template that differs in the signal supports. All the original signals have 64 non-zero components, and their locations are initially chosen so that the correlation between any two original signals from different clusters is zero. Later, we perform the following perturbation to induce slight correlation among clusters. Specifically, in each ensemble run, six non-zero elements in each signal template are selected randomly and set to zero elements, while at the same time, six elements that are zeros in the original template are reset to be non-zeros. In this way, the five signals within the same cluster are highly correlated, but the signals from different clusters have distinct sparsity structures. The simulation results are obtained via averaging over 50 ensemble runs.
In Figure 5a,b, we plot as a function of the number of compressive measurements the binary signal reconstruction error and the correct task classification ratio of the simulated seven algorithms. As we can see from Figure 5a, pretending that the 40 CS tasks belong to the same group and recovering the original signals using LMCS or MCS would lead to a signal reconstruction error even higher than reconstructing the original signals separately via LST-BCS. This clearly demonstrates the impact of incorrect task classification on the signal recovery performance. On the other hand, the proposed MDL-LMCS and MDL-MCS algorithms outperform the DP-MCS technique in terms of lower signal reconstruction error. The performance improvement can be better explained by examining Figure 5b. We can see that the application of the MDL principle to augment LMCS and MCS leads to a greatly improved correct task classification ratio, compared with the DP-MCS technique. With the CS task correctly grouped, MDL-LMCS and MDL-MCS can better recover the original signals of every group.
We repeat the simulation used to generate Figure 5 with the original signals being zero-mean Gaussian random variables with unit variance. The obtained results are summarized in Figure 6. The observations obtained are similar to those in Figure 5.

Images
In this subsection, we compare the performance of MDL-MCS and MDL-LMCS with that of DP-MCS in recovering 2-D images of random bars. In this experiment, the elements of the measurement matrices of the three http://asp.eurasipjournals.com/content/2013/1/160 algorithms in consideration are drawn from a uniform spherical distribution. Figure 7 summarizes the reconstruction results from a particular run. The first three images in Figure 7a, labeled as tasks 1 to 3, are taken from [7], and they belong to the same cluster. The remaining six images in Figure 7a forms another two clusters, where one cluster consists of tasks 4 to 6 and the other is composed of tasks 7 to 9. These six images are modified from the first three images via permuting randomly the intensities of the rectangles and shifting their positions by distance randomly sampled from a uniform distribution.
All original images have the dimension of 1,024×1,024. Here, we utilize the Haar wavelet expansion with a coarsest scale of 3 and a finest scale of 6. Figure 7a gives the result of the inverse wavelet transform with 4,096 samples, denoted as linear in the figure. This is the best performance achievable by all the CS algorithms considered here. The reconstruction result from DP-MCS is shown in Figure 7b, where we adopted the hybrid CS scheme that compresses the fine-scale coefficients only as in [7] into M i = 680 (i = 1, . . . , 9) measurements for each task. Figure 7c,d gives the recovery results of MDL-MCS and MDL-LMCS, respectively.
We fix the original images and repeat the above experiment 20 times, each time with independently generated measurement matrices for all the three algorithms. In every run, the image reconstruction error for each task is evaluated and averaged to obtain the normalized image reconstruction error, which is again averaged over 20 runs to yield the average image reconstruction error summarized in Table 1. We also include in Table 1 the correct classification ratio.
The results in Figure 7 and Table 1 show that MDL-LMCS has the best image reconstruction and classification performance, while MDL-MCS yields a better performance than DP-MCS. This is consistent with the observations obtained from Figures 5 and 6.

Conclusions
In this paper, we first extended previous works on the Laplace prior-based Bayesian CS to the scenario of multiple CS tasks and developed the LMCS technique. The hierarchical prior model was adopted to impose the Laplace priors, and it was shown that the MCS algorithm is indeed a special case of LMCS. Next, this paper considered the scenario where the multiple CS tasks are from different groups, under which the performance of both MCS and LMCS would be degraded, since they attempt to recover the uncorrelated signals jointly. We proposed the MDL-based MCS techniques, namely, MDL-MCS and MDL-LMCS, which first classify tasks into different groups using the MDL principle and then reconstruct signals of every cluster. Simulations verified the enhanced performance of MDL-MCS and MDL-LMCS in terms of lower signal reconstruction error over the benchmark MCS and DP-MCS techniques as well as single-task CS algorithms.
Endnotes a It can be easily verified that in our algorithm, K is equal to the iteration index plus one. Besides, clust{1} always contains all the unclassified tasks and clust{K} is the newest cluster formed in the current iteration. b Our choice of the noise standard deviation of 0.01 is on the same order of the values adopted in the literature. For example, in [6] and [7], the noise standard deviation is set to be 0.03 and 0.005.

Derivation and analysis of Equations 32 and 33
In this appendix, we shall present the derivation that leads to Equations 32 and 33 and show that it is only a suboptimal solution to the maximization of Equation 27.
Our derivation applies the approximation that s i,j 1/γ j , which has been found to be valid numerically [7]. This results in the estimate of γ j having the functional form given in Equation 30. When A 0 > 0, both solutions in Equation 30 would be negative, which violates the requirement that γ j must be positive. If We next show that the solution in Equation 32 and 33 is suboptimal. For this purpose, utilizing the approximation s i,j 1/γ j transforms Equation 28 into (53) http://asp.eurasipjournals.com/content/2013/1/160 We can also obtain easily

Derivation of Equation 46
To avoid confusion, we use superscript (k) to denote the kth cluster in the following derivation. For mathematical tractability, besides the independence among signals from two different clusters, we also assume the independence among signals within the same cluster. As a result, we have where L k is the number of tasks in the kth cluster such Carrying out the integration, simplifying and applying some straightforward manipulations give Equation 46.