Research | Open Access

# Enhanced multi-task compressive sensing using Laplace priors and MDL-based task classification

Ying-Gui Wang^{1} (Email author), Le Yang^{1,2}, Liang Tang^{3}, Zheng Liu^{1} and Wen-Li Jiang^{1}

*EURASIP Journal on Advances in Signal Processing* **2013**:160

https://doi.org/10.1186/1687-6180-2013-160

© Wang et al.; licensee Springer. 2013

**Received:** 28 April 2013 · **Accepted:** 1 October 2013 · **Published:** 17 October 2013

## Abstract

In multi-task compressive sensing (MCS), the original signals of multiple compressive sensing (CS) tasks are assumed to be correlated. This correlation is exploited to recover the signals jointly and thereby improve signal reconstruction performance. In this paper, we first develop an improved version of MCS that imposes sparseness over the original signals using Laplace priors. The newly proposed technique, termed the Laplace prior-based MCS (LMCS), adopts a hierarchical prior model, and MCS is shown analytically to be a special case of LMCS. This paper next considers the scenario where the CS tasks belong to different groups. In this case, the original signals from different task groups are not well correlated, which would degrade the signal recovery performance of both MCS and LMCS. We propose the use of the minimum description length (MDL) principle to enhance the MCS and LMCS techniques. The new algorithms, referred to as MDL-MCS and MDL-LMCS, first classify tasks into different groups and then reconstruct the signals of each cluster jointly. Simulations demonstrate that the proposed algorithms outperform several state-of-the-art benchmark techniques.

## Keywords

- Multi-task
- Compressive sensing
- Laplace priors
- Minimum description length
- Task classification

## 1 Introduction

In compressive sensing (CS), an original signal is acquired through the linear measurement model

$\mathit{y}={\mathbf{\Phi}}_{0}\Psi \mathit{\theta}+\mathit{n}=\mathbf{\Phi}\mathit{\theta}+\mathit{n}, \qquad (1)$

where θ is the *N* × 1 original signal vector, **Φ**_{0} denotes the *M* × *N* measurement matrix, Ψ denotes the *N* × *N* linear basis, **Φ** = **Φ**_{0}Ψ, y is the *M* × 1 compressive measurement vector, and n is the additive noise. Since *M* is far smaller than *N*, the original signal is now compressively represented, but the inverse problem, namely recovering θ from y, is in general ill-posed. If θ is sparse (i.e., most of its elements are zero), the signal reconstruction problem could become feasible. An approximation to the original signal in this case can be obtained through the technique of basis pursuit that solves

$\hat{\mathit{\theta}}=arg\underset{\mathit{\theta}}{\text{min}}{\parallel \mathit{\theta}\parallel}_{1}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{1em}{0ex}}{\parallel \mathit{y}-\mathbf{\Phi}\mathit{\theta}\parallel}_{2}\le \epsilon , \qquad (2)$

where ∥·∥_{2} and ∥·∥_{1} denote the *l*_{2}-norm and the *l*_{1}-norm, respectively, and the scalar *ε* is a small constant. Equation 2 has been the starting point for the development of many signal recovery methods in the literature. Among them, the recovery algorithms under the Bayesian framework provide some advantages over other formulations. These include providing probabilistic predictions, automatic estimation of model parameters, and the evaluation of the uncertainty of reconstruction. The existing Bayesian approaches include the Bayesian compressive sensing (BCS) [4] that stems from the relevance vector machine [5] and the Laplace prior-based BCS [6].
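As a concrete illustration of the recovery problem in Equation 2 (in its Lagrangian form), the following minimal sketch runs the classic iterative shrinkage-thresholding algorithm (ISTA) on a toy problem. The dimensions, step size, and regularization weight are our own illustrative choices and do not come from the paper:

```python
def soft(x, t):
    """Soft-thresholding, the proximal operator of the l1-norm."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def ista(Phi, y, lam=0.05, step=0.4, iters=2000):
    """Minimize 0.5*||y - Phi theta||_2^2 + lam*||theta||_1 by ISTA.

    step must be below 1/L_max, with L_max the largest eigenvalue
    of Phi^T Phi (here L_max = 2, so step = 0.4 is safe).
    """
    M, N = len(Phi), len(Phi[0])
    theta = [0.0] * N
    for _ in range(iters):
        # residual r = y - Phi theta, then a gradient step and shrinkage
        r = [y[m] - sum(Phi[m][n] * theta[n] for n in range(N)) for m in range(M)]
        grad = [sum(Phi[m][n] * r[m] for m in range(M)) for n in range(N)]
        theta = [soft(theta[n] + step * grad[n], step * lam) for n in range(N)]
    return theta

# Unit-norm columns; the third column generated y, and ISTA recovers a
# sparse estimate concentrated on that column (shrunk slightly by lam).
Phi = [[1.0, 0.0, 0.6],
       [0.0, 1.0, 0.8]]
y = [0.6, 0.8]                 # = Phi applied to [0, 0, 1]
theta = ista(Phi, y)
```

For this 2 × 3 toy problem one can verify by hand that the unique minimizer is θ = (0, 0, 0.95): the support is recovered, with the non-zero amplitude shrunk by the regularization.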

In [7], multi-task compressive sensing (MCS) was introduced within the Bayesian framework. In this work, a CS task refers to the union of an original signal vector, the measurement matrix, and the associated compressive measurement vector obtained using Equation 1. In contrast to the CS aim of recovering a single signal from its compressive measurements, MCS exploits the statistical correlation among the original signals of multiple CS tasks and recovers them jointly to improve the signal reconstruction performance. It has been shown in [7] that MCS allows recovering in a robust manner the signals whose compressive measurements are insufficient when they are reconstructed separately. The MCS technique has been investigated extensively in machine learning literature, where it was referred to as simultaneous sparse approximation (SSA) [8–12] as well as distributed compressed sensing [13]. In [14], an empirical Bayesian strategy for SSA was developed.

The contribution of this paper is twofold. We shall first extend the work of [6] on the Laplace prior-based BCS to the MCS scenario. A new MCS algorithm for signal recovery, termed the Laplace prior-based MCS (LMCS), is developed. We impose Laplace priors on the original signals in a hierarchical manner and show that MCS is indeed a special case of LMCS. The incorporation of Laplace priors enforces signal sparsity to a higher extent [15] and offers posterior distributions rather than the point estimates produced by MCS. Another advantage comes from the log-concavity of the Laplace distribution, which leads to a unimodal posterior distribution and thus eliminates local minima.

The second part of this work comes from the following observation. Specifically, in order to provide satisfactory signal reconstruction performance, the MCS technique from [7], together with the newly proposed LMCS method, requires that the original signals of the multiple CS tasks are well correlated statistically. This assumption may not be fulfilled in many practical applications. For instance, some original signals may be realizations of different signal templates that differ in their supports. In other words, they could belong to different signal groups, and the statistical correlation among them is weak, which would degrade the signal recovery performance. A possible approach to address this problem is to group the CS tasks before the signal reconstruction stage, as in the MCS with Dirichlet process priors (DP-MCS) [16].

The second contribution of this paper is the use of the minimum description length (MDL) principle to augment the MCS and LMCS methods. The obtained techniques are referred to as the MDL-MCS and MDL-LMCS algorithms. The MDL principle has been adopted to solve the model selection problem [17–19] and can also be used in other areas, such as sparse coding and dictionary learning [20] and radar emitter classification [21–23]. In MDL, the best model for given data y is the solution to the minimization problem $\hat{\omega}=arg\underset{\omega \in \Omega}{\text{min}}\mathit{\text{DL}}\left(\mathit{y},\omega \right)$. Here, *Ω* represents the set of possible models, and *DL*(y, *ω*) is a codelength assignment function which defines the theoretical codelength required to describe y uniquely; it is the key component in any MDL-based classification technique. Common practice in MDL uses the ideal Shannon codelength assignment [24] to define *DL*(y, *ω*) in terms of a probability assignment *p*(y, *ω*) as $\mathit{\text{DL}}\left(\mathit{y},\omega \right)=-{log}_{2}p\left(\mathit{y},\omega \right)$. Applying *p*(y, *ω*) = *p*(y|*ω*)*p*(*ω*), we have $\hat{\omega}=arg\underset{\omega \in \Omega}{\text{min}}-{log}_{2}p\left(\mathit{y}\left|\omega \right.\right)-{log}_{2}p\left(\omega \right)$, where $-{log}_{2}p\left(\omega \right)$ represents the model complexity. Note that the MCS and the new LMCS methods are both developed under the Bayesian framework, which enables their integration with the statistical MDL technique. Compared with the DP-MCS technique, which utilizes variational Bayes (VB) inference and could suffer from local convergence, the newly proposed MDL-MCS and MDL-LMCS methods offer an improved correct signal classification rate and better signal reconstruction performance. This is also illustrated via computer simulations in Section 5.
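The two-term description length above is simple to evaluate once the probability assignments are fixed. The sketch below applies the rule $\hat{\omega}=arg\,min_{\omega}\,-{log}_{2}p(\mathit{y}|\omega)-{log}_{2}p(\omega)$ to a toy selection problem between two hypothetical Bernoulli coders with equal model priors; none of this comes from the paper, it only illustrates the rule:

```python
import math

def shannon_bits(p):
    """Ideal Shannon codelength -log2 p, in bits."""
    return -math.log2(p)

def description_length(y, likelihood, prior):
    """DL(y, w) = -log2 p(y|w) - log2 p(w): data bits plus model bits."""
    return shannon_bits(likelihood(y)) + shannon_bits(prior)

def mdl_select(y, models):
    """Return the model name minimizing the description length."""
    return min(models, key=lambda w: description_length(y, models[w][0], models[w][1]))

def bernoulli(q):
    """Likelihood of a binary sequence under an i.i.d. Bernoulli(q) coder."""
    return lambda bits: math.prod(q if b else 1.0 - q for b in bits)

# Toy example: a 1-heavy sequence is cheaper to describe with the biased
# coder even after paying the (equal) prior cost of naming the model.
models = {"fair": (bernoulli(0.5), 0.5), "biased": (bernoulli(0.9), 0.5)}
y = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
best = mdl_select(y, models)
```

Here the fair coder needs exactly 10 + 1 = 11 bits, while the biased coder needs about 5.7 bits, so MDL picks the biased model.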

The remainder of this paper is structured as follows. In Section 2, we review the prior sharing concept in MCS and present the prior sharing framework in LMCS. Section 3 develops the proposed LMCS algorithm. We describe in Section 4 the MDL-based MCS and LMCS techniques, namely, the MDL-MCS and MDL-LMCS algorithms. Simulations are given in Section 5 to illustrate the performance of the proposed algorithms. Section 6 concludes the paper.

## 2 Prior sharing in MCS and LMCS

In the area of machine learning, information sharing among tasks is a well-known technique [25]. Typical approaches, to name a few, include sharing hidden nodes in neural networks [26, 27], assigning a common prior in hierarchical Bayesian models [28–30], placing a common structure on the predictor space [31], and the structured regularization in kernel methods [32]. Among them, the use of hierarchical Bayesian models with shared priors is one of the most important methods for multi-task learning [33–37], which is also essential for the development of MCS in [7] and the LMCS algorithm in this paper. For the sake of clarity, in the rest of this section, we shall first review the prior sharing in the MCS algorithm and then proceed to present the hierarchical Bayesian framework of LMCS.

Consider *L* CS tasks

${\mathit{y}}_{i}={\mathbf{\Phi}}_{i}{\mathit{\theta}}_{i}+{\mathit{n}}_{i},\phantom{\rule{1em}{0ex}}i=1,2,\dots ,L, \qquad (3)$

where y_{i} is the *M*_{i} × 1 compressive measurement vector and **Φ**_{i} is the *M*_{i} × *N* matrix (*M*_{i} ≪ *N*) whose columns are **Φ**_{i,j}, *j* = 1, 2, …, *N* such that **Φ**_{i} = [**Φ**_{i,1}, …, **Φ**_{i,N}]. Here, θ_{i} = [*θ*_{i,1}, …, *θ*_{i,N}]^{T} is the original signal for task *i*, and the measurement noise n_{i} is assumed to follow an i.i.d. Gaussian distribution with zero mean vector and covariance matrix *β*^{-1}**I**. The conditional likelihood function of y_{i} is

$p\left({\mathit{y}}_{i}|{\mathit{\theta}}_{i},\beta \right)=\mathcal{N}\left({\mathit{y}}_{i}|{\mathbf{\Phi}}_{i}{\mathit{\theta}}_{i},{\beta}^{-1}\mathbf{I}\right), \qquad (4)$

i.e., a Gaussian with mean vector **Φ**_{i}θ_{i} and covariance matrix *β*^{-1}**I**. The noise precision *β* follows a Gamma distribution

$p\left(\beta |a,b\right)=\text{Gamma}\left(\beta |a,b\right)=\frac{{b}^{a}}{\Gamma \left(a\right)}{\beta}^{a-1}{e}^{-b\beta}, \qquad (5)$

where *a* and *b* are the shape and scale parameters of the Gamma distribution and *Γ*(*a*) is the Gamma function.

### 2.1 Prior sharing in MCS

In MCS [7], given the hyper-parameter vector α, the original signals θ_{i} are statistically independent, and they follow a joint Gaussian distribution:

$p\left({\mathit{\theta}}_{i}|\mathit{\alpha}\right)=\prod _{j=1}^{N}\mathcal{N}\left({\theta}_{i,j}|0,{\alpha}_{j}^{-1}\right), \qquad (6)$

where α = [*α*_{1}, *α*_{2}, …, *α*_{N}]^{T} is the information vector shared by the original signals θ_{i} of all the *L* tasks. Its distribution function is given by

$p\left(\mathit{\alpha}|c,d\right)=\prod _{j=1}^{N}\text{Gamma}\left({\alpha}_{j}|c,d\right). \qquad (7)$

In [7], setting *a*, *c* to ones and *b*, *d* to zeros in Equations 5 and 7 was adopted so that the priors of α and *β* are both uniformly distributed. As a result, they can be found via maximizing the following likelihood function:

$\mathcal{L}\left(\mathit{\alpha},\beta \right)=\sum _{i=1}^{L}ln\phantom{\rule{0.3em}{0ex}}p\left({\mathit{y}}_{i}|\mathit{\alpha},\beta \right)=\sum _{i=1}^{L}ln\phantom{\rule{0.3em}{0ex}}\mathcal{N}\left({\mathit{y}}_{i}|\mathbf{0},{\beta}^{-1}\mathbf{I}+{\mathbf{\Phi}}_{i}{\mathbf{A}}^{-1}{\mathbf{\Phi}}_{i}^{T}\right), \qquad (8)$

where **A** = diag(*α*_{1}, …, *α*_{N}).

This is equivalent to maximizing the posterior distribution of α and *β*. The original signals θ_{i} are then reconstructed using the estimated values of α and *β*.

### 2.2 Prior sharing in LMCS

In LMCS, sparseness is imposed on the original signals via the hierarchical prior

$p\left({\mathit{\theta}}_{i}|\mathit{\gamma}\right)=\prod _{j=1}^{N}\mathcal{N}\left({\theta}_{i,j}|0,{\gamma}_{j}\right), \qquad (9)$

$p\left({\gamma}_{j}|\lambda \right)=\text{Gamma}\left({\gamma}_{j}|1,\lambda /2\right)=\frac{\lambda}{2}exp\left(-\frac{\lambda {\gamma}_{j}}{2}\right), \qquad (10)$

$p\left(\lambda |\nu \right)=\text{Gamma}\left(\lambda |\nu /2,\nu /2\right), \qquad (11)$

where γ = [*γ*_{1}, …, *γ*_{N}]^{T}, and *p*(*γ*_{j}|*λ*) and *p*(*λ*|*ν*) are the prior distributions of *γ*_{j} and *λ*, respectively. Compared with the MCS model given in Equation 6, Equations 9 to 11 reveal that in LMCS, information sharing is realized via the vector γ and the hyper-parameter *λ*. We have from Equations 9 to 11

$p\left({\mathit{\theta}}_{i}|\lambda \right)=\int p\left({\mathit{\theta}}_{i}|\mathit{\gamma}\right)p\left(\mathit{\gamma}|\lambda \right)d\mathit{\gamma}=\frac{{\lambda}^{N/2}}{{2}^{N}}exp\left(-\sqrt{\lambda}\sum _{j=1}^{N}\left|{\theta}_{i,j}\right|\right). \qquad (12)$

This verifies that the adopted hierarchical prior model results in Laplace priors for the original signals θ_{i}.

LMCS identifies γ, *λ*, *β*, and *ν* via maximizing the posterior distribution

$p\left(\mathit{\gamma},\lambda ,\beta ,\nu |\mathbf{Y}\right)\propto p\left(\beta \right)p\left(\lambda |\nu \right)p\left(\mathit{\gamma}|\lambda \right)\prod _{i=1}^{L}\mathcal{N}\left({\mathit{y}}_{i}|\mathbf{0},{\beta}^{-1}\mathbf{I}+{\mathbf{\Phi}}_{i}{\mathbf{\Gamma}}_{0}^{-1}{\mathbf{\Phi}}_{i}^{T}\right), \qquad (13)$

where **Γ**_{0} = diag(1/*γ*_{1}, …, 1/*γ*_{N}).

With the estimated γ, *λ*, and *ν*, LMCS then proceeds to reconstruct the original signals from all the *L* CS tasks.

In both MCS and LMCS, the likelihood function of y_{i} depends on the noise precision *β*, while the prior distribution functions of the original signals θ_{i} depend on the information sharing vector (α in MCS and γ in LMCS). The difference is that LMCS has one more layer of prior information, which is embedded in *λ*. The introduction of *λ* makes the prior distribution of the original signal Laplace, as shown in Equation 12. As a result, the proposed LMCS promotes the sparsity of the recovered signal, as pointed out in [15].

## 3 Multi-task compressive sensing using Laplace priors

We shall present the proposed LMCS algorithm in this section. The LMCS method differs from the MCS technique only in the step of identifying the information sharing vector γ and the parameters *λ* and *ν* while their signal recovery steps are the same. As a result, we shall focus on the estimation of γ, *λ*, and *ν*. Interested readers are directed to [7] for details on the signal recovery process.

The recovery performance may degrade when the noise precision *β* is not properly initialized. Therefore, in this work, we consider *β* as a nuisance parameter and integrate it out to reduce the number of unknowns and improve the robustness of the algorithm. For this purpose, the prior distributions of the original signals θ_{i} are rewritten as in [7]:

Recall that *β* has a Gamma prior distribution (see Equation 5) and that the posterior *p*(θ_{i}|γ, *λ*, *β*, y_{i}) given above Equation 15 is still Gaussian with the mean vector and the covariance matrix given in Equations 15 and 16. After taking the integration with respect to *β*, we have

Note that *p*(θ_{i}|γ, *λ*, y_{i}) has the functional form of a Student's *t* distribution, which is heavy-tailed and, as a result, makes the LMCS algorithm more robust to outliers, if any, in the measurement noise in y_{i}, as pointed out in [40].
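The robustness argument can be made concrete for scalar densities: an outlier's negative log-likelihood grows quadratically under a Gaussian but only logarithmically under a Student's *t*. The unit scale and the choice ν = 3 below are illustrative assumptions, not parameters from the paper:

```python
import math

def gauss_nll(x):
    """Negative log-density of N(0, 1)."""
    return 0.5 * x * x + 0.5 * math.log(2 * math.pi)

def student_t_nll(x, nu):
    """Negative log-density of a standard Student's t with nu d.o.f."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return -(log_norm - (nu + 1) / 2 * math.log(1 + x * x / nu))

# At x = 1 the two penalties are comparable, but a gross outlier at x = 10
# is punished far more heavily by the Gaussian: heavy tails absorb outliers.
gap_inlier = gauss_nll(1.0) - student_t_nll(1.0, nu=3.0)
gap_outlier = gauss_nll(10.0) - student_t_nll(10.0, nu=3.0)
```

For these values the Gaussian penalty at x = 10 is roughly 51 nats versus about 8 nats for the *t* density, which is exactly why a heavy-tailed marginal posterior is less sensitive to measurement outliers.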

Integrating out *β* on both sides of Equation 13, using Equation 19, and applying the logarithm yields the posterior distribution function $\sum _{i=1}^{L}lnp\left({\mathit{\theta}}_{i},\mathit{\gamma},\lambda ,\nu |{\mathit{y}}_{i}\right)=\sum _{i=1}^{L}lnp\left({\mathit{\theta}}_{i}|\mathit{\gamma},\lambda ,{\mathit{y}}_{i}\right)+\sum _{i=1}^{L}lnp\left(\mathit{\gamma},\lambda ,\nu |{\mathit{y}}_{i}\right)$. We shall maximize it to estimate the information sharing vector γ and the parameter *λ*. We begin with integrating θ_{i} out and applying the relationship $p\left(\mathit{\gamma},\lambda ,\nu |{\mathit{y}}_{i}\right)=p\left({\mathit{y}}_{i},\mathit{\gamma},\lambda ,\nu \right)/p\left({\mathit{y}}_{i}\right)\propto p\left({\mathit{y}}_{i},\mathit{\gamma},\lambda ,\nu \right)$ to obtain the objective function $\mathcal{L}\left(\mathit{\gamma},\lambda ,\nu \right)$ in Equation 22, where ${\mathbf{B}}_{i}=\mathbf{I}+{\mathbf{\Phi}}_{i}{\mathbf{\Gamma}}_{0}^{-1}{\mathbf{\Phi}}_{i}^{T}$, ${\mathbf{B}}_{i}^{-1}={\left(\mathbf{I}+{\mathbf{\Phi}}_{i}{\mathbf{\Gamma}}_{0}^{-1}{\mathbf{\Phi}}_{i}^{T}\right)}^{-1}=\mathbf{I}-{\mathbf{\Phi}}_{i}{\mathbf{\Sigma}}_{i}{\mathbf{\Phi}}_{i}^{T}$, and det(**B**_{i}) = (det(**Γ**_{0}))^{-1}(det(**Σ**_{i}))^{-1}. The matrices **Γ**_{0} and **Σ**_{i} are defined under Equation 16 and in Equation 21, respectively.

In the rest of this section, we shall present two methods for identifying γ and *λ*. The first technique, described in Section 3.1, iteratively maximizes $\mathcal{L}\left(\mathit{\gamma},\lambda ,\nu \right)$ to find an accurate solution. It has high computational complexity, which motivates the development of an alternative method with much lower complexity in Section 3.2.

### 3.1 Iterative solution

Taking the derivative of $\mathcal{L}\left(\mathit{\gamma},\lambda ,\nu \right)$ in Equation 22 with respect to *γ*_{j}, *j* = 1, 2, …, *N* and setting the result to zero yields the update for *γ*_{j} in Equation 24, where μ_{i,j} is the *j*th element of μ_{i} and **Σ**_{i,jj} is the *j*th diagonal element of **Σ**_{i}. Following a similar approach, *λ* can be found from Equation 25. Finally, we estimate *ν* by solving Equation 26, where *ψ*(*ν* / 2) denotes the derivative of ln*Γ*(*ν* / 2) with respect to *ν* / 2.

The iterative algorithm starts with an initial guess of γ, *λ*, and *ν*. We first update the estimates of *γ*_{j} using Equation 24 and then evaluate *λ* and *ν* using Equations 25 and 26. The above process is repeated until convergence. The iterative algorithm is based on alternating optimization and is computationally intensive. One of the computational burdens lies in the evaluation of Equations 20 and 21 required in the evaluation of Equation 24, where matrices of size *N* × *N* must be inverted. This motivates the development of the following alternative algorithm.

### 3.2 Fast alternative solution

We rewrite **B**_{i} defined under Equation 22 as ${\mathbf{B}}_{i}=\mathbf{I}+\sum _{k=1(\ne j)}^{N}{\gamma}_{k}{\mathbf{\Phi}}_{i,k}{\mathbf{\Phi}}_{i,k}^{T}+{\gamma}_{j}{\mathbf{\Phi}}_{i,j}{\mathbf{\Phi}}_{i,j}^{T}={\mathbf{B}}_{i,-j}+{\gamma}_{j}{\mathbf{\Phi}}_{i,j}{\mathbf{\Phi}}_{i,j}^{T}$, where **B**_{i,-j} is **B**_{i} with the contribution of the column **Φ**_{i,j} in the matrix **Φ**_{i} removed, such that we have $det\left({\mathbf{B}}_{i}\right)=det\left({\mathbf{B}}_{i,-j}\right)det\left(1+{\gamma}_{j}{\mathbf{\Phi}}_{i,j}^{T}{\mathbf{B}}_{i,-j}^{-1}{\mathbf{\Phi}}_{i,j}\right)$. It can be verified via applying the matrix inversion lemma that the inverse of **B**_{i} is equal to ${\mathbf{B}}_{i}^{-1}={\mathbf{B}}_{i,-j}^{-1}-{\gamma}_{j}\frac{{\mathbf{B}}_{i,-j}^{-1}{\mathbf{\Phi}}_{i,j}{\mathbf{\Phi}}_{i,j}^{T}{\mathbf{B}}_{i,-j}^{-1}}{1+{\gamma}_{j}{\mathbf{\Phi}}_{i,j}^{T}{\mathbf{B}}_{i,-j}^{-1}{\mathbf{\Phi}}_{i,j}}$. With the above notations, we are able to introduce ${\mathcal{L}}_{0}\left(\mathit{\gamma}\right)$, which collects the terms relating to γ in $\mathcal{L}\left(\mathit{\gamma},\lambda ,\nu \right)$ in Equation 22.

Here, γ_{-j} is γ with *γ*_{j} removed, ${s}_{i,j}\triangleq {\mathbf{\Phi}}_{i,j}^{T}{\mathbf{B}}_{i,-j}^{-1}{\mathbf{\Phi}}_{i,j}$, ${q}_{i,j}\triangleq {\mathbf{\Phi}}_{i,j}^{T}{\mathbf{B}}_{i,-j}^{-1}{\mathit{y}}_{i}$, and ${g}_{i,j}\triangleq {\mathit{y}}_{i}^{T}{\mathbf{B}}_{i,-j}^{-1}{\mathit{y}}_{i}+2b$.

Taking the derivative of ${\mathcal{L}}_{0}\left(\mathit{\gamma}\right)$ with respect to *γ*_{j} and setting the result to zero, we obtain Equation 29. The assumption *s*_{i,j} ≫ 1 / *γ*_{j}, which is generally valid numerically (e.g., typically we have *s*_{i,j} > 20 / *γ*_{j} [7]), simplifies the denominator of Equation 29 into $\left({s}_{i,j}-{q}_{i,j}^{2}/{g}_{i,j}\right){s}_{i,j}$. Meanwhile, let ${A}_{0}\triangleq \sum _{i=1}^{L}\frac{{s}_{i,j}+\lambda -\left({M}_{i}+2a\right){q}_{i,j}^{2}/{g}_{i,j}}{\left({s}_{i,j}-{q}_{i,j}^{2}/{g}_{i,j}\right){s}_{i,j}}$, ${B}_{0}\triangleq \sum _{i=1}^{L}\frac{\lambda {s}_{i,j}+\left({s}_{i,j}+\lambda \right)\left({s}_{i,j}-{q}_{i,j}^{2}/{g}_{i,j}\right)}{\left({s}_{i,j}-{q}_{i,j}^{2}/{g}_{i,j}\right){s}_{i,j}}$, and ${C}_{0}\triangleq L\lambda $. As a result, Equation 29 becomes Equation 30,

where ${\Delta}_{0}={B}_{0}^{2}-4{A}_{0}{C}_{0}$ and *C*_{0} ≥ 0.

Since *γ*_{j} ≥ 0, the estimate from Equation 31 can only take two possible values, given in Equations 32 and 33. Setting *γ*_{j} = 0 is equivalent to setting the corresponding signal elements *θ*_{i,j} to zero (see Equation 17). This indicates that **Φ**_{i,j} can be deleted from the matrix **Φ**_{i}. As a result, in contrast to the iterative approach for estimating *γ*_{j} and *λ* (see Section 3.1), the alternative algorithm has a complexity that depends on the number of retained columns in the matrix **Φ**_{i}. Moreover, the evaluation of Equations 32 and 33 is relatively easy since computing *s*_{i,j} and *q*_{i,j}, required in *A*_{0} and *B*_{0}, can be achieved via the update formulas in [7].

The resulting procedure is summarized in Algorithm 1. The iterations terminate when $\Delta {\mathcal{L}}_{0}\left({\mathit{\gamma}}^{k}\right)$, the increment of ${\mathcal{L}}_{0}\left(\mathit{\gamma}\right)$ in the *k*th iteration, falls below a pre-specified threshold *thresh*. To improve the convergence speed, in step 5 of Algorithm 1, we select the ${\gamma}_{j}^{k}$ that leads to the largest increase in ${\mathcal{L}}_{0}\left(\mathit{\gamma}\right)$. Other steps in the algorithm, including updating μ_{i}, **Σ**_{i}, *s*_{i,j}, *q*_{i,j}, and *g*_{i,j} in steps 10 to 11 and changing the model as in steps 6 to 8, are the same as those detailed in [7].

#### Algorithm 1 **FAST LMCS**

It is instructive to compare LMCS with MCS, where the elements *γ*_{j} in the information sharing vector γ are found via the update rules in [7]. These MCS updates coincide with the LMCS estimates of *γ*_{j} obtained by setting *λ* = 0. This is somewhat expected from the comparison presented at the end of Section 2, where we show that, compared with MCS, LMCS introduces another layer of prior information embedded in the parameter *λ*. When *λ* = 0, we can verify that ${A}_{0}=\sum _{i=1}^{L}\frac{{s}_{i,j}-\left({M}_{i}+2a\right){q}_{i,j}^{2}/{g}_{i,j}}{\left({s}_{i,j}-{q}_{i,j}^{2}/{g}_{i,j}\right){s}_{i,j}}$, *B*_{0} = *L*, and *C*_{0} = 0. As a result, Equations 32 and 33 reduce to expressions identical to the approximate solutions established in [7] (see Equations 39 and 40 in [7]). This corroborates the validity of the Bayesian derivations that lead to LMCS.

## 4 MDL-based task classification and signal reconstruction

The MCS algorithm and the newly proposed LMCS method both assume that the original signals of the *L* CS tasks are statistically correlated. In other words, from the viewpoint of signal classification, the original signals belong to the same cluster or group. When this assumption is not fulfilled, the signal reconstruction performance of MCS and LMCS degrades. We shall develop in this section novel signal classification and recovery algorithms on the basis of the MDL principle. The new methods are referred to as MDL-MCS and MDL-LMCS to reflect the fact that we augment the MCS and LMCS techniques with MDL. We start this section with the theoretical derivation of the MDL-based classification for MCS and LMCS.

### 4.1 MDL-based classification

Let **Y** = {y_{1}, …, y_{L}} be the set collecting the compressive measurements of the *L* CS tasks in consideration and ι = [*ι*_{1}, …, *ι*_{L}] be the partition of **Y** into *K* clusters, where *ι*_{i} = *k* means that y_{i} belongs to the *k*th cluster, *i* = 1, …, *L*, and *k* = 1, …, *K*. Assuming statistical independence among signals from two different clusters, we can express the likelihood function of **Y** as

$p\left(\mathbf{Y}|\mathbf{D},\mathit{\iota}\right)=\prod _{k=1}^{K}{p}_{k}\left({\mathbf{Y}}_{k}|{\mathit{d}}_{k}\right),$

where **D** = {d_{1}, …, d_{K}} is the set of model parameters, d_{k} is the model parameter vector of the model for the *k*th cluster, **Y**_{k} contains the compressive measurements of the CS tasks in the *k*th cluster, and *p*_{k}(**Y**_{k}|d_{k}) represents the likelihood function of **Y**_{k}. The description length of **Y** under the model set **D** is then

$\mathit{\text{DL}}\left(\mathbf{Y}\right)=-{log}_{2}p\left({\left[\mathbf{Y}|\mathbf{D},\mathit{\iota}\right]}_{\delta}\right)+\mathit{\text{DL}}\left(\mathbf{D},\mathit{\iota}\right).$

Since the model set **D** and the CS task partition ι are statistically independent, we have $\mathit{\text{DL}}\left(\mathbf{D},\mathit{\iota}\right)=-{log}_{2}p\left({\left[\mathbf{D}\right]}_{\delta}\right)-{log}_{2}p\left({\left[\mathit{\iota}\right]}_{\delta}\right)$, and it acts as a penalty function measuring the model complexity. The notation [·]_{δ} denotes elementwise quantization with precision *δ*. With sufficient quantization precision, we have $p\left({\left[\mathbf{Y}\left|\mathbf{D}\right.,\mathit{\iota}\right]}_{\delta}\right)\approx p\left(\mathbf{Y}\left|\mathbf{D}\right.,\mathit{\iota}\right){\delta}^{{S}_{\mathbf{Y}}}$, $p\left({\left[\mathbf{D}\right]}_{\delta}\right)\approx p\left(\mathbf{D}\right){\delta}^{{S}_{\mathbf{D}}}$, and $p\left({\left[\mathit{\iota}\right]}_{\delta}\right)\approx p\left(\mathit{\iota}\right){\delta}^{{S}_{\mathit{\iota}}}$ [20]. Here, *p*(**D**) and *p*(ι) are the priors of **D** and ι, and *S*_{Y}, *S*_{D}, and *S*_{ι} denote the numbers of elements in **Y**, **D**, and ι, respectively. As a result, the description length of **Y** can be rewritten as

$\mathit{\text{DL}}\left(\mathbf{Y}\right)\approx -{log}_{2}p\left(\mathbf{Y}|\mathbf{D},\mathit{\iota}\right)-{log}_{2}p\left(\mathbf{D}\right)-{log}_{2}p\left(\mathit{\iota}\right)-\left({S}_{\mathbf{Y}}+{S}_{\mathbf{D}}+{S}_{\mathit{\iota}}\right){log}_{2}\delta .$

Specializing this expression to LMCS yields Equation 46, where ${\mathbf{D}}^{\text{LMCS}}=\left\{{\mathit{d}}_{k}^{\text{LMCS}}\right\}$, *k* = 1, …, *K*, is the set of the model parameters in LMCS, ${\mathit{d}}_{k}^{\text{LMCS}}=\left\{{\mathit{\gamma}}^{\left(k\right)},{\lambda}^{\left(k\right)}\right\}$ contains the information sharing parameters of the *k*th cluster, ${\mathbf{B}}_{i}^{\left(k\right)}=\mathbf{I}+{\mathbf{\Phi}}_{i}^{\left(k\right)}{\left({\mathbf{\Gamma}}_{0}^{\left(k\right)}\right)}^{-1}{\left({\mathbf{\Phi}}_{i}^{\left(k\right)}\right)}^{T}$, ${\mathbf{\Gamma}}_{0}^{\left(k\right)}=\text{diag}(1/{\gamma}_{1}^{\left(k\right)},\dots ,1/{\gamma}_{N}^{\left(k\right)})$, and *L*_{k} is the number of tasks in the *k*th cluster. Other variables are the same as those in Equation 22.

The corresponding description length under MCS is given in Equation 47, where ${\mathbf{D}}^{\text{MCS}}=\left\{{\mathit{d}}_{k}^{\text{MCS}}\right\}$ is the set of model parameters for MCS, ${\mathit{d}}_{k}^{\text{MCS}}=\left\{{\mathit{\alpha}}_{\text{MCS}}^{\left(k\right)}\right\}$, ${\mathit{\alpha}}_{\text{MCS}}^{\left(k\right)}$ is the information sharing vector of cluster *k*, ${\mathbf{C}}_{i}^{\left(k\right)}=\mathbf{I}+{\mathbf{\Phi}}_{i}^{\left(k\right)}{\left({\mathbf{A}}_{\text{MCS}}^{\left(k\right)}\right)}^{-1}{\left({\mathbf{\Phi}}_{i}^{\left(k\right)}\right)}^{T}$, and ${\mathbf{A}}_{\text{MCS}}^{\left(k\right)}=\text{diag}\left({\mathit{\alpha}}_{\text{MCS}}^{\left(k\right)}\right)$. In MCS, ${\mathit{\alpha}}_{\text{MCS}}^{\left(k\right)}$ is uniformly distributed, so $-{log}_{2}p\left({\mathbf{D}}^{\text{MCS}}\right)$ would be a constant (see Section 2.1).

Let *n*(*L*, ι) be the number of different ways to partition the *L* tasks into *K* groups with each group having *L*_{k} CS tasks and $\sum _{k=1}^{K}{L}_{k}=L$. It can be verified that *n*(*L*, ι) is equal to

$n\left(L,\mathit{\iota}\right)=\frac{L!}{\prod _{k=1}^{K}{L}_{k}!},$

where the numerator corresponds to the number of ways of choosing *L*_{k} tasks out of the *L* CS tasks while the denominator removes the partitions produced by simply swapping the tasks within a cluster without changing the clustering structure. Assuming that ι has the prior of a uniform distribution, we have $-{log}_{2}p\left(\mathit{\iota}\right)={log}_{2}n\left(L,\mathit{\iota}\right)$. Equations 46 and 47 give the description length of the compressive measurements **Y** of the *L* CS tasks under LMCS or MCS. Given a quantization precision *δ*, the MDL criterion finds the optimal number of clusters *K* via

$\hat{K}=arg\underset{K,\mathit{\iota},\mathbf{D}}{\text{min}}\phantom{\rule{0.3em}{0ex}}\mathit{\text{DL}}\left(\mathbf{Y}\right). \qquad (50)$
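For a uniform prior over partitions, the codelength −log₂ *p*(ι) follows directly from the count *n*(*L*, ι). A minimal helper, assuming the multinomial form discussed above:

```python
import math

def partition_count(cluster_sizes):
    """n(L, iota) = L! / (L_1! ... L_K!) for the given cluster sizes."""
    n = math.factorial(sum(cluster_sizes))
    for Lk in cluster_sizes:
        n //= math.factorial(Lk)
    return n

def partition_bits(cluster_sizes):
    """-log2 p(iota) under a uniform prior over the n(L, iota) partitions."""
    return math.log2(partition_count(cluster_sizes))

# Splitting 6 tasks as 3+3 admits 20 partitions (about 4.32 bits), while
# an unbalanced 5+1 split admits only 6 and is cheaper to describe.
balanced_bits = partition_bits([3, 3])
unbalanced_bits = partition_bits([5, 1])
```

This is the complexity term that discourages the clustering stage from splitting the tasks into many small, finely balanced groups unless the data term pays for it.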

### 4.2 MDL-LMCS and MDL-MCS

Solving Equation 50 directly may be computationally prohibitive since it requires calculating the description length of **Y** for all possible clustering structures. To address this difficulty, we shall propose the new MDL-LMCS and MDL-MCS algorithms for classifying the CS tasks and recovering all original signals in a joint and iterative manner. The algorithm flow is summarized in Algorithm 2. It takes as its input the sets **Y** and **Φ** that collect the compressive measurement vectors and the measurement matrices in the *L* CS tasks, respectively.

Since the tasks have not been classified at the beginning, the algorithm considers that they belong to a single cluster clust{1} = {**Y**, **Φ**}, and as a result, it sets *K*, the number of obtained clusters, to 1 and *num*, the number of unclassified tasks, to *L*. The algorithm also initializes $\hat{\mathbf{Y}}$ and $\hat{\mathbf{\Phi}}$, the sets that collect the compressive measurements and the measurement matrices of the unclassified tasks, as $\hat{\mathbf{Y}}=\mathbf{Y}$ and $\hat{\mathbf{\Phi}}=\mathbf{\Phi}$. Signal reconstruction via LMCS (for MDL-LMCS) or MCS (for MDL-MCS) is then performed using $\hat{\mathbf{Y}}$ and $\hat{\mathbf{\Phi}}$ to obtain the reconstructed signal set ${\hat{\mathit{\Theta}}}_{1}$ and the sharing parameter set ${\hat{\mathbf{D}}}_{1}$. We plug ${\hat{\mathbf{D}}}_{1}$ into Equation 46 or 47 to calculate the total description length (TDL) *mdl*_{1} for all the compressive measurements in **Y**. This completes the initialization stage of the algorithm.

The proposed algorithm proceeds to classify the *L* tasks as follows. In the first iteration, it applies the operation CLASSIFY(·) to form a new cluster $\left\{{\hat{\mathbf{Y}}}_{\text{min}},{\hat{\mathbf{\Phi}}}_{\text{min}}\right\}$ from the unclassified task set $\hat{\mathbf{Y}}$. ${\hat{\mathbf{Y}}}_{\text{min}}$ has *L*_{min} tasks, and their measurement matrices are collected in ${\hat{\mathbf{\Phi}}}_{\text{min}}$. We remove ${\hat{\mathbf{Y}}}_{\text{min}}$ and ${\hat{\mathbf{\Phi}}}_{\text{min}}$ from $\hat{\mathbf{Y}}$ and $\hat{\mathbf{\Phi}}$ to update them, and the number of remaining unclassified tasks becomes *num* - *L*_{min}. Now, we have *K* = 2 clusters, $\mathit{clust}\left\{1\right\}=\{\widehat{\mathbf{Y}},\widehat{\mathbf{\Phi}}\}$ and $\mathit{clust}\left\{2\right\}=\{{\widehat{\mathbf{Y}}}_{\text{min}},{\widehat{\mathbf{\Phi}}}_{\text{min}}\}$^{a}. LMCS or MCS is then applied to both clusters to identify their original signals and sharing parameters. The results are kept in ${\hat{\mathit{\Theta}}}_{2}$ and ${\hat{\mathbf{D}}}_{2}$, the latter of which is substituted into Equation 46 or 47 for MDL-LMCS or MDL-MCS to compute again the TDL of **Y**, denoted by *mdl*_{2}. This completes the first iteration. We then compare *mdl*_{1} with *mdl*_{2}, and if *mdl*_{2} < *mdl*_{1}, the algorithm starts its second iteration to continue the task classification, where CLASSIFY(·) is applied to $\widehat{\mathbf{Y}}$ and yields clust{3}. The above process is repeated until *mdl*_{K} > *mdl*_{K-1} occurs, which implies the appearance of over-fitting. The algorithm finally outputs the clusters available in the (*K*-2)th iteration.
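The stopping logic of this outer loop can be sketched independently of the signal model. In the sketch below, `classify` and `total_description_length` are placeholders standing in for CLASSIFY(·) and Equation 46 or 47; the label-based toy versions at the bottom are purely illustrative and not from the paper:

```python
def mdl_clustering(tasks, classify, total_description_length):
    """Structural sketch of the MDL-LMCS / MDL-MCS outer loop.

    classify(pool) -> (new_cluster, remaining_pool) splits one cluster
    off the unclassified pool; total_description_length(clusters) scores
    a full clustering in bits. Splitting continues while the TDL drops.
    """
    pool = list(tasks)                 # clust{1}: all tasks start unclassified
    fixed = []                         # clusters already split off
    best = [list(pool)]
    best_mdl = total_description_length(best)
    while len(pool) > 1:
        new_cluster, pool = classify(pool)
        fixed.append(new_cluster)
        clusters = ([pool] if pool else []) + fixed
        mdl = total_description_length(clusters)
        if mdl >= best_mdl:            # TDL stopped improving: over-fitting
            break
        best, best_mdl = [list(c) for c in clusters], mdl
    return best

# Toy stand-ins: each task carries a hidden group label; clusters mixing
# labels are "expensive" to describe, label-pure clusters are "cheap".
def classify(pool):
    lbl = pool[0][0]
    return [t for t in pool if t[0] == lbl], [t for t in pool if t[0] != lbl]

def tdl(clusters):
    return 2 * len(clusters) + sum(len({t[0] for t in c}) * len(c) for c in clusters)

result = mdl_clustering([("a", 1), ("a", 2), ("b", 3), ("b", 4)], classify, tdl)
```

With these stand-ins, the loop stops after separating the two label groups, because any further split raises the total description length.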

The operation CLASSIFY(·) works as follows. It randomly selects an unclassified task y_{i}, which is paired with each of the remaining tasks in $\hat{\mathbf{Y}}$, and this yields *num* - 1 two-task clusters. In the case of MDL-LMCS, we then apply LMCS to estimate the sharing parameters {γ^{(t)}, *λ*^{(t)}, *ν*^{(t)}} of the two-task cluster *t*, *t* = 1, 2, …, *num* - 1 and compute the corresponding description length for y_{i} via Equation 51.

We next perform a grouping operation on the obtained *num* - 1 description lengths ${\mathit{\text{DL}}}_{\text{LMCS}}^{\left(t\right)}\left({\mathit{y}}_{i}\right)$ to identify those tasks in $\widehat{\mathbf{Y}}$ that are likely to correlate well with y_{i} and should be grouped with y_{i} in a new cluster ${\widehat{\mathbf{Y}}}_{\text{min}}$ (see Algorithm 2). Recall that each description length corresponds to a task in $\widehat{\mathbf{Y}}$ other than y_{i}. The grouping procedure is based on the well-known *K*-means technique. The difference here is that before applying *K*-means, we first compute the arithmetic mean of ${\mathit{\text{DL}}}_{\text{LMCS}}^{\left(t\right)}\left({\mathit{y}}_{i}\right)$ and set those above the mean to be equal to the mean. This is equivalent to excluding the tasks that lead to a large value of ${\mathit{\text{DL}}}_{\text{LMCS}}^{\left(t\right)}\left({\mathit{y}}_{i}\right)$ when paired with y_{i}, because they are unlikely to be well correlated with y_{i}. We next apply *K*-means to the remaining description lengths to obtain two groups. The mean description lengths of both groups are found, and the tasks belonging to the group with the smaller mean description length are combined with y_{i} to produce the output of CLASSIFY(·), ${\widehat{\mathbf{Y}}}_{\text{min}}$.

In the case of MDL-MCS, the description length for y_{i} is instead evaluated over every two-task cluster using Equation 52.

#### Algorithm 2 **MDL-LMCS (or MDL-MCS)**

### 4.3 Implementation aspect

The development of MDL-LMCS and MDL-MCS presented in the previous subsection implicitly assumes that the quantization precision *δ* is known *a priori*. Nevertheless, in an ideal case, *δ* should be determined jointly with the optimal number of clusters *K* through minimizing the right-hand side of Equation 50 with respect to *δ* and *K*.

We shall follow the approach similar to the one adopted in [20] to determine the quantization precision. First, it can be verified that the value of *δ* would have no impact on locating the unclassified tasks that are correlated with a randomly selected one if the compressive measurement vectors of all the tasks have the same dimension. This is because, in this case, the term depending on *δ* in Equations 51 and 52 would be the same for any value of *t*. As a result, *δ* will affect the task classification performance via Equations 46 and 47 only, from which it can be seen that a very fine quantization would lead to a smaller number of clusters. This may degrade the signal reconstruction performance as weakly correlated signals may be recovered jointly. A large value of *δ* would not necessarily improve performance, as in this case, the original signals may tend to be recovered separately. Our experiments suggest that *δ* be within the range of 0.01 to 0.1, depending on the type of data to be processed. Throughout the experiments in Section 5, we shall fix *δ* to be 0.1, instead of attempting to optimize it for different experiments.

## 5 Simulations

Monte Carlo (MC) simulations using synthetic data and images are performed to illustrate the performance of the LMCS algorithm developed in Section 3 and the MDL-augmented MCS algorithms, namely, the MDL-LMCS and MDL-MCS techniques presented in Section 4.

### 5.1 Synthetic signals

In each simulation of this subsection, the length of the original signals of all the CS tasks is fixed at *N* = 512, and we generate two sets of results. In one set, the non-zero elements of the original signals take binary values ±1 at random. In the other set, the non-zero elements are independently drawn from a zero-mean Gaussian distribution with unit variance. The elements of the measurement matrix of every CS task are drawn from a Gaussian distribution with zero mean and unit variance, and each column of the measurement matrix is normalized to have a unit norm.
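The data generation just described can be reproduced as follows; the small task dimensions in the usage line are for illustration only (the experiments use *N* = 512):

```python
import math
import random

def make_task(N, M, n_nonzero, spike_type="binary", rng=random):
    """Generate one CS task: a sparse signal theta (n_nonzero spikes at
    random locations) and an M x N column-normalized Gaussian matrix Phi."""
    theta = [0.0] * N
    for j in rng.sample(range(N), n_nonzero):
        if spike_type == "binary":
            theta[j] = rng.choice((-1.0, 1.0))      # random +/-1 spikes
        else:
            theta[j] = rng.gauss(0.0, 1.0)          # N(0, 1) spikes
    # i.i.d. N(0, 1) entries, then normalize each column to unit l2-norm
    Phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    for j in range(N):
        norm = math.sqrt(sum(Phi[m][j] ** 2 for m in range(M)))
        for m in range(M):
            Phi[m][j] /= norm
    return theta, Phi

theta, Phi = make_task(N=32, M=12, n_nonzero=4)
```

The compressive measurement of the task is then y = Φθ plus the additive noise described in the corresponding experiment.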

For the purpose of comparison, we implement the BCS and MCS techniques developed in [4] and [7]. We shall denote them as ST-BCS and MCS in the figures. Here, ST stands for single task; it is introduced to highlight that ST-BCS and MCS recover the original signals separately and jointly, respectively. We also implement the Laplace prior-based BCS proposed in [6] and denote it as LST-BCS. When implementing the three benchmark algorithms (ST-BCS, MCS, and LST-BCS) and the three proposed methods (LMCS, MDL-LMCS, and MDL-MCS), we always initialize *a* = 10^{3} and *b* = 1 so that the noise precision *β* has the same prior distribution for all the algorithms considered (see Equation 5).

We shall follow the previous works [4, 6, 7] that proposed the three benchmark methods and use the average normalized signal reconstruction error as the primary performance metric. It is defined as $\frac{1}{L}\sum_{i=1}^{L}{\|{\mathit{\theta}}_{i}-{\widehat{\mathit{\theta}}}_{i}\|}_{2}/{\|{\mathit{\theta}}_{i}\|}_{2}$, where **θ**_{i} and ${\widehat{\mathit{\theta}}}_{i}$ are the true and the estimated original signal vectors of the *i*th CS task. Note that the average normalized signal reconstruction error measures the Euclidean distance between the waveforms of the recovered and the original signals; it is not very informative regarding the quality of the recovered signal supports. Therefore, in some experiments, we shall also report the performance of the different algorithms in recovering the signal supports, quantified by the average incorrect support recovery ratio $\frac{1}{L}\sum_{i=1}^{L}{\|S({\mathit{\theta}}_{i})-S({\widehat{\mathit{\theta}}}_{i})\|}_{0}/N$. Here, ||·||_{0} denotes the *l*_{0}-norm and *S*(**x**) sets all the non-zero elements in **x** to 1.
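Both metrics are straightforward to compute. The sketch below implements them exactly as defined above; the tolerance argument `tol` is an implementation convenience of ours, not a quantity from the paper:

```python
import numpy as np

def recon_error(thetas, theta_hats):
    """Average normalized signal reconstruction error over L tasks:
    (1/L) * sum ||theta_i - theta_hat_i||_2 / ||theta_i||_2."""
    return np.mean([np.linalg.norm(t - th) / np.linalg.norm(t)
                    for t, th in zip(thetas, theta_hats)])

def support_error(thetas, theta_hats, tol=0.0):
    """Average incorrect support recovery ratio: the fraction of the N
    entries whose zero/non-zero status differs between the true and
    estimated signals, i.e. ||S(theta) - S(theta_hat)||_0 / N."""
    errs = []
    for t, th in zip(thetas, theta_hats):
        s, sh = np.abs(t) > tol, np.abs(th) > tol
        errs.append(np.count_nonzero(s != sh) / t.size)
    return np.mean(errs)
```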

#### 5.1.1 LMCS

We consider the case of *L* = 2 CS tasks as in [7], in order to illustrate the performance of the proposed LMCS technique and the existing methods under a simulation setup already used in the literature. The original signal of each task contains 64 non-zero elements at random locations. Zero-mean Gaussian noise with a standard deviation of 0.01 is added to the two obtained compressive measurement vectors^{b}.
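This two-task setup can be sketched as follows, with 75% of the non-zero locations shared between the two signals and noisy measurements y = Φθ + n. For brevity, a single measurement matrix is reused for both tasks, whereas in practice each task may draw its own; *M* = 150 and the seed are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, overlap = 512, 64, 48  # 48/64 = 75% shared non-zero locations

# Shared part of the two supports, then disjoint task-specific parts.
common = rng.choice(N, size=overlap, replace=False)
rest = np.setdiff1d(np.arange(N), common)
own1, own2 = rng.choice(rest, size=2 * (K - overlap),
                        replace=False).reshape(2, -1)

theta1, theta2 = np.zeros(N), np.zeros(N)
for theta, own in ((theta1, own1), (theta2, own2)):
    idx = np.concatenate([common, own])
    theta[idx] = rng.choice([-1.0, 1.0], size=K)  # binary non-zeros

# Noisy compressive measurements with noise standard deviation 0.01.
M = 150
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)
y1 = Phi @ theta1 + 0.01 * rng.standard_normal(M)
y2 = Phi @ theta2 + 0.01 * rng.standard_normal(M)
```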

Figure 2 illustrates the impact of *λ* and *ν* on the performance of LMCS. The two signals are assumed to have 75% of their non-zero elements overlapped. We realize LMCS with *λ* = 0, *λ* = 1, *λ* = 2, and *λ* estimated using Equation 25. The results shown are averaged over 200 runs. In particular, Figure 2a,b plots the average signal reconstruction error as a function of the number of compressive measurements for the two cases where the original signals are random binary numbers ±1 and zero-mean Gaussian random variables with unit variance. The results show that in both cases, the reconstruction error of LMCS gradually improves as the number of compressive measurements increases, and the best performance is obtained when *λ* is estimated using Equation 25. Moreover, we can see that LMCS with *ν* = 0 and with *ν* estimated using Equation 26 yield similar signal reconstruction performance. The underlying reason is that the value of *λ* estimated jointly with *ν* is nearly identical to that obtained with *ν* = 0. This can be explained as follows. The value of *ν*, when it is identified together with *λ*, is generally non-zero but less than one in this simulation. Careful examination of Equation 25, which gives the estimate of *λ*, reveals that the impact of a small non-zero *ν* on *λ* is negligible when the original signal length *N* is large (in this section, *N* = 512) and the measurement noise level is low. A low noise level implies a large value of the noise precision *β* and, as a result, large values of the hyper-parameters *γ*_{j} for original signals having unit variance (see Equation 17). Therefore, in the remaining simulations, we fix *ν* at zero when realizing LMCS and MDL-LMCS.

Note that *ν* = 0 is a boundary value for the Gamma distribution. As *ν* approaches 0, the prior distribution of *λ* provides only vague information on *λ*, since $p\left(\lambda \right)\propto 1/\lambda $ (also see Equation 19 in [6]). However, this does not change the fact that a Laplace prior is imposed on the original signals, as shown in Equation 12. In other words, LMCS would still outperform MCS because it enhances the sparsity constraints on the non-zero elements of the original signals. This is also supported by the following simulation results (see Figures 3 and 4).

Figure 3 demonstrates the impact of the correlation between the two original signals on the performance of LMCS. It considers the cases when the two original signals have binary non-zero elements and have 75% and 50% of their non-zero elements overlapped. Figure 3a,b plots the average signal reconstruction error and the incorrect support recovery ratio of LMCS as a function of the number of compressive measurements. The results shown are averaged over 50 runs. For comparison, we also include in the figures the results from ST-BCS, LST-BCS, and MCS. We can observe from Figure 3a that LMCS and MCS greatly outperform ST-BCS and LST-BCS, owing to the prior sharing mechanism (see Section 2). As expected, the performance of LMCS and MCS improves as the number of overlapping non-zero elements in the two original signals increases. More importantly, LMCS exhibits superior performance, with a much lower signal reconstruction error than MCS, in both cases where the two original signals have 75% and 50% of their non-zeros overlapped. The performance enhancement comes mainly from the use of Laplace priors on the original signals in LMCS. Compared with MCS, LMCS imposes another layer of prior information on the hyper-parameters of the original signals, which makes MCS a special case of LMCS, as shown in Equations 39 and 40 at the end of Section 4. As a result, LMCS offers more flexibility in modeling the sparsity of the original signals. This is corroborated by Figure 3b, which shows that when the two original signals have 75% of their non-zero elements colocated, LMCS provides a lower incorrect support recovery ratio and thus better recovers the sparse signal support.

Figure 4 repeats the simulation experiment in Figure 3, but it considers the case where the two original signals have the non-zero elements drawn from zero-mean Gaussian distribution with unit variance. The obtained observations are similar to those in Figure 3.

#### 5.1.2 MDL-based task classification and signal reconstruction

In this subsection, we present simulation results to illustrate the performance of MDL-MCS and MDL-LMCS developed in Section 4. For the purpose of comparison, we also show the results of the ST-BCS, LST-BCS, MCS, and LMCS methods as well as the DP-MCS technique.

The simulated algorithms are used to recover the original signals of *L* = 40 CS tasks that belong to eight clusters with five tasks each. Every cluster has its own signal template that differs in its signal support. All the original signals have 64 non-zero components, and their locations are initially chosen so that the correlation between any two original signals from different clusters is zero. We then apply the following perturbation to induce slight correlation among clusters. Specifically, in each ensemble run, six non-zero elements in each signal template are selected randomly and set to zero, while six elements that are zero in the original template are reset to be non-zero. In this way, the five signals within the same cluster are highly correlated, but signals from different clusters have distinct sparsity structures. The simulation results are obtained by averaging over 50 ensemble runs.
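The cluster construction above can be sketched as follows, under our reading that each task perturbs its cluster template independently (the helper names and the seed are ours). With eight disjoint 64-element templates, 8 × 64 = 512 = *N*, so cross-cluster correlation is zero before perturbation:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, n_clusters, tasks_per = 512, 64, 8, 5

# One template support per cluster; a permutation of 0..N-1 carved into
# eight disjoint 64-element blocks gives uncorrelated templates.
perm = rng.permutation(N)
templates = [perm[c * K:(c + 1) * K] for c in range(n_clusters)]

signals = []
for c in range(n_clusters):
    for _ in range(tasks_per):
        # Perturb the template: drop six support indices, add six new
        # off-template indices, keeping 64 non-zeros in total.
        keep = rng.choice(templates[c], size=K - 6, replace=False)
        off = np.setdiff1d(np.arange(N), templates[c])
        add = rng.choice(off, size=6, replace=False)
        theta = np.zeros(N)
        theta[np.concatenate([keep, add])] = rng.choice([-1.0, 1.0], size=K)
        signals.append(theta)
```

Within a cluster, any two signals share at least 52 of their 64 non-zero locations, while signals from different clusters overlap in at most 12, matching the intended "highly correlated within, slightly correlated across" structure.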

### 5.2 Images

In this subsection, we compare the performance of MDL-MCS and MDL-LMCS with that of DP-MCS in recovering 2-D images of random bars. In this experiment, the elements of the measurement matrices of the three algorithms in consideration are drawn from a uniform spherical distribution.

All original images have the dimension of 1,024 × 1,024. Here, we utilize the Haar wavelet expansion with a coarsest scale of 3 and a finest scale of 6. Figure 7a gives the result of the inverse wavelet transform with 4,096 samples, denoted as linear in the figure. This is the best performance achievable by all the CS algorithms considered here. The reconstruction result from DP-MCS is shown in Figure 7b, where we adopted the hybrid CS scheme from [7] that compresses the fine-scale coefficients only, into *M*_{i} = 680 (*i* = 1, …, 9) measurements for each task. Figure 7c,d gives the recovery results of MDL-MCS and MDL-LMCS, respectively.
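The hybrid CS scheme keeps the coarse-scale wavelet coefficients uncompressed and takes random projections of the fine-scale coefficients only. It can be sketched in one dimension with a single-level Haar transform; this is a simplification of the 2-D, multi-scale expansion used in the experiment, and all sizes below are illustrative:

```python
import numpy as np

def haar_1level(x):
    """One level of the orthonormal 1-D Haar wavelet transform."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # coarse (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # fine (detail)
    return a, d

rng = np.random.default_rng(3)
x = rng.standard_normal(64)
coarse, fine = haar_1level(x)

# Hybrid CS: keep the coarse coefficients as-is, and compress only the
# fine-scale coefficients with a random projection.
m = 10
A = rng.standard_normal((m, fine.size)) / np.sqrt(m)
y = np.concatenate([coarse, A @ fine])   # 32 + 10 = 42 measurements
```

The coarse part is recoverable exactly, so the CS reconstruction effort is spent only on the (sparser) fine-scale detail coefficients.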

**Table 1 Image reconstruction and classification performance of DP-MCS, MDL-MCS, and MDL-LMCS**

| | Average reconstruction error | Correct classification ratio |
| --- | --- | --- |
| Linear | 0.22623 | - |
| DP-MCS | 0.27647 | 0.35 |
| MDL-MCS | 0.24511 | 0.60 |
| MDL-LMCS | 0.22642 | 1.00 |

The results in Figure 7 and Table 1 show that MDL-LMCS has the best image reconstruction and classification performance, while MDL-MCS yields a better performance than DP-MCS. This is consistent with the observations obtained from Figures 5 and 6.

## 6 Conclusions

In this paper, we first extended previous works on the Laplace prior-based Bayesian CS to the scenario of multiple CS tasks and developed the LMCS technique. The hierarchical prior model was adopted to impose the Laplace priors, and it was shown that the MCS algorithm is indeed a special case of LMCS. Next, this paper considered the scenario where the multiple CS tasks are from different groups, under which the performance of both MCS and LMCS would be degraded, since they attempt to recover the uncorrelated signals jointly. We proposed the MDL-based MCS techniques, namely, MDL-MCS and MDL-LMCS, which first classify tasks into different groups using the MDL principle and then reconstruct signals of every cluster. Simulations verified the enhanced performance of MDL-MCS and MDL-LMCS in terms of lower signal reconstruction error over the benchmark MCS and DP-MCS techniques as well as single-task CS algorithms.

## Endnotes

^{a} It can be easily verified that in our algorithm, *K* is equal to the iteration index plus one. Besides, clust{1} always contains all the unclassified tasks and clust{*K*} is the newest cluster formed in the current iteration.

^{b} Our choice of the noise standard deviation of 0.01 is of the same order as the values adopted in the literature. For example, in [6] and [7], the noise standard deviation is set to 0.03 and 0.005, respectively.

## Appendix 1

### Derivation and analysis of Equations 32 and 33

In this appendix, we shall present the derivation that leads to Equations 32 and 33 and show that it is only a suboptimal solution to the maximization of Equation 27.

Our derivation applies the approximation *s*_{i,j} ≫ 1/*γ*_{j}, which has been found to be valid numerically [7]. This results in the estimate of *γ*_{j} having the functional form given in Equation 30. When *A*_{0} > 0, both solutions in Equation 30 are negative, which violates the requirement that *γ*_{j} must be positive. If *A*_{0} < 0, only the solution ${\gamma}_{j}^{-1}=\left(-{B}_{0}-\sqrt{{\Delta}_{0}}\right)/\left(2{A}_{0}\right)$ is valid. For the case *A*_{0} = 0, Equation 27 yields the exact solution *γ*_{j} = 0. This completes the derivation of Equations 32 and 33.
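The case analysis above can be summarized in code. This is a sketch in which `A0`, `B0`, and `Delta0` stand for the coefficients of the quadratic in Equation 30, and returning *γ*_{j} = 0 (pruning the component) when *A*_{0} > 0 is our interpretation of the positivity violation:

```python
import numpy as np

def gamma_update(A0, B0, Delta0):
    """Select the valid root of the quadratic in 1/gamma_j, following
    the case analysis behind Equations 32 and 33 (coefficient names
    correspond to Equation 30)."""
    if A0 > 0:
        # Both roots of the quadratic are negative: no positive
        # 1/gamma_j exists, so the component is pruned (our reading).
        return 0.0
    if A0 < 0:
        # Only the root (-B0 - sqrt(Delta0)) / (2 A0) is valid.
        inv_gamma = (-B0 - np.sqrt(Delta0)) / (2.0 * A0)
        return 1.0 / inv_gamma
    return 0.0  # A0 == 0: Equation 27 gives gamma_j = 0 exactly
```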

The approximation *s*_{i,j} ≫ 1/*γ*_{j} transforms Equation 28 into

This indicates that the solution in Equation 32 is the unique maximizer of the approximated version of Equation 27. However, solving Equation 29 exactly, which amounts to finding all the candidate maximizers of Equation 27, may yield two or more positive estimates of *γ*_{j}. Among them, one is relatively close to the approximate solution in Equation 32. In other words, the approximate solution lies in the vicinity of a stationary point of Equation 27, which may correspond only to a local maximum.

## Appendix 2

### Derivation of Equation 46

We use the superscript (*k*) to denote the *k*th cluster in the following derivation. For mathematical tractability, besides the independence between signals from two different clusters, we also assume independence among signals within the same cluster. As a result, we have

where *L*_{k} is the number of tasks in the *k*th cluster such that $\sum _{k=1}^{K}{L}_{k}=L$, ${\mathbf{D}}^{\text{LMCS}}=\left\{{\mathit{d}}_{k}^{\text{LMCS}}\right\}$, *k* = 1, …, *K*, and ${\mathit{d}}_{k}^{\text{LMCS}}=\left\{{\mathit{\gamma}}^{\left(k\right)},{\lambda}^{\left(k\right)}\right\}$ contains the information sharing parameters of the *k*th cluster. Similarly, assuming statistical independence among the ${\mathit{d}}_{k}^{\text{LMCS}}$, we obtain

Carrying out the integration, simplifying, and applying some straightforward manipulations give Equation 46.

## Declarations

### Acknowledgements

The authors wish to thank the editor and the anonymous reviewers for their constructive suggestions. The authors thank S. Derin Babacan, Shihao Ji, and David Dunson for sharing codes of their algorithms. This work was supported in part by Hunan Provincial Innovation Foundation for Postgraduates under Grant CX2012B019, Fund of Innovation, Graduate School of National University of Defense Technology under grant B120404, and National Natural Science Foundation of China (no. 61304264).

## Authors’ Affiliations

## References

- Baraniuk R: Compressive sensing. *IEEE Signal Process. Mag.* 2007, 24(4):118-121.
- Candès E, Romberg J, Tao T: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. *IEEE Trans. Inf. Theory* 2006, 52(2):489-509.
- Donoho DL: Compressed sensing. *IEEE Trans. Inf. Theory* 2006, 52(4):1289-1306.
- Ji S, Xue Y, Carin L: Bayesian compressive sensing. *IEEE Trans. Signal Process.* 2008, 56(6):2346-2356.
- Tipping ME: Sparse Bayesian learning and the relevance vector machine. *J. Mach. Learn. Res.* 2001, 1:211-244.
- Babacan S, Katsaggelos A, Molina R: Bayesian compressive sensing using Laplace priors. *IEEE Trans. Image Process.* 2010, 19(1):53-63.
- Ji S, Dunson D, Carin L: Multi-task compressive sensing. *IEEE Trans. Signal Process.* 2009, 57(1):92-106.
- Leviatan D, Temlyakov VN: Simultaneous approximation by greedy algorithms. Technical report, University of South Carolina, 2003.
- Cotter SF, Rao BD, Engan K, Kreutz-Delgado K: Sparse solutions to linear inverse problems with multiple measurement vectors. *IEEE Trans. Signal Process.* 2005, 53(7):2477-2488.
- Tropp JA, Gilbert AC, Strauss MJ: Algorithms for simultaneous sparse approximation. Part I: greedy pursuit. *Signal Process.* 2006, 86:572-588.
- Tropp JA: Algorithms for simultaneous sparse approximation. Part II: convex relaxation. *Signal Process.* 2006, 86:589-602.
- Escoda D, Granai L, Vandergheynst P: On the use of a priori information for sparse signal approximations. *IEEE Trans. Signal Process.* 2006, 54(9):3468-3482.
- Baron D, Duarte MF, Sarvotham S: An information-theoretic approach to distributed compressed sensing. *Proceedings of the 43rd Allerton Conference on Communication, Control, and Computing*, Monticello, IL, Sept 2005.
- Wipf DP, Rao BD: An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. *IEEE Trans. Signal Process.* 2007, 55(7):3704-3716.
- Seeger MW, Nickisch H: Compressed sensing and Bayesian experimental design. *Proceedings of the 25th International Conference on Machine Learning*, Helsinki, July 2008.
- Qi Y, Liu D, Carin L, Dunson D: Multi-task compressive sensing with Dirichlet process priors. *Proceedings of the 25th International Conference on Machine Learning*, Helsinki, July 2008.
- Rissanen J: Modeling by shortest data description. *Automatica* 1978, 14:465-471.
- Rissanen J: Universal coding, information, prediction, and estimation. *IEEE Trans. Inf. Theory* 1984, 30(4):629-636.
- Barron A, Rissanen J, Yu B: The minimum description length principle in coding and modeling. *IEEE Trans. Inf. Theory* 1998, 44(6):2743-2760.
- Ramirez I, Sapiro G: An MDL framework for sparse coding and dictionary learning. *IEEE Trans. Signal Process.* 2012, 60(6):2913-2927.
- Liu J, Gao SW, Luo ZQ, Davidson TN, Lee JPY: The minimum description length criterion applied to emitter number detection and pulse classification. *Proceedings of the Ninth IEEE Workshop on Statistical Signal and Array Processing*, Portland, OR, Sept 1998.
- Wong KM, Luo ZQ, Liu J, Lee JPY, Gao SW: Radar emitter classification using intrapulse data. *Int. J. Electron. Comm.* 1999, 12:324-332.
- Liu J, Lee JPY, Li L, Luo Z, Wong KM: Online clustering algorithms for radar emitter classification. *IEEE Trans. Pattern Anal. Mach. Intell.* 2005, 27(8):1185-1196.
- Cover T, Thomas J: *Elements of Information Theory*. New York: Wiley; 2006.
- Caruana R: Multi-task learning. *Mach. Learn.* 1997, 28(1):41-75.
- Baxter J: Learning internal representations. *Proceedings of the Eighth Annual Conference on Computational Learning Theory*, Santa Cruz, CA, July 1995.
- Baxter J: A model of inductive bias learning. *J. Artif. Intell. Res.* 2000, 12:149-198.
- Lawrence ND, Platt JC: Learning to learn with the informative vector machine. *Proceedings of the 21st International Conference on Machine Learning*, Banff, Alberta, July 2004.
- Yu K, Tresp V, Schwaighofer A: Learning Gaussian processes from multiple tasks. *Proceedings of the 22nd International Conference on Machine Learning*, 2005.
- Zhang J, Ghahramani Z, Yang Y: Learning multiple related tasks using latent independent component analysis. *Advances in Neural Information Processing Systems (NIPS)*, Vancouver, British Columbia, Dec 2006.
- Ando RK, Zhang T: A framework for learning predictive structures from multiple tasks and unlabeled data. *J. Mach. Learn. Res.* 2005, 6:1817-1853.
- Evgeniou T, Micchelli CA, Pontil M: Learning multiple tasks with kernel methods. *J. Mach. Learn. Res.* 2005, 6:615-637.
- Burr D, Doss H: A Bayesian semiparametric model for random-effects meta-analysis. *J. Amer. Stat. Assoc.* 2005, 100:242-251.
- Dominici F, Parmigiani G, Wolpert R, Reckhow K: Combining information from related regressions. *J. Agric. Biolog. Environ. Stat.* 1997, 2(3):294-312.
- Hoff PD: Nonparametric modeling of hierarchically exchangeable data. Technical report, University of Washington, 2003.
- Muller P, Quintana F, Rosner G: A method for combining inference across related nonparametric Bayesian models. *J. R. Stat. Soc. Ser. B* 2004, 66(3):735-749.
- Mallick BK, Walker SG: Combining information from several experiments with nonparametric priors. *Biometrika* 1997, 84(3):697-706.
- Tang L, Zhou Z, Shi L, Yao H, Ye Y, Zhang J: Laplace prior based distributed compressive sensing. *Proceedings of the 5th International ICST Conference on Communications and Networking in China*, Beijing, Aug 2010.
- Themelis KE, Rontogiannis AA, Koutroumbas KD: A novel hierarchical Bayesian approach for sparse semisupervised hyperspectral unmixing. *IEEE Trans. Signal Process.* 2012, 60(2):585-599.
- Bishop CM: *Pattern Recognition and Machine Learning*. New York: Springer-Verlag; 2006.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.