Sparsity estimation from compressive projections via sparse random matrices
EURASIP Journal on Advances in Signal Processing volume 2018, Article number: 56 (2018)
Abstract
The aim of this paper is to develop strategies to estimate the sparsity degree of a signal from compressive projections, without the burden of recovery. We consider both the noise-free and the noisy settings, and we show how to extend the proposed framework to the case of non-exactly sparse signals. The proposed method employs γ-sparsified random matrices and is based on a maximum likelihood (ML) approach, exploiting the property that the acquired measurements are distributed according to a mixture model whose parameters depend on the signal sparsity. In the presence of noise, given the complexity of ML estimation, the probability model is approximated with a two-component Gaussian mixture (2GMM), which can be easily learned via expectation-maximization.
Besides the design of the method, this paper makes two novel contributions. First, in the absence of noise, sufficient conditions on the number of measurements are provided for almost sure exact estimation in different regimes of behavior, defined by the scaling of the measurement sparsity γ and the signal sparsity. In the presence of noise, our second contribution is to prove that the 2GMM approximation is accurate in the large system limit for a proper choice of the parameter γ. Simulations validate our predictions and show that the proposed algorithms outperform the state-of-the-art methods for sparsity estimation. Finally, the estimation strategy is applied to non-exactly sparse signals. The results are very encouraging, suggesting further extension to more general frameworks.
Introduction
Compressed sensing (CS) [1, 2] is a novel signal acquisition technique that recovers an unknown signal from a small set of linear measurements. According to CS, if a signal having dimension n is known to be sparse, i.e., it has only k≪n nonzero entries when represented by a suitable basis, then it can be efficiently recovered using only m≪n linear combinations of the signal entries, provided that these linear projections are sufficiently incoherent with respect to the signal basis.
In most CS applications, it is usually assumed that an upper bound on the sparsity degree k is known before acquiring the signal. However, some signals may have a time-varying sparsity, as in spectrum sensing [3], or spatially varying sparsity, as in the case of block-based image acquisition [4]. Since the number of linear measurements required for the recovery depends on the sparsity degree of the signal [5], the knowledge of k is crucial to fully exploit the potential of CS.
In many recovery algorithms, the optimal tuning of parameters requires the knowledge of the degree of sparsity of the signal. For example, in Lasso techniques [6], a parameter λ related to k has to be chosen [7], whereas for greedy algorithms, such as orthogonal matching pursuit (OMP) [8] or compressive sampling matching pursuit (CoSaMP) [9], the performance and the number of iterations depend on k.
The ability to estimate the signal sparsity degree directly from a small number of linear measurements can represent an important tool in several promising applications. One of the most obvious applications is the possibility to dynamically adapt the number of measurements acquired by a CS instrument, e.g., an imager, to the estimated signal sparsity. We can envisage a system that acquires linear measurements in a sequential way and continuously updates the estimated sparsity according to the new measurements. The acquisition can stop as soon as the number of acquired measurements is enough to guarantee the correct reconstruction of a signal based on the estimated sparsity.
Other applications may include the possibility of comparing the support of two sparse signals from their measurements. Due to the linearity of the sparse signal model, the degree of overlap between the supports of two sparse signals can be estimated by measuring the sparsity degree of their sum (or difference) [10]. Finally, sparsity estimation can be used to decide whether a signal can be represented in a sparse way according to a specific basis, which can be used to select the most suitable basis allowing the sparsest representation.
Related work
The problem of estimating the sparsity degree has begun to be recognized as a major gap between theory and practice [11–13], and the literature on the subject is very recent.
In some papers, the joint problem of signal reconstruction and sparsity degree estimation is investigated, in particular for time-varying settings. The following iterative approach is considered: given an initial upper bound for the sparsity degree, at a generic time step t, the signal is reconstructed and the sparsity degree is estimated; this estimate is then used at time t+1 to assess the number of measurements sufficient for reconstruction. The seminal work [14] investigates the problem in the framework of spectrum sensing for cognitive radios and proposes an iterative method that at each time step performs two operations: (a) the signal is recovered via Lasso, and (b) the sparsity degree is estimated as the number of recovered components with magnitude larger than an empirically set threshold. The efficiency of this procedure is validated via numerical simulations.
Some authors propose sequential acquisition techniques in which the number of measurements is dynamically adapted until a satisfactory reconstruction performance is achieved [15–19]. Even if the reconstruction can take into account the previously recovered signal, these methods require solving a minimization problem at each newly acquired measurement and may prove too complex when the underlying signal is not sparse, or if one is only interested in assessing the sparsity degree of a signal under a certain basis without reconstructing it.
In other papers, only the sparsity degree estimation is considered, which generally requires fewer measurements than signal reconstruction. In [13], the sparsity degree is estimated through an eigenvalue-based method, for wideband cognitive radio applications. In this work, signal reconstruction is not required, but in practice the number of measurements used was quite large. In [20], the sparsity of the signal is lower-bounded through the numerical sparsity, i.e., the ratio between the squared ℓ_{1} and ℓ_{2} norms of the signal, where these quantities can be estimated from random projections obtained using Cauchy-distributed and Gaussian-distributed matrices, respectively. A limitation of this approach is that it is not suitable for adaptive acquisition since measurements taken with Cauchy-distributed matrices cannot be used later for signal reconstruction. In [21], this approach is extended to a family of entropy-based sparsity measures of the kind (∥x∥_{q}/∥x∥_{1})^{q/(1−q)} with q∈[0,2], for which estimators are designed and theoretically characterized in terms of limiting distributions. In [22], the authors propose to estimate the sparsity of an image before its acquisition, by calculating the image complexity. However, the proposed method is based on the image pixel values and needs a separate estimation that does not depend on the measurements. Further, in [23], the minimum number of measurements needed to recover the sparsity degree was theoretically investigated.
Finally, we notice that the problem of estimating the sparsity degree of a vector is partially connected to the problem of estimating the number of distinct elements in data streams [24, 25], which has been largely studied in the last decades due to its diverse applications. The analogy lies in the fact that the sparsity degree problem could be seen as the estimation of the number of elements distinct from zero. Moreover, many efficient algorithms to estimate the number of distinct elements are based on random hashing (see [25] for a review) to reduce the storage space, which is our concern as well. However, the problem of distinct elements considers vectors a=(a_{1},…,a_{n}) with a_{i}∈Q, where Q is a finite set, which is intrinsically different from our model where the signal x has real-valued components. Therefore, the strategies conceived for this problem cannot be applied for our purpose.
Our contribution
In this paper, we propose a technique for directly estimating the sparsity degree of a signal from its linear measurements, without recovery. The method relies on the fact that measurements obtained by projecting the signal according to a γ-sparsified random matrix are distributed according to a mixture density whose parameters depend on k. This is an extension of the algorithm in [26], which works only in the case of noise-free, exactly k-sparse signals. First, we analyze the case of noise-free, exactly k-sparse signals as a special case, and we provide theoretical guarantees regarding the consistency of the proposed estimator and its asymptotic behavior under different regimes of the parameters k and γ. Then, we analyze the more generic case of noise, including the non-exactly sparse signals, and we propose to approximate the measurement model by a two-component Gaussian mixture model (2GMM), whose parameters can be easily estimated via expectation-maximization (EM) techniques. In this case, we prove that there is a regime of behavior, defined by the scaling of the measurement sparsity γ and the sparsity degree k, where this approximation is accurate. An interesting property of the proposed method is that measurements acquired using a γ-sparsified random matrix also enable signal reconstruction, with only a slight performance degradation with respect to dense matrices [27, 28].
Some preliminary results, limited to the sparsity estimation of noisy, exactly k-sparse ternary signals, have appeared in [29]. In this paper, we extend the results in [29] from both a theoretical and a practical point of view, by considering any k-sparse signal and extending the model to non-exactly sparse signals.
Outline of the paper
The paper is organized as follows. Section 2 presents the notation and a brief review of CS-related results. The sparsity estimation problem is formally introduced in Section 3, where we discuss the optimal estimator, whereas the main properties of the optimal estimator in the noise-free setting are outlined in Section 4. In Section 5, we introduce the proposed iterative algorithm for dealing with the noisy setting, together with some approximate performance bounds. Finally, the proposed estimators are experimentally validated in Section 6, while concluding remarks are given in Section 7.
Preliminaries
In this section, we define some notation, we review the CS fundamentals, and we briefly discuss the use of sparsified matrices in the CS literature.
Notation
Throughout this paper, we use the following notation. We denote column vectors with small letters and matrices with capital letters. If \(x\in \mathbb {R}^{n}\), we denote its j-th element with x_{j} and, given S⊆[n]:={1,…,n}, we denote by x_{S} the subvector of x corresponding to the indices in S. The support set of x is defined by supp(x)={i∈[n]:x_{i}≠0} and we use ∥x∥_{0}=|supp(x)|. Finally, the symbol ∥x∥ with no subscript always denotes the Euclidean norm of the vector x.
This paper makes frequent use of the following notation for asymptotics of real sequences \((a_{n})_{n\in \mathbb {N}}\) and \((b_{n})_{n\in \mathbb {N}}\): (i) a_{n}=O(b_{n}) for n→∞ if there exist a positive constant c∈(0,+∞) and \(n_{0}\in \mathbb {N}\) such that a_{n}≤cb_{n} for all n>n_{0}, (ii) a_{n}=Ω(b_{n}) for n→∞ if there exist a constant c^{′}∈(0,+∞) and \(n_{1}\in \mathbb {N}\) such that a_{n}≥c^{′}b_{n} for all n>n_{1}, (iii) a_{n}=Θ(b_{n}) for n→∞ if a_{n}=O(b_{n}) and a_{n}=Ω(b_{n}), and (iv) a_{n}=o(b_{n}) for n→∞ means that \({\lim }_{n\rightarrow \infty } a_{n}/b_{n} = 0\).
Given a random variable, we denote the probability density function with f.
Sparse signal recovery using sparse random projections
Let \(x\in \mathbb {R}^{n}\) be an unknown deterministic signal. CS [30] aims to recover a signal from a small number of non-adaptive linear measurements of the form
\[ y = Ax + \eta, \tag{1} \]
where \(y\in \mathbb {R}^{m}\) is a vector of observations, \(A\in \mathbb {R}^{m\times n}\) is the sensing matrix with m<n, \(\eta \in \mathbb {R}^{m}\) is an additive Gaussian noise with distribution N(0,σ^{2}I_{m×m}), and I_{m×m} is the m×m identity matrix. Since the solution to (1) is not unique, the signal is typically assumed to be sparse, i.e., it can be represented with k nonzero coefficients, or compressible, in the sense that it can be well approximated by a vector having only k nonzero coefficients. In the following, we refer to k as the signal sparsity degree and we denote the set of signals with at most k nonzero components as \(\Sigma _{k}=\{v\in \mathbb {R}^{n}:\|v\|_{0}\leq k\}\).
The literature describes a wide variety of approaches to select the sparsest solution to the affine system in (1). In particular, a large amount of work in CS investigates the performance of ℓ_{1} relaxation for sparse approximation.
The problem of recovery can be analyzed in deterministic settings, where the measurement matrix A is fixed, or in random settings in which A is drawn randomly from a sub-Gaussian ensemble. Past work on random designs has focused on matrices drawn from ensembles of dense matrices, i.e., each row of A has n nonzero entries with high probability. However, in various applications, sparse sensing matrices are more desirable [31]. Furthermore, sparse measurement matrices require significantly less storage space, and algorithms adapted to such matrices have lower computational complexity [32, 33]. In [27], the authors study what sparsity degree is permitted in the sensing matrices without increasing the number of observations required for support recovery.
In this paper, we consider γ-sparsified matrices [27], in which the entries of the matrix A are independently and identically distributed according to
\[ A_{ij} \sim (1-\gamma)\,\delta_{0} + \gamma\, N(0, 1/\gamma), \tag{2} \]
where δ_{0} denotes a Dirac delta centered at zero.
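To make the ensemble concrete, the following minimal sketch draws a γ-sparsified Gaussian matrix and checks its first two moments empirically. It assumes that nonzero entries have variance 1/γ, which is consistent with the moments E[A_{ij}]=0 and E[A_{ij}^{2}]=1 used later in the paper; the function name `sparsified_gaussian` and the chosen sizes are illustrative only.

```python
import numpy as np

def sparsified_gaussian(m, n, gamma, rng):
    """Draw an m x n matrix whose entries are 0 with probability 1 - gamma
    and N(0, 1/gamma) with probability gamma (assumed form of the ensemble)."""
    mask = rng.random((m, n)) < gamma                      # which entries are nonzero
    values = rng.normal(0.0, np.sqrt(1.0 / gamma), size=(m, n))
    return mask * values

rng = np.random.default_rng(0)
gamma = 0.05
A = sparsified_gaussian(2000, 500, gamma, rng)

mean_entry = A.mean()                 # should be close to 0
second_moment = (A ** 2).mean()       # should be close to 1
nonzero_fraction = (A != 0).mean()    # should be close to gamma
```

Scaling the nonzero entries by 1/γ in variance keeps the energy of each measurement comparable to the dense Gaussian case, while only a fraction γ of the entries needs to be stored.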
Since weak signal entries could be confused with noise, in [27], the support recovery is studied also as a function of the minimum (in magnitude) nonzero value of x:
\[ \lambda := \min_{i\in \mathrm{supp}(x)} |x_{i}|. \]
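Consequently, for a fixed λ>0, let us define:
\[ \mathcal{X}_{k}(\lambda) := \left\{x\in \mathbb{R}^{n} : \|x\|_{0}=k,\ \min_{i\in \mathrm{supp}(x)} |x_{i}| \geq \lambda \right\}. \]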
For this class of signals, the following result has been proved.
Theorem 1
(Corollary 2 in [27]) Let the measurement matrix \(A\in \mathbb {R}^{m\times n}\) be drawn with i.i.d. elements from the γ-sparsified Gaussian ensemble. Then, a necessary condition for asymptotically reliable recovery over the signal class \(\mathcal {X}_{k}(\lambda)\) is
where
In particular, Theorem 1 says that if γk→∞ as n→∞, then the number of measurements is of the same order as that for dense sensing matrices. In sharp contrast, if γk→0 sufficiently fast as n→∞, then the number of measurements required by any decoder increases dramatically. Finally, if γk=Θ(1) and λ^{2}k=Θ(1), then at least max{Θ(k log(n/k)),Θ(k log(n−k)/log k)} measurements are necessary for estimating the support of the signal.
Several recovery algorithms are based on the use of sparse sensing matrices. In particular, count-min sketch algorithms need about 10 to 15 times more measurements than ℓ_{1} decoding, and sparse matching pursuit needs about half of the measurements of count-min sketch. Other sketch algorithms include [34], which can be as accurate as ℓ_{1} decoding with dense matrices under the condition γk=Θ(1) with the same order of measurements.
Sparsity estimation problem: mathematical formulation
Our goal is to estimate k from the measurements y=Ax, where A is a γ-sparsified matrix, without the burden of reconstructing x. Specifically, we aim at providing conditions on the triplet (n,m,k) as well as on x and A under which the estimation of signal sparsity is accurate with high probability. The theoretical results that we provide also hold true for high-dimensional settings, that is, when (n,m,k) are allowed to tend to infinity.
Given a rule for computing estimates of the signal sparsity, we will measure the error between the estimate \(\widehat {k}(m,n)\) and the true sparsity degree k using the relative error:
\[ e\left(k,\widehat{k}\right) := \frac{\left|\widehat{k}-k\right|}{k}. \]
We say that the sparsity estimator \(\widehat {k}\) is asymptotically weakly consistent when \(e\left (k,\widehat {k}\right)\) converges in probability to 0 as m,n→∞. If we replace convergence in probability with almost sure convergence, then the estimator is said to be strongly consistent.
If the signals are not exactly sparse but compressible, i.e., they admit a representation with few large components in magnitude, the recovery guarantees in the CS literature are expressed in terms of the sparsity degree of the best-k approximation [30, 35], defined as follows:
\[ \widehat{x}_{k} := \mathop{\mathrm{arg\,min}}_{z\in \Sigma_{k}} \|x-z\|. \]
For this reason, the sparsity of a non-exactly sparse signal is defined as the number of components containing most of the energy up to a relative error τ
\[ k_{\tau} := \min\left\{s\in [n] : \|x-\widehat{x}_{s}\|^{2} \leq \tau \|x\|^{2}\right\}, \]
where \(\widehat{x}_{s}\) denotes the best-s approximation of x.
Then, defining \(e=x-\widehat {x}_{k_{\tau }}\), we write
\[ y = Ax = A\widehat{x}_{k_{\tau}} + \eta, \]
where η=Ae. It should be noticed that each component η_{i} is distributed as a mixture of Gaussians. We make the following approximation: η_{i}∼N(0,σ^{2}) with
In the noiseless case, by linearity of expectation, we have
from which
where the last inequality follows from \(\mathbb {E}\left [A_{ij}\right ]=0\) and \(\mathbb {E}\left [A_{ij}^{2}\right ]=1\) for all i,j. Then, the model describing the measurements can be approximated by (1) with σ^{2}≈τ∥y∥^{2}/m. We underline that this argument is true for all sensing matrices drawn from the ensemble in (2), and at this time, we do not make any additional assumption on the number of measurements.
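The argument above can be illustrated numerically. The sketch below assumes that k_{τ} is the smallest number of largest-magnitude components whose removal leaves at most a fraction τ of the signal energy, and it uses a power-law decaying test signal; both are illustrative assumptions, as are the helper name `sparsity_at_level` and the chosen sizes. It then compares the proxy τ∥y∥^{2}/m with the energy of the discarded tail.

```python
import numpy as np

def sparsity_at_level(x, tau):
    """Smallest s such that the best s-term approximation of x leaves at most
    a fraction tau of the energy in the tail (assumed definition of k_tau)."""
    energy = np.sort(x ** 2)[::-1]            # component energies, largest first
    total = energy.sum()
    tail = total - np.cumsum(energy)          # tail[s-1]: energy left after keeping s terms
    return int(np.argmax(tail <= tau * total)) + 1

n, tau = 1000, 1e-3
x = 1.0 / np.arange(1, n + 1) ** 1.5          # compressible (power-law) test signal
k_tau = sparsity_at_level(x, tau)

# Residual e = x - (best k_tau-term approximation of x)
order = np.argsort(-np.abs(x))
x_best = np.zeros(n)
x_best[order[:k_tau]] = x[order[:k_tau]]
residual_energy = np.sum((x - x_best) ** 2)

# Noise-variance proxy tau * ||y||^2 / m from gamma-sparsified measurements
rng = np.random.default_rng(1)
m, gamma = 4000, 0.1
A = (rng.random((m, n)) < gamma) * rng.normal(0.0, np.sqrt(1.0 / gamma), size=(m, n))
y = A @ x
sigma2_proxy = tau * np.sum(y ** 2) / m
```

Since E∥y∥^{2}=m∥x∥^{2} under the moment conditions stated above, `sigma2_proxy` concentrates around τ∥x∥^{2}, which upper-bounds `residual_energy` by construction of k_{τ}.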
Given (y,A) and assuming that the perturbation is additive Gaussian, the ML estimator of the signal sparsity can be obtained via the following exhaustive search:
However, this optimization problem is NP-hard and the search for the solution requires an exponential time in the signal length n (one optimization problem for all subsets of [n] of size s and for all s, which amounts to \(\sum _{s=1}^{n} {n\choose s} = 2^{n}-1\)).
Given supp(x), if A is chosen from the ensemble of γ-sparsified matrices, any measurement
\[ y_{i} = \sum_{j\in [n]} A_{ij}x_{j} + \eta_{i} \]
is a random variable whose density function is a mixture of Gaussians with 2^{k} components. The result follows from the following argument. Let S be the overlap between the support of the i-th row of A and supp(x). It is easy to see that, given S, y_{i}∼N(0,α_{S}) where α_{S}=∥x_{S}∥^{2}/γ+σ^{2}. Without any further assumption, taking into account all possible overlaps between the support of the i-th row of A and the support of the signal x with cardinality s≤k, we can have in principle \(\sum _{s\leq k}{k\choose s}=2^{k}\) different types of Gaussians. We conclude that y_{i} is a Gaussian mixture with 2^{k} components. If the nonzero elements of the signal all have equal magnitude, then α_{S} depends only on the cardinality of S and the number of components of the Gaussian mixture reduces dramatically to k+1 (one for each overlap size from 0 to k). Given the set of m independent and identically distributed samples y=(y_{1},…,y_{m})^{⊤}, the sparsity estimation can be recast into the problem of evaluating the number of mixture components and parameters. However, even in the simple case where k is known, the estimation of the finite mixture density function does not admit a closed-form solution, and the computational complexity is prohibitive in practice.
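The counting argument can be checked by brute force for a small support. The sketch below enumerates every overlap set S, computes α_{S}=∥x_{S}∥^{2}/γ+σ^{2} and the weight γ^{|S|}(1−γ)^{k−|S|}, and merges components with equal variance; the helper `mixture_components` and the example values are hypothetical.

```python
from itertools import combinations

def mixture_components(x_support, gamma, sigma2):
    """Map each distinct variance alpha_S to its total probability,
    enumerating all subsets S of the support (feasible only for small k)."""
    k = len(x_support)
    comps = {}
    for s in range(k + 1):
        for S in combinations(range(k), s):
            alpha = sum(x_support[j] ** 2 for j in S) / gamma + sigma2
            prob = gamma ** s * (1 - gamma) ** (k - s)
            key = round(alpha, 12)                 # merge numerically equal variances
            comps[key] = comps.get(key, 0.0) + prob
    return comps

# Generic nonzero values: all 2^k subsets give distinct variances
generic = mixture_components([0.7, -1.3, 2.1, 0.4], gamma=0.2, sigma2=0.01)
# Equal magnitudes: the variance depends only on |S|
equal = mixture_components([1.0, -1.0, 1.0, -1.0], gamma=0.2, sigma2=0.01)

n_generic, n_equal = len(generic), len(equal)
```

For k=4 the generic signal yields 16=2^{4} distinct components, whereas the equal-magnitude signal yields 5 distinct variances, i.e., one per overlap size once the pure-noise component S=∅ is counted.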
Method: noise-free setting
In this section, we show that in the absence of noise (i.e., ∥η∥=0), ∥y∥_{0} is a sufficient statistic for the underlying parameter k. We show that the performance of the proposed estimators of the sparsity degree depends on the SNR=λ^{2}k/σ^{2} and that the traditional measure ∥x∥^{2}/σ^{2} has no significant effect in the estimation of the sparsity degree.
Even in the absence of noise, since A is chosen from the ensemble of γ-sparsified matrices, any measurement \(y_{i}=\sum _{j\in [n]}A_{ij}x_{j} \) is still a random variable. The ML solution provides the following estimator of the signal sparsity.
Proposition 1
Let us define
Then, the ML estimate of the signal sparsity is
The estimator derived in Proposition 1 has already been proposed in [26] for estimating the degree of sparsity. In the following, we will denote the estimator in (16) as the oracle estimator since it is equivalent to estimating k in the presence of an oracle who knows which entries in y are only due to noise. In our analysis, we prove that the oracle estimator is asymptotically strongly consistent, i.e., as the number of measurements increases indefinitely, the resulting sequence of estimates converges almost surely to the true sparsity degree (see Theorem 2). This means that the density functions of the estimators become more and more concentrated near the true value of the sparsity degree.
Given a sequence of events \(\{E_{m}\}_{m\in \mathbb {N}}\), we denote with \(\limsup _{m\rightarrow \infty }E_{m}\) the set of outcomes that occur infinitely many times. More formally,
\[ \limsup_{m\rightarrow \infty} E_{m} = \bigcap_{m=1}^{\infty}\bigcup_{\ell = m}^{\infty} E_{\ell}. \]
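As an illustration of the noise-free estimator, the sketch below assumes that it takes the form obtained by inverting p_{k}=1−(1−γ)^{k}, with p_{k} replaced by the fraction ∥y∥_{0}/m of nonzero measurements. This form is an assumption consistent with the sufficient statistic ∥y∥_{0} and with the definition of p_{k} in Theorem 2, not a transcription of Proposition 1; the function name and the test sizes are illustrative.

```python
import numpy as np

def estimate_sparsity_noise_free(y, gamma):
    """Invert p_k = 1 - (1 - gamma)^k using the empirical fraction of
    nonzero measurements (assumed form of the noise-free estimator)."""
    m = y.size
    nonzero = np.count_nonzero(y)
    if nonzero == m:
        return float("inf")        # no zero measurement observed: estimate undefined
    return np.log(1.0 - nonzero / m) / np.log(1.0 - gamma)

rng = np.random.default_rng(0)
n, k, m = 1000, 100, 2000
gamma = 1.0 / k                    # psi(k) = gamma * k = 1

x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)

A = (rng.random((m, n)) < gamma) * rng.normal(0.0, np.sqrt(1.0 / gamma), size=(m, n))
y = A @ x                          # exactly zero when a row misses the support

k_hat = estimate_sparsity_noise_free(y, gamma)
relative_error = abs(k_hat - k) / k
```

The estimate depends on y only through the count of nonzero entries, so neither the signal length n nor the values of the nonzero components enter the computation.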
Theorem 2
Let p_{k}=1−(1−γ)^{k}, then
where \(\xi _{k}={(1-p_{k})}\left (1-\mathrm {e}^{\epsilon \log (1-p_{k})}\right).\) Moreover, let E_{m} be the sequence of events
with \(m\in \mathbb {N}\) and ρ>1/2, then
Remark 1
From Theorem 2, we deduce that almost surely (i.e., with probability 1) the relative error between the estimated sparsity and the true value of the sparsity degree is
for all but finitely many m.
Asymptotic analysis of ML estimator
We now analyze the performance of the oracle estimator in the large system limit, when n, k, and m tend to infinity. Since we are dealing with sparse signals, we assume that the sparsity degree k scales at most linearly in the signal length, i.e., the relative sparsity k/n≤ρ is kept bounded with ρ≪1. The following theorem shows sufficient conditions on the number of measurements for weak consistency in different regimes of behavior, defined by the scaling of the measurement sparsity γ and the signal sparsity k.
Theorem 3
Let ψ(k)=γk=o(k) as k→∞ and define the function g(k) as follows:
a) if ψ(k)→∞ as k→∞, then g(k)=Ω(e^{2ψ(k)});
b) if ψ(k)=Θ(1) as k→∞, then g(k)→∞ for k→∞;
c) if ψ(k)=o(1) as k→∞, then g(k)=Ω(ψ(k)^{−2(1+ε)}), for any ε>0.
If the number of measurements is such that \(\frac {m}{\log m}\geq g(k)\), then
where
for some constant ρ>1/2.
In the following theorem, we show that, under stricter conditions, strong consistency is also ensured.
Theorem 4
Let ψ(k)=γk and define the function g(k) as in Theorem 3. If the number of measurements is such that
then
Remark 2
Theorems 3 and 4 characterize the regimes in which measurement sparsity begins to improve the estimation of the signal sparsity. The function ψ(k)=γk represents the average number of nonzeros in each row of A that align with the support of the signal x. This analysis reveals three cases of interest, corresponding to whether measurement sparsity has no effect, a small effect, or a significant effect on the number of measurements sufficient for asymptotic consistency. If ψ(k)=Θ(1) as k→∞, then m=Ω(k) measurements are sufficient for the concentration result. In sharp contrast, if ψ(k)→∞ as k→∞, then the number of measurements guaranteeing the asymptotic consistency is exponential in ψ(k), meaning that, in order to be sure to get an unbiased estimator with k measurements, we need \(\psi (k)\leq \frac {1}{2}\left (\log k-\log (\log k)\right)\). If ψ(k)→0, then the condition \(\psi (k)\geq \sqrt [2+\epsilon ]{\frac {\log (k)}{k}}\) with ε>0 is sufficient to get an unbiased estimator with k measurements.
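The two thresholds on ψ(k) quoted in this remark can be recovered by setting m=k in the condition m/log m≥g(k) of Theorem 3. The following worked step is a sketch that ignores the constants hidden in the Ω notation and writes ε′=2ε for the exponent in case c):

```latex
% Case a): psi(k) -> infinity, g(k) of order e^{2 psi(k)}
\psi(k) = \tfrac{1}{2}\bigl(\log k - \log\log k\bigr)
\;\Longrightarrow\;
e^{2\psi(k)} = e^{\log k - \log\log k} = \frac{k}{\log k}
= \left.\frac{m}{\log m}\right|_{m=k}.

% Case c): psi(k) -> 0, g(k) of order psi(k)^{-(2+\epsilon')}
\psi(k) = \left(\frac{\log k}{k}\right)^{\frac{1}{2+\epsilon'}}
\;\Longrightarrow\;
\psi(k)^{-(2+\epsilon')} = \frac{k}{\log k}
= \left.\frac{m}{\log m}\right|_{m=k}.
```

In both cases the boundary value of ψ(k) is exactly the one for which g(k) matches k/log k, so any ψ(k) between the two thresholds keeps m=k measurements within the sufficient condition, up to constants.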
Remark 3
Theorems 3 and 4 suggest that in order to obtain a good estimation of the sparsity degree, we need sufficiently sparse matrices, but not too sparse. At the same time, the use of sparser matrices requires more measurements for a successful recovery of the signal. If we combine the results obtained in Theorems 3 and 4 with those provided in Theorem 1, we notice there is a large range for γ (provided by \(c\leq \psi (k)\leq \frac {1}{2}\left (\log k-\log (\log k)\right)\) with c>0) where both sparsity estimation and the recovery can be successful. We will provide more details on how to choose γ for specific applications in Section 6.
The proofs of Theorems 3 and 4 are postponed to the Appendix.
Method: noisy setting
As already noticed, in the generic noisy setting, the estimation of signal sparsity via an exhaustive ML search is unfeasible. A possible approach is to resort to the well-known EM algorithm [36]. This algorithm can find ML solutions to problems involving observed and hidden variables and, in the general setting, is known to converge to a local maximum of the likelihood.
In this section, we prove that, under suitable conditions, the distribution of the measurements can be well approximated by a two-component Gaussian mixture that can be easily learned by the EM algorithm. Finally, we show that the case of non-exactly sparse signals can be well approximated by the same model.
2GMM approximation for large system limit
Our main goal is to show that the Gaussian mixture model that describes the measurements can be simplified in the large system limit as n,k→∞. The next theorem reveals that there is a regime of behavior, defined by the scaling of the measurement sparsity γ and the signal sparsity k, where the measurements can be approximately described by a two-component Gaussian mixture model (2GMM). We state this fact formally below. Recall that, given two density functions f,g, their Kolmogorov distance is defined as
\[ \|f-g\|_{\mathrm{K}} = \sup_{t\in \mathbb{R}} \left| \int_{-\infty}^{t} f(\zeta)\,\mathrm{d}\zeta - \int_{-\infty}^{t} g(\zeta)\,\mathrm{d}\zeta \right|. \]
Theorem 5
Let |supp(x)|=k and ϕ(ζ|σ^{2}) be the probability density function of a normally distributed random variable with expected value 0 and variance σ^{2}, i.e.,
\[ \phi(\zeta|\sigma^{2}) = \frac{1}{\sqrt{2\pi \sigma^{2}}}\, \mathrm{e}^{-\zeta^{2}/(2\sigma^{2})}. \]
Given a set S, let α_{S}=∥x_{S}∥^{2}/γ+σ^{2} and p_{S}=(1−γ)^{k−|S|}γ^{|S|}. Let us consider the density functions (the subscript is to emphasize the dependence on parameter k)
Then, there exists a constant \(C\in \mathbb {R}\) such that
The proof of Theorem 5 is postponed to the Appendix. As a simple consequence, we obtain that, under suitable conditions, the approximation error depends on ψ(k).
Corollary 1
Let ψ(k)=γk and \(f_{k}, f_{k}^{\textrm {2GMM}}\) be the sequence of density functions defined in (27) and (28). Then, there exists a constant \(C^{\prime }\in \mathbb {R}\) such that
with C^{′}≈0.03, \(\lambda _{\max }=\max \limits _{i:x_{i}\neq 0}|x_{i}|\), \(\lambda _{\min }=\min \limits _{i:x_{i}\neq 0}|x_{i}|\).
Corollary 1 shows that the error in the approximation can be controlled by the parameter ψ(k). Some considerations are in order. Consider for example a k-sparse signal with all nonzero components equal in modulus, i.e., with λ_{max}=λ_{min}. Then, the bound reduces to \( \left \|f_{k}-f^{\mathrm {2GMM}}_{k}\right \|_{\mathrm {K}}\leq C^{\prime }\left (\psi (k)+\psi (k)^{2}\right)\). We can see that if ψ(k)=γk→0, then the Kolmogorov distance goes to zero. However, as suggested by Theorem 1, we expect to need more measurements m to perform a good estimation of the sparsity degree. The best regime is when ψ(k)=Θ(1) as k→∞: in that case, the distance remains bounded and we expect that a number of measurements proportional to k is sufficient for the sparsity estimation (suppose, for example, that γ=3/k; then, we expect \(\left \|f_{k}-f^{\mathrm {2GMM}}_{k}\right \|_{\mathrm {K}}< 0.36\)). For signals with λ_{max}≠λ_{min}, similar considerations apply if λ_{max} and λ_{min} scale similarly as a function of k.
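The value 0.36 quoted for γ=3/k follows directly from the equal-magnitude form of the bound with C^{′}≈0.03. As a worked check, the same arithmetic is repeated for the values of ψ(k) used in the experiments of Section 6, under the same equal-magnitude assumption:

```latex
C'\bigl(\psi + \psi^{2}\bigr)\Big|_{\psi = 3}    = 0.03\,(3 + 9)                   = 0.36,
\qquad
C'\bigl(\psi + \psi^{2}\bigr)\Big|_{\psi = 1}    = 0.03\,(1 + 1)                   = 0.06,

C'\bigl(\psi + \psi^{2}\bigr)\Big|_{\psi = 1/3}  = 0.03\,\bigl(\tfrac{1}{3} + \tfrac{1}{9}\bigr) \approx 0.013,
\qquad
C'\bigl(\psi + \psi^{2}\bigr)\Big|_{\psi = 1/10} = 0.03\,(0.1 + 0.01)              = 0.0033.
```

Since the Kolmogorov distance never exceeds 1, the bound is informative over this whole range, and it tightens quadratically faster than linearly only once ψ(k) drops below 1.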
Sparsity estimation via EM
Using the approximation in Theorem 5, we recast the problem of inferring the signal sparsity as the problem of estimating the parameters of a two-component Gaussian mixture, whose joint density function of \(y\in \mathbb {R}^{m}\) and hidden class variables z∈{0,1}^{m} is given by
with i∈[m]. Starting from an initial guess of mixture parameters α(0), β(0), and p(0), the algorithm that we propose (named EMSp and summarized in Algorithm 1) computes, at each iteration \(t\in \mathbb {N}\), the posterior distribution
(E-step) and re-estimates the mixture parameters (M-step) until a stopping criterion is satisfied. Finally, the estimation of the signal sparsity is provided by
The sequence of signal sparsity estimations k(t) generated by Algorithm 1 converges to a limit point. For brevity, we omit the proof, which can be readily derived from standard convergence arguments for dynamical systems [37]. In the following, we will denote Algorithm 1 as EMSparse (EMSp).
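The procedure can be sketched as a standard EM loop for a mixture of two zero-mean Gaussians. This is a minimal illustration rather than a transcription of Algorithm 1: the initialization, the fixed number of iterations, and the final mapping from the mixing weight p to the sparsity (inverting p=1−(1−γ)^{k}) are assumptions, and the names `em_sparsity` and `normal_pdf` are hypothetical.

```python
import numpy as np

def normal_pdf(y, var):
    """Zero-mean Gaussian density with variance var."""
    return np.exp(-y ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_sparsity(y, gamma, n_iter=200):
    """EM for (1 - p) N(0, alpha) + p N(0, beta), then map p to a sparsity estimate."""
    total = np.mean(y ** 2)
    alpha, beta, p = 0.1 * total, 2.0 * total, 0.5     # assumed initial guess
    for _ in range(n_iter):
        # E-step: posterior probability that each sample comes from the wide component
        w1 = p * normal_pdf(y, beta)
        w0 = (1.0 - p) * normal_pdf(y, alpha)
        r = w1 / (w0 + w1)
        # M-step: re-estimate mixing weight and the two variances
        p = r.mean()
        alpha = np.sum((1.0 - r) * y ** 2) / np.sum(1.0 - r)
        beta = np.sum(r * y ** 2) / np.sum(r)
    k_hat = np.log(1.0 - p) / np.log(1.0 - gamma)      # assumed inversion of p_k
    return k_hat, p, alpha, beta

# Noisy ternary test signal with psi(k) = 1/10 and SNR = lam^2 k / sigma^2 = 10 (10 dB)
rng = np.random.default_rng(0)
n, k, m = 500, 100, 5000
gamma, sigma2 = 0.1 / k, 1.0
lam = np.sqrt(10.0 * sigma2 / k)

x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = lam * rng.choice([-1.0, 1.0], size=k)
A = (rng.random((m, n)) < gamma) * rng.normal(0.0, np.sqrt(1.0 / gamma), size=(m, n))
y = A @ x + rng.normal(0.0, np.sqrt(sigma2), size=m)

k_hat, p_hat, alpha_hat, beta_hat = em_sparsity(y, gamma)
```

In this sketch the narrow component absorbs the measurements that contain only noise and the wide component absorbs those whose row overlaps the support, so `alpha_hat` should approach σ^{2} and `p_hat` the probability of a nonempty overlap.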
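The Cramér-Rao bound for 2GMM
The Cramér-Rao (CR) bound is a popular lower bound on the variance of estimators of deterministic parameters. Given a parameter ξ and an unbiased estimator \(\widehat {\xi }\), let f(x;ξ) be the likelihood function. The CR bound is given by
\[ \mathrm{CR}\left(\widehat{\xi}\right) = \left(\mathbb{E}\left[\left(\frac{\partial \log f(x;\xi)}{\partial \xi}\right)^{2}\right]\right)^{-1}, \]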
that is, the inverse of the Fisher information.
The EMSp algorithm, for measurements that can be exactly modeled as a 2GMM, and for a large number of measurements, would be asymptotically optimal and unbiased and achieve a performance very close to the CR bound. However, because of the presence of noise in the data and the approximation of the 2GMM model, we expect that the estimator provided by the EMSp algorithm will be biased. A theoretical analysis of the bias in terms of these two factors is hard to carry out. In the following, we analyze the performance of EMSp in the estimation of the 2GMM parameters via the CR bound, which gives us an indication of how much the non-idealities of the model affect the performance of the proposed estimator.
Let us consider a 2GMM framework, in which two zero-mean Gaussians with known variances α and β are given, and let us call p the mixture parameter. The likelihood function is f(x;p)=(1−p)ϕ(x|α)+pϕ(x|β), and
is the CR bound for the ML estimator \(\widehat {p}\) of p. The stochastic mean cannot be computed in a closed form, but can be approximated with a Monte Carlo method.
CR\((\widehat {p})\) represents a benchmark to evaluate the accuracy of our estimation of p via EMSp, as will be done in Section 6.2.
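A Monte Carlo approximation of this bound can be sketched as follows. The score with respect to p is (ϕ(x|β)−ϕ(x|α))/f(x;p), and the sketch assumes m i.i.d. samples, so that the Fisher information adds up and the bound is 1/(m·I(p)); whether the paper normalizes by m is not stated here, and the function name `cr_bound_mixing` and the parameter values are illustrative.

```python
import numpy as np

def normal_pdf(x, var):
    """Zero-mean Gaussian density with variance var."""
    return np.exp(-x ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def cr_bound_mixing(p, alpha, beta, m, n_mc=200_000, seed=0):
    """Monte Carlo Cramer-Rao bound for the mixing weight p of
    f(x; p) = (1 - p) N(0, alpha) + p N(0, beta), given m i.i.d. samples."""
    rng = np.random.default_rng(seed)
    from_beta = rng.random(n_mc) < p                    # component labels
    std = np.where(from_beta, np.sqrt(beta), np.sqrt(alpha))
    x = rng.normal(size=n_mc) * std                     # samples from the mixture
    f = (1.0 - p) * normal_pdf(x, alpha) + p * normal_pdf(x, beta)
    score = (normal_pdf(x, beta) - normal_pdf(x, alpha)) / f
    fisher = np.mean(score ** 2)                        # per-sample Fisher information
    return 1.0 / (m * fisher)

p, alpha, beta, m = 0.1, 1.0, 100.0, 1000
cr = cr_bound_mixing(p, alpha, beta, m)
labelled_bound = p * (1.0 - p) / m    # variance achievable if the labels z were observed
```

The quantity `labelled_bound` is the variance of the sample proportion when the hidden labels are known; the mixture bound `cr` is always larger, and the gap shrinks as the two variances become better separated.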
Results and discussion
In this section, we illustrate the performance of the proposed estimators through extensive numerical simulations^{Footnote 1}. We present experiments both in the noise-free setting and in non-ideal settings, where signals are not exactly sparse or measurements are affected by noise. Finally, an application where sparsity estimation improves the efficiency of signal recovery in CS is proposed.
Noise-free measurements
We start testing signals that are exactly sparse and measurements that are not affected by additive noise.
We evaluate the estimation accuracy in terms of empirical probability of correct estimation: a run is considered successful if \(e\left (k,\widehat {k}\right)<5\times 10^{-2}\) where \(\widehat {k}\) is the estimated sparsity. In Fig. 1, we show results averaged over 1000 random instances, obtained by generating different sensing matrices from the γ-sparsified Gaussian ensemble. We underline that the values of the nonzero entries of the signals (which are drawn from a standard Gaussian distribution for this experiment) do not affect the performance in the noise-free case. Similarly, the length n of the signal plays no role in the estimation (see Proposition 1).
The empirical probability of correct estimation is studied as a function of m and k for three different regimes of the parameter ψ(k) defined in Remark 2 (see Fig. 1):
a) \(\psi (k)=\frac {1}{2}(\log (k)-\log (\log k))\);
b) ψ(k)=1;
c) \(\psi (k)=\sqrt [3]{{\log k}/{k}}\).
According to Theorem 3 (see also Remark 2), when m≥k, the relative error between the estimated sparsity and the true value of the sparsity degree tends to zero almost surely (i.e., with probability 1). This can be appreciated also in the numerical results in Fig. 1, where the line m=k is drawn for simplicity. Moreover, we can see that for any fixed k, the error decreases when m increases.
Noisy measurements
In the second experiment, we show the performance of the EMSp algorithm when measurements are noisy according to the model proposed in (1) and we compare to the numerical sparsity estimator [20]. In order to have a fair comparison, we perform this test on ternary signals in {−λ,0,λ}^{n} for which sparsity and numerical sparsity coincide. We then consider random sparse signals with nonzero entries uniformly chosen in {λ,−λ}, \(\lambda \in \mathbb {R}\), and SNR=λ^{2}k/σ^{2} (see definition in Section 3). Moreover, we set ψ(k) constant in order to focus on the effects of the additive noise in the estimation.
We remark that we compare only to [20] because, as illustrated in Section 1.1, the other proposed algorithms for sparsity degree estimation are based on signal reconstruction [14] (requiring a larger number of measurements and increased complexity, which would give an unfair comparison) or are conceived for very specific applications [13, 22].
In Fig. 2, we show the mean relative error (MRE) defined as
for different values of k and m in settings with SNR=10 dB and ψ(k)=1/10. We observe that, in the considered framework, EMSp always outperforms the method based on numerical sparsity estimation.
In Fig. 3, we set k=1000, ψ(k)=1/3, and we vary the SNR from 0 to 40 dB, while m∈{800,1000,2000,5000}. Again, we see that EMSp outperforms [20]. We specify that a few tens of iterations are sufficient for the convergence of EMSp.
Finally, we compare the performance of EMSp with an oracle estimator designed as follows: we assume that the variances α and β are known exactly and we generate measurements y_{i} distributed according to the 2GMM (1−p)ϕ(y_{i}|α)+pϕ(y_{i}|β), for i=1,…,m; given the sequence y, α, and β, we then compute the ML estimate of p via EM. We name this estimator EM oracle. Comparing the estimates of p of EMSp and EM oracle, we can check if our 2GMM approximation is reliable. We clearly expect that EM oracle performs better, as the measurements are really generated according to a 2GMM, and also the true α and β are exploited. However, our results show that the 2GMM approximation is trustworthy. In Fig. 4, we depict the sample variance of the estimator \(\widehat {p}\) of p (obtained from 1000 runs) of EMSp and EM oracle. We show also the CR bound (see Section 5.3), which represents a performance lower bound for the estimation of p. As explained in Section 5.3, the stochastic mean required in the CR bound for 2GMM cannot be computed analytically and is here evaluated via Monte Carlo.
In both graphs of Fig. 4, we set k=1000, m=k in Fig. 4a, and ψ(k)=1/10 in Fig. 4b. We notice that in the considered regimes, EM oracle and CR bound are very close and not really affected by the SNR. Regarding EMSp, we observe that (a) keeping k,γ fixed, EMSp gets closer to the optimum as the SNR increases; and (b) keeping k,m fixed, we can find an optimal γ that allows us to get very close to the optimum.
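The EM oracle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: both components are zero-mean Gaussians with known variances α and β, only the mixing weight p is updated, and the sample size, seed, and parameter values are arbitrary.

```python
import numpy as np

def gaussian_pdf(y, var):
    """Zero-mean Gaussian density with variance var."""
    return np.exp(-y ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_mixing_weight(y, alpha, beta, p0=0.5, n_iter=200, tol=1e-10):
    """ML estimate of p in (1-p)*phi(y|alpha) + p*phi(y|beta) via EM,
    with the variances alpha and beta assumed known."""
    p = p0
    for _ in range(n_iter):
        # E-step: posterior probability that each sample comes from the beta component.
        num = p * gaussian_pdf(y, beta)
        resp = num / (num + (1.0 - p) * gaussian_pdf(y, alpha))
        # M-step: the new mixing weight is the average responsibility.
        p_new = resp.mean()
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

rng = np.random.default_rng(2)
m, p_true, alpha, beta = 20000, 0.3, 1.0, 25.0
from_beta = rng.random(m) < p_true
y = np.where(from_beta,
             rng.normal(0.0, np.sqrt(beta), m),
             rng.normal(0.0, np.sqrt(alpha), m))

p_hat = em_mixing_weight(y, alpha, beta)
print(f"true p = {p_true}, EM estimate = {p_hat:.3f}")
```

Repeating this over many independent draws of y and taking the sample variance of the estimates gives the kind of quantity plotted for EM oracle in Fig. 4.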
Compressibility of real signals
In this section, we test our EMSp algorithm to evaluate the compressibility of real signals. Specifically, we consider images which are approximately sparse in the discrete cosine transform (DCT) domain, that is, they are well represented by few DCT coefficients. Our aim is to estimate the number k of significant DCT coefficients. More precisely, we seek the minimum k such that the best-k approximation \(\widehat {x}_{k}\) has a relative error of at most approximately τ, namely \(\left \|\widehat {x}_{k}-x \right \|_{2}^{2}\leq \tau \left \|x \right \|_{2}^{2}\). Since DCT coefficients of natural images usually have a power-law decay of the form x_{i}≈c/i, the following approximation holds
and since, according to the theoretical derivation, γ∝1/k, we fix γ∝τ. Tuning γ proportionally to τ allows us to better adapt the sparsity of the sensing matrix to the expected signal sparsity: for larger values of τ, we expect smaller values of k, which call for a larger γ so that the matrix is dense enough to pick up the nonzero entries.
In the proposed experiments, we fix γ=cτ with c=5·10^{−2} and we initialize \(\pi _{i}(0)=\frac {1}{2}\) for all i=1,…,m, while we set β and α respectively as the signal energy and the noise energy (namely, the error of the best-k approximation), evaluated from the measurements: \(\beta =\frac {\left \|y\right \|_{2}^{2}}{m}\) and α=τβ.
In Fig. 5, we depict the results on three n=256×256 images (shown in Fig. 6) represented in the DCT basis (DCT is performed on 8×8 blocks). Specifically, we show original and estimated sparsity (the y-axis represents the ratio k/n), averaged on 100 random sensing matrices. The images have been chosen with different compressibilities, to test our method in different settings. We observe that for all the images and across different values of τ, we are able to estimate k with a small error. This experiment thus shows that EMSp can be practically used to estimate the compressibility of real signals.
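The reference value of k in this experiment, i.e., the minimum k whose best-k approximation satisfies \(\|\widehat {x}_{k}-x\|_{2}^{2}\leq \tau \|x\|_{2}^{2}\), can be computed directly from the transform coefficients. The Python sketch below is one possible way to do so; the function name is hypothetical, and the synthetic power-law coefficients stand in for the DCT coefficients of an image.

```python
import numpy as np

def min_k_for_relative_error(coeffs, tau):
    """Smallest k such that keeping the k largest-magnitude coefficients
    leaves a squared error of at most tau times the total energy."""
    energy = np.sort(np.abs(np.ravel(coeffs)) ** 2)[::-1]
    total = energy.sum()
    # tail[j] is the energy discarded when the j largest coefficients are kept.
    tail = total - np.concatenate(([0.0], np.cumsum(energy)))
    tail[-1] = 0.0  # keeping everything discards nothing (guards against rounding)
    return int(np.argmax(tail <= tau * total))

# Synthetic coefficients with the power-law decay x_i ~ c / i.
n = 256 * 256
coeffs = 1.0 / np.arange(1, n + 1)
for tau in (0.01, 0.05, 0.1):
    k = min_k_for_relative_error(coeffs, tau)
    print(f"tau = {tau}: k = {k}, k/n = {k / n:.5f}")
```

Larger values of τ give smaller values of k, which is the behavior that motivates tuning γ proportionally to τ.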
Sparsity estimation for signal recovery
We have already remarked that the knowledge of the sparsity degree k is widely used for signal recovery in CS. In this section, we consider CoSaMP [9], an algorithm which can recover a sparse signal (exactly or with a bounded error, in the absence and in the presence of noise, respectively) if a sufficient number of measurements m is provided, and assuming knowledge of an upper bound k_{max} for k. Our aim is to show that EMSp can be used as a preprocessing step for CoSaMP when k is not known; specifically, we estimate k to dimension the number of measurements necessary for CoSaMP recovery. In the following, we denote this joint procedure as EMSp/CoSaMP.
We compare CoSaMP with EMSp/CoSaMP in the following setting. We consider a family \(\mathcal {S}\) of signals of length n=1600 and (unknown) sparsity k∈{20,200} (then, k_{max}=200). The value of k and the position of the nonzero coefficients are generated uniformly at random, and the nonzero values are drawn from a standard Gaussian distribution. Since k is not known, the number of measurements needed by CoSaMP has to be dimensioned on k_{max}: assuming SNR=30 dB, from the literature, we get that m_{C}=4k_{max} measurements are sufficient to get a satisfactory recovery using dense Gaussian sensing matrices. In our specific setting, we always observe a mean relative error MRE\(_{\text {rec}} =\left \|x-\widehat {x}\right \|_{2}/\left \|x\right \|_{2}<5.5\times 10^{-2}\) (for each k∈{20,200}, 100 random runs have been performed).
We propose now the following procedure.

1
First sensing stage and sparsity estimation: we take m_{S}≪m_{C} measurements via the γ-sparsified matrix in (2), and we provide an estimate \(\widehat {k}\) of k using the EMSp algorithm.

2
Second sensing stage and recovery: we add a sufficient number of measurements (dimensioned over \(\widehat {k}\)) and then perform CoSaMP recovery.
Specifically, the following settings have proved suitable for our example:

We estimate k with EMSp from \(m_{S}=\frac {k_{\max }}{2}\) sparsified measurements, with γ=6/k_{max};

Since underestimates of k are critical for CoSaMP, we consider \(\widehat {k}\) equal to 2 times the estimate provided by EMSp;

We add \(m_{A}=4\widehat {k}\) Gaussian measurements, and we run CoSaMP with the so-obtained sensing matrix with m_{S}+m_{A} rows. When m_{S}+m_{A}>m_{C}, we reduce the total number of measurements to m_{C}.
We show the results averaged over 100 random experiments. In Fig. 7, we compare the number of measurements used for recovery, as a function of the sparsity degree k: a substantial gain is obtained in terms of measurements by EMSp/CoSaMP, with no significant accuracy loss. In Fig. 8, we can see that CoSaMP and EMSp/CoSaMP algorithms achieve similar MRE_{rec}.
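The measurement budget of the two-stage procedure can be written down as simple arithmetic. The Python sketch below encodes the settings listed above (m_{S}=k_{max}/2, γ=6/k_{max}, a safety factor of 2 on the EMSp estimate, m_{A}=4k̂, and a cap at m_{C}=4k_{max}); the function and key names are hypothetical, and rounding the doubled estimate up to an integer is an assumption.

```python
import math

def two_stage_budget(k_est, k_max, safety=2, per_coeff=4):
    """Measurement budget for the EMSp/CoSaMP procedure.

    k_est is the raw sparsity estimate returned by the first stage."""
    m_s = k_max // 2                      # sparsified measurements, first stage
    gamma = 6.0 / k_max                   # sparsity parameter of the first-stage matrix
    k_hat = math.ceil(safety * k_est)     # inflate to protect against underestimates
    m_a = per_coeff * k_hat               # additional Gaussian measurements
    m_c = per_coeff * k_max               # budget of plain CoSaMP, dimensioned on k_max
    total = min(m_s + m_a, m_c)           # never exceed the plain CoSaMP budget
    return {"m_S": m_s, "gamma": gamma, "k_hat": k_hat,
            "m_A": m_a, "m_C": m_c, "total": total}

for k_est in (20, 30, 100):
    print(k_est, two_stage_budget(k_est, k_max=200))
```

For k_{max}=200, a raw estimate of 30 gives 100+240=340 measurements against 800 for plain CoSaMP, while a raw estimate of 100 hits the cap of 800.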
Conclusions
In this paper, we have proposed an iterative algorithm for the estimation of the signal sparsity starting from compressive and noisy projections obtained via sparse random matrices. As a first theoretical contribution, we have demonstrated that the estimator is consistent in the noise-free setting and we characterized its asymptotic behavior for different regimes of the involved parameters, namely the sparsity degree k, the number of measurements m, and the sensing matrix sparsity parameter γ. Then, we have shown that in the noisy setting, the projections can be approximated using a 2GMM, for which the EM algorithm provides an asymptotically optimal estimator.
Numerical results confirm that the 2GMM approach is effective for different signal models and outperforms methods known in the literature, with no substantial increase of complexity. The proposed algorithm can represent a useful tool in several applications, including the estimation of signal sparsity before reconstruction in a sequential acquisition framework, or the estimation of support overlap between correlated signals. An important property of the proposed method is that it does not rely on the knowledge of the actual sensing matrix, but only on its sparsity parameter γ. This enables applications in which one is not interested in signal reconstruction, but only in embedding the sparsity degree of the underlying signal in a more compact representation.
Appendix
Proofs of results in Section 4
Proof of Proposition 1
It should be noticed that \( \mathbb {P}(\omega ^{\star }_{i}=0)=(1-\gamma)^{\theta n}\), where θ∈[0,1] is the parameter to be optimized. Since the rows of the matrix A are independent, so are the \(\omega _{i}^{\star }\); considering that the event \(\omega ^{\star }_{i}=0\) is equivalent to the event that the support of the i-th row of A is disjoint from the support of the signal x, the ML estimation computes
from which
We conclude that \(\widehat {k}_{o}=\widehat {\theta }_{o}n\). □
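For the reader's convenience, the computation behind this proof can be sketched as follows. This is a reconstruction under the assumption that the \(\omega ^{\star }_{i}\) are independent Bernoulli variables with \(\mathbb {P}(\omega ^{\star }_{i}=0)=(1-\gamma)^{\theta n}\) and that \(\|\omega ^{\star }\|_{0}<m\); the notation \(\ell (\theta)\) and \(q\) is introduced here only for illustration.

```latex
\begin{align*}
\ell(\theta) &= \left(m-\|\omega^{\star}\|_{0}\right)\theta n\log(1-\gamma)
              +\|\omega^{\star}\|_{0}\log\!\left(1-(1-\gamma)^{\theta n}\right),\\
\frac{\mathrm{d}\ell}{\mathrm{d}\theta} &= n\log(1-\gamma)
   \left[\left(m-\|\omega^{\star}\|_{0}\right)-\|\omega^{\star}\|_{0}\,\frac{q}{1-q}\right],
   \qquad q=(1-\gamma)^{\theta n},\\
\frac{\mathrm{d}\ell}{\mathrm{d}\theta}=0
  &\iff q = 1-\frac{\|\omega^{\star}\|_{0}}{m}
  \iff \widehat{\theta}_{o}=\frac{\log\!\left(1-\|\omega^{\star}\|_{0}/m\right)}{n\log(1-\gamma)},\\
\widehat{k}_{o} &= \widehat{\theta}_{o}\,n
  =\frac{\log\!\left(1-\|\omega^{\star}\|_{0}/m\right)}{\log(1-\gamma)}.
\end{align*}
```

The signal length n cancels in the last line, which is consistent with the remark that n plays no role in the estimation.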
Proof of Theorem 2
Let us consider \(\omega ^{\star }_{i}\) as defined in (15) and let \(\widehat {p}_{k}=\frac {\|\omega ^{\star }\|_{0}}{m}\), where the index emphasizes the dependence on the sparsity degree. We thus have:
Since k log(1−γ)=log(1−γ)^{k}=log(1−p_{k}), we obtain
with
It should be noticed that \(p_{k}=\mathbb {E}\left [\widehat {p}_{k}\right ]\), hence applying the Chernoff-Hoeffding theorem [38], the above tail probability is upper bounded as
and we obtain the first part of the statement. Choosing
for some ρ>1/2, we get
and from the Borel-Cantelli lemma [39], we conclude that
□
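The concentration step of this proof can be checked numerically. The Python sketch below uses the standard two-sided Hoeffding inequality \(\mathbb {P}(|\widehat {p}_{k}-p_{k}|\geq t)\leq 2e^{-2mt^{2}}\) for an average of m independent indicators; the constants in the bound used by the authors may differ, and all parameter values are illustrative. It also shows that the threshold \(t_{m}=\sqrt {\rho \log (m)/m}\), one choice (assumed here) that makes the bound equal to 2m^{−2ρ}, yields a summable sequence for ρ>1/2, as required by the Borel-Cantelli lemma.

```python
import numpy as np

def hoeffding_bound(m, t):
    """Two-sided Hoeffding bound on P(|p_hat - p| >= t) for m indicators."""
    return 2.0 * np.exp(-2.0 * m * t ** 2)

rng = np.random.default_rng(3)
k, gamma, m, t, trials = 100, 0.01, 200, 0.1, 20000

# Probability that a row of the sensing matrix overlaps the support.
p_k = 1.0 - (1.0 - gamma) ** k

# Each trial draws the number of nonzero measurements out of m.
p_hat = rng.binomial(m, p_k, size=trials) / m
empirical_tail = np.mean(np.abs(p_hat - p_k) >= t)
bound = hoeffding_bound(m, t)
print(f"empirical tail = {empirical_tail:.4f}, Hoeffding bound = {bound:.4f}")

# With t_m = sqrt(rho * log(m) / m), the bound becomes 2 * m**(-2 * rho).
rho = 0.75
t_m = np.sqrt(rho * np.log(m) / m)
print(hoeffding_bound(m, t_m), 2.0 * m ** (-2.0 * rho))
```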
Proof of Theorem 3
From Theorem 2, we have
and combining the hypothesis m/ log(m)≥g(k), we get
where the last inequality is obtained noticing that log m≥2 eventually as m→∞. We now distinguish the different cases

a)
If ψ(k)→∞ as k→∞, then the function g is defined as g(k)=Ω(e^{2ψ(k)}) from which we get that also g(k)→∞

b)
If ψ(k)=Θ(1) as k→∞, then the function g is defined as g(k)→∞ for k→∞ from which we get that also g(k)→∞;

c)
If ψ(k)=o(1) as k→∞, then the function g is defined as g(k)=Ω(ψ(k)^{−2(1+ε)}), for any ε>0 from which we get that also g(k)→∞
Since in all cases (a), (b), and (c), the function g(k)→∞, we conclude \(\mathbb {P}\left (\left \{e\left (\widehat {k}_{o},k\right)\geq \epsilon _{k}\right \}\right)\longrightarrow 0\) and the Eq. (22) can be deduced.
We now prove that ε_{k} tends to zero as k→∞. Notice that being ψ(k)=o(k) as k→∞
as k→∞. We have
We have

a)
If ψ(k)→∞ then ε_{k}=O(ψ(k)^{−1});

b)
If ψ(k)=Θ(1) then ε_{k}=O(g(k)^{−1/2});

c)
If ψ(k)→0 then ε_{k}=O(ψ(k)^{ε}).
We conclude that in all three cases (a), (b), and (c) the threshold \( \epsilon _{k}\stackrel {k\rightarrow \infty }{\longrightarrow }0. \)
Proof of Theorem 4
Let ε_{k} be defined as in (23). From Lemma 2, we have, for some ρ>1/2,
where the last inequality is obtained noticing that
eventually. Since 2ρ>1, from the Borel-Cantelli lemma, we deduce that
Since m/log(m)≥g(k), ε_{k}→0 as k→∞, and we conclude that
Proof of Theorem 5
In this section, we prove Theorem 5.
Lemma 1
Let A be chosen from the γ-sparsified Gaussian ensemble uniformly at random and y be given in (14), p_{k}=1−(1−γ)^{k}, and then
Proof
We recall \(y_{i}=\sum _{j=1}^{n}A_{ij}x_{j}+\eta _{i}\) with η_{i}∼N(0,σ^{2}). As already noticed throughout the paper, the measurement y_{i} is a mixture of Gaussians with zero mean and variance depending on the overlap between the support of the i-th row of A and supp(x). Suppose that S⊆supp(x) is this overlap, which happens with probability p_{S}=(1−γ)^{k−|S|}γ^{|S|}; then the variance of the Gaussian is given by \(\alpha _{S}=\frac {\|x_{S}\|^{2}}{\gamma }+\sigma ^{2}\). Standard computations lead to
We notice that, for a fixed component ℓ∈supp(x), there are exactly \({k-1\choose s-1} \) possible sets of cardinality s containing ℓ, i.e., the number of selections of the remaining s−1 objects among k−1 positions. This observation and the fact that x_{ℓ}=0 for all ℓ∉supp(x) lead to
We compute now
As before, we notice that, for a fixed component ℓ, there are exactly \({k-1\choose s-1} \) possible sets of cardinality s containing ℓ. Analogously, the couple ℓ,j with ℓ≠j is contained in \({k-2\choose s-2}\) possible sets of cardinality s. We thus have
We conclude
□
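The conditional mean of the variance used in the next proof, \(\mathbb {E}[\text {Var}(y_{i})\mid \omega _{i}=1]=\sigma ^{2}+\|x\|^{2}/p_{k}\), can be verified by brute force on a small example. The Python sketch below enumerates every nonempty overlap S with probability p_{S}=(1−γ)^{k−|S|}γ^{|S|} and variance \(\alpha _{S}=\|x_{S}\|^{2}/\gamma +\sigma ^{2}\); it interprets ω_{i}=1 as the event that the overlap is nonempty, and the numerical values are arbitrary.

```python
import itertools
import numpy as np

def conditional_mean_variance(x_support, gamma, sigma2):
    """E[alpha_S | S nonempty], computed by enumerating all subsets S
    of the support (feasible only for small k)."""
    k = len(x_support)
    p_k = 1.0 - (1.0 - gamma) ** k  # probability of a nonempty overlap
    acc = 0.0
    for s in range(1, k + 1):
        p_S = (1.0 - gamma) ** (k - s) * gamma ** s
        for S in itertools.combinations(range(k), s):
            alpha_S = sum(x_support[j] ** 2 for j in S) / gamma + sigma2
            acc += p_S * alpha_S
    return acc / p_k

x_support = np.array([0.5, -1.2, 2.0, 0.3, -0.7, 1.1])  # nonzero entries of x
gamma, sigma2 = 0.2, 0.1

lhs = conditional_mean_variance(x_support, gamma, sigma2)
p_k = 1.0 - (1.0 - gamma) ** len(x_support)
rhs = sigma2 + np.sum(x_support ** 2) / p_k
print(lhs, rhs)
```

The agreement rests on the counting argument of the proof: each component ℓ belongs to the overlap with probability γ, so the factor 1/γ in α_{S} cancels.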
Proof of Theorem 5
Let ψ(k)=γk and consider the sequence of probability density functions
where \(\alpha _{S}=\frac {\|x_{S}\|^{2}}{\gamma }+\sigma ^{2}\) and p_{S}=(1−γ)^{k−|S|}γ^{|S|}. Let \(\bar {\alpha }=\mathbb {E}\left [\text {Var}(y_{i})\mid \omega _{i}=1\right ]=\sigma ^{2}+\frac {\|x\|^{2}}{p_{k}}\) (see Lemma 1) and denote by \(\mathcal {S}\) the set of possible subsets of supp(x); we thus have
where erf is the Gauss error function. Let \(g:\mathbb {R}^{2}\rightarrow \mathbb {R}\) be the function
and by the Lagrange Theorem [40], we obtain
with \(\xi (\alpha)\in (\min \{\bar {\alpha },\alpha \},\max \{\alpha,\bar {\alpha }\})\). It should be noticed that the first-order term in Taylor’s expansion of g(t;α) vanishes due to the conditional mean result from Theorem 2
with \(\xi (\alpha _{S})\in (\min \{\bar {\alpha },\alpha \},\max \{\alpha,\bar {\alpha }\})\subseteq (\lambda ^{2}/\gamma,\|x\|^{2}/\gamma)\). Putting this expression into (87), we obtain
where
and
Through standard computations, we see that the maximizing value is obtained for \(t=\sqrt {(3-\sqrt {6})\xi }\)
with
Finally, considering
and using Lemma 1, we conclude
with C^{′}=C/4.
Proof of Corollary 1
From Theorem 5, we have
where \(\lambda _{\min }=\min _{\{i:\ x_{i}\neq 0\}}|x_{i}|\) and \(\lambda _{\max }=\max _{\{i:\ x_{i}\neq 0\}}|x_{i}|\). The assertion is proved with C^{′}=C(λ_{max}/λ_{min})^{4}.
Notes
 1.
The code to reproduce the simulations proposed in this section is available at https://github.com/sophie27/sparsityestimation
Abbreviations
2GMM: Two-component Gaussian mixture model
CS: Compressed sensing
CoSaMP: Compressive sampling matching pursuit
DCT: Discrete cosine transform
EM: Expectation-maximization
ML: Maximum likelihood
MRE: Mean relative error
OMP: Orthogonal matching pursuit
SNR: Signal-to-noise ratio
References
 1
D.L. Donoho, Compressed sensing. IEEE Trans. Inf. Theory. 52(4), 1289–1306 (2006).
 2
E. Candes, T. Tao, Near optimal signal recovery from projection: universal encoding strategies?. IEEE Trans. Inf. Theory. 52(12), 5406–5425 (2006).
 3
F. Zeng, L. Chen, Z. Tian, Distributed compressive spectrum sensing in cooperative multihop cognitive networks. IEEE J. Sel. Topics Signal Process.5(1), 37–48 (2011).
 4
L. Gan, in IEEE Int. Conf. DSP. Block compressed sensing of natural images, (2007), pp. 403–406.
 5
E. Arias-Castro, E.J. Candes, M.A. Davenport, On the fundamental limits of adaptive sensing. IEEE Trans. Inf. Theory. 59(1), 472–481 (2013).
 6
M.A. Figueiredo, R.D. Nowak, S.J. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Topics Signal Process.1(4), 586–597 (2007).
 7
A. Mousavi, A. Maleki, R.G. Baraniuk, Asymptotic analysis of LASSOs solution path with implications for approximate message passing (2013). preprint available at arXiv:1309.5979v1.
 8
J.A. Tropp, A.C. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory. 53(12), 4655–4666 (2007).
 9
D. Needell, J.A. Tropp, CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmonic Anal.26(3), 301–321 (2008).
 10
D. Valsesia, S.M. Fosson, C. Ravazzi, T. Bianchi, E. Magli, in 2016 IEEE International Conference on Multimedia Expo Workshops (ICMEW). Sparsehash: embedding jaccard coefficient between supports of signals, (2016), pp. 1–6.
 11
R. Ward, Compressed sensing with cross validation. IEEE Trans. Inf. Theory. 55(12), 5773–5782 (2009).
 12
Y.C. Eldar, Generalized sure for exponential families: applications to regularization. IEEE Trans. Signal Process.57(2), 471–481 (2009).
 13
SK Sharma, S Chatzinotas, B Ottersten, Compressive sparsity order estimation for wideband cognitive radio receiver. IEEE Trans. Signal Process.62(19), 4984–4996 (2014).
 14
Y. Wang, Z. Tian, C. Feng, Sparsity order estimation and its application in compressive spectrum sensing for cognitive radios. IEEE Trans. Wireless Commun.11(6), 2116–2125 (2012).
 15
D.M. Malioutov, S. Sanghavi, A.S. Willsky, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP). Compressed sensing with sequential observations, (2008), pp. 3357–3360.
 16
M.S. Asif, J. Romberg, in Signals, Systems and Computers, 2008 42nd Asilomar Conference On. Streaming measurements in compressive sensing: ℓ _{1} filtering, (2008), pp. 1051–1058.
 17
P.J. Garrigues, L. El Ghaoui, in Neural Information Processing Systems (NIPS). An homotopy algorithm for the Lasso with online observations, vol. 21, (2008).
 18
Y. You, J. Jin, W. Duan, N. Liu, Y. Gu, J. Yang, Zero-point attracting projection algorithm for sequential compressive sensing. IEICE Electron. Express. 9(4), 314–319 (2012).
 19
T.V. Nguyen, T.Q.S. Quek, H. Shin, Joint channel identification and estimation in wireless network: sparsity and optimization. IEEE Trans. Wireless Commun.17(5), 3141–3153 (2018).
 20
M.E. Lopes, in Proc. 30th Int. Conf. Machine Learning. Estimating unknown sparsity in compressed sensing, vol. 28 (Proceedings of Machine Learning Research, Atlanta, 2013).
 21
M.E. Lopes, Unknown sparsity in compressed sensing: denoising and inference. IEEE Trans. Inf. Theory. 62(9), 5145–5166 (2016).
 22
S. Lan, Q. Zhang, X. Zhang, Z. Guo, in IEEE Int. Symp. on Circuits and Systems (ISCAS). Sparsity estimation in image compressive sensing, (2012).
 23
A. Agarwal, L. Flodin, A. Mazumdar, in IEEE International Symposium on Information Theory (ISIT). Estimation of sparsity via simple measurements, (2017), pp. 456–460.
 24
Z. Bar-Yossef, T.S. Jayram, R. Kumar, D. Sivakumar, L. Trevisan, ed. by J.D.P. Rolim, S. Vadhan. Randomization and Approximation Techniques in Computer Science (RANDOM) (Springer, Berlin, 2002), pp. 1–10.
 25
P.B. Gibbons, ed. by M. Garofalakis, J. Gehrke, and R. Rastogi. Data Stream Management: Processing High-Speed Data Streams (Springer, Berlin, 2016), pp. 121–147.
 26
V. Bioglio, T. Bianchi, E. Magli, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP). On the fly estimation of the sparsity degree in compressed sensing using sparse sensing matrices, (2015), pp. 3801–3805.
 27
W. Wang, M.J. Wainwright, K. Ramchandran, Information-theoretic limits on sparse signal recovery: dense versus sparse measurement matrices. IEEE Trans. Inf. Theory. 56(6), 2967–2979 (2010).
 28
D. Omidiran, M.J. Wainwright, High-dimensional variable selection with sparse random projections: measurement sparsity and statistical efficiency. J. Mach. Learn. Res. 11, 2361–2386 (2010).
 29
C. Ravazzi, S.M. Fosson, T. Bianchi, E. Magli, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Signal sparsity estimation from compressive noisy projections via γsparsified random matrices, (2016), pp. 4029–4033.
 30
E.J. Candès, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements. Commun. Pur. Appl. Math.59(8), 1207–1223 (2006).
 31
D. Omidiran, M.J. Wainwright, High-dimensional variable selection with sparse random projections: measurement sparsity and statistical efficiency. J. Mach. Learn. Res. 11, 2361–2386 (2010).
 32
R. Berinde, P. Indyk, M. Ruzic, in Communication, Control, and Computing, 2008 46th Annual Allerton Conference On. Practical near-optimal sparse recovery in the l1 norm, (2008), pp. 198–205.
 33
A. Gilbert, P. Indyk, Sparse recovery using sparse matrices. Proc. IEEE. 98(6), 937–947 (2010).
 34
P. Li, CH Zhang, in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS) 2015. Compressed sensing with very sparse gaussian random projections, (2006), pp. 617–625.
 35
E.J. Candès, The restricted isometry property and its implications for compressed sensing (Compte Rendus de l’Academie des Sciences, Paris, France). ser. I 346, 589–592 (2008).
 36
M.A.T. Figueiredo, A.K. Jain, Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell.24:, 381–396 (2000).
 37
C.F.J. Wu, On the convergence properties of the em algorithm. Ann. Statist.11(1), 95–103 (1983).
 38
W. Hoeffding, Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc.58(301), 13–30 (1963).
 39
W. Feller, An introduction to probability theory and its applications, vol. 1 (Wiley, New York, 1968).
 40
T.M. Apostol, Calculus, vol. 2: multivariable calculus and linear algebra with applications to differential equations and probability (Wiley, New York, 1967).
Acknowledgments
The authors thank the European Research Council for the financial support for this research. We express our sincere gratitude to the reviewers for carefully reading the manuscript and for their valuable suggestions.
Funding
This work has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 279848.
Availability of data and materials
Please contact the corresponding author (chiara.ravazzi@ieiit.cnr.it) for data requests.
Author information
Contributions
All the authors contributed equally, read, and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Ravazzi, C., Fosson, S., Bianchi, T. et al. Sparsity estimation from compressive projections via sparse random matrices. EURASIP J. Adv. Signal Process. 2018, 56 (2018). https://doi.org/10.1186/s1363401805780
Keywords
 Sparsity recovery
 Compressed sensing
 High-dimensional statistical inference
 Gaussian mixture models
 Maximum likelihood
 Sparse random matrices