# Robust compressive sensing of sparse signals: a review

Rafael E. Carrillo^{1}, Ana B. Ramirez^{2}, Gonzalo R. Arce^{3}, Kenneth E. Barner^{3} and Brian M. Sadler^{4}

**2016**:108

https://doi.org/10.1186/s13634-016-0404-5

© The Author(s) 2016

**Received:** 30 April 2016

**Accepted:** 22 September 2016

**Published:** 19 October 2016

## Abstract

Compressive sensing generally relies on the *ℓ*_{2} norm for data fidelity, whereas in many applications, robust estimators are needed. Among the scenarios in which robust performance is required, applications where the sampling process is performed in the presence of impulsive noise, i.e., measurements are corrupted by outliers, are of particular importance. This article overviews robust nonlinear reconstruction strategies for sparse signals based on replacing the commonly used *ℓ*_{2} norm by M-estimators as data fidelity functions. The derived methods outperform existing compressed sensing techniques in impulsive environments, while achieving good performance in light-tailed environments, thus offering a robust framework for CS.

### Keywords

Compressed sensing, Sampling methods, Signal reconstruction, Impulsive noise, Nonlinear estimation

## 1 Introduction

The theory of compressive sensing (CS) introduces a signal acquisition and reconstruction framework that goes beyond the traditional Nyquist sampling paradigm [1–4]. The fundamental premise in CS is that certain classes of signals, such as natural images, have a succinct representation in terms of a sparsity inducing basis, or frame, such that only a few coefficients are significant and the remaining coefficients are negligibly small. In such cases, the signal is acquired taking a few linear measurements and subsequently accurately recovered using nonlinear iterative algorithms [4, 5]. CS has proven particularly effective in imaging applications due to the inherent sparsity, e.g., in medical imaging [6], astronomical imaging [7], radar imaging [8], and hyperspectral imaging [9].

Since noise is always present in practical acquisition systems, a range of different algorithms and methods have been proposed in the literature that enable accurate reconstruction of sparse signals from noisy compressive measurements using the *ℓ*_{2} norm as the metric for the residual error (see [10] for a review of CS recovery algorithms). However, it is well known that least squares-based estimators are highly sensitive to outliers present in the measurement vector, leading to a poor performance when the noise does not follow the Gaussian assumption and is, instead, better characterized by heavier-than-Gaussian-tailed distributions [11–14]. A broad spectrum of applications exists in which such processes emerge, including wireless and power line communications, teletraffic, hydrology, geology, atmospheric noise compensation, economics, and image and video processing (see [14–16] and references therein).

As a motivating example, consider a CS system for wireless body area networks (WBAN). WBAN allows the transition from centralized health care services to ubiquitous and pervasive health monitoring in everyday life. Typical signals that are monitored by WBAN are electrocardiogram (ECG) signals, and CS is a promising framework to lower WBAN’s energy consumption. However, ECG signals are typically corrupted by electromyographic noise which shows an impulsive behavior. Another application of interest is a nonintrusive load monitoring system that identifies house appliances and their energy consumption. A CS system can be used to acquire the power signal and then a sparse classification system used to classify the house appliances. However, the power signals exhibit impulsive behavior due to the switching nature of the appliances. If the compressive sampling process has infinite or even very large variance, the reconstructed signal obtained utilizing traditional approaches is far from the desired original signal. Thus, there are clear motivations for developing robust CS techniques that address these challenging environments.

The need to describe impulsive data, coupled with computational advances that enable efficient processing methods based on models more complex than the traditional Gaussian distribution, has led to growing interest in heavy-tailed models. Robust statistics, more specifically, the stability theory of statistical procedures, systematically investigates the effects of deviation from modeling assumptions [17–19]. Maximum likelihood (ML) type estimators, also known as M-estimators, developed in the theory of robust statistics are of great importance in robust signal processing techniques [14, 16]. M-estimators are described by a cost function-defined optimization problem where properties of the cost function (or its first derivative, the so-called influence function) determine the estimator robustness [18].

The key idea in M-estimation is that the cost function, or the influence function, can be chosen in such a way to provide the estimator desirable properties (in terms of bias and efficiency) when the data are truly generated from the assumed model, and reliable albeit not optimal behavior when the data are generated from another model that is, in some sense, close to the assumed model.

Over the past decade, there have been several works addressing the reconstruction of sparse signals whose measurements are corrupted by outliers or by impulsive noise [20–55]. Parametric approaches that model the corrupting noise as a linear combination of a sparse vector of outliers (possibly gross errors) and a dense vector of small bounded noise have been proposed in the literature. Popilka et al. [21] were the first to analyze this model and proposed a reconstruction strategy that first estimates the sparse error pattern and then estimates the true signal, in an iterative process. Related approaches are studied in [23–32]. These works assume a sparse error and estimate both signal and error at the same stage using a modified *ℓ*_{1} minimization problem. This approach was originally proposed by Wright et al. for the face recognition problem with image occlusions [22]. A similar model was proposed by Candès et al. in [33] for the recovery of low rank matrices corrupted by outliers.

Sparse models coupled with sparse reconstruction algorithms have also been used to address the robust regression problem where the number of measurements (observations) is greater than the number of unknowns (explanatory variables) [34–38]. In the context of error correction coding, Candès et al. investigated *ℓ*_{1} optimization approaches to solve the decoding problem when the received codeword (measurements) is assumed to be corrupted by gross outliers [39, 40].

Approaches based on M-estimators that replace the *ℓ*_{2} data fidelity term by a more robust cost function have also been proposed. Carrillo et al. propose reconstruction approaches based on the Lorentzian norm as the data fidelity term [41, 42]. In addition, Ramirez et al. develop an iterative algorithm to solve a Lorentzian *ℓ*_{0}-regularized cost function using iterative weighted myriad filters [43]. A similar approach is used in [44] by solving an *ℓ*_{0}-regularized least absolute deviation (LAD) regression problem yielding an iterative weighted median algorithm. The authors of [45] propose an iterative approach based on a gradient descent median truncated Wirtinger flow algorithm to solve the phase retrieval problem when the magnitude measurements are corrupted by outliers.

Nonconvex optimization approaches based on *ℓ*_{p} norms as data fidelity functions have been proposed in [46, 47], while an *ℓ*_{p}-space greedy algorithm is proposed in [48]. Greedy algorithms [49, 50] and optimization-based approaches [51, 52] using the Huber function as the data fidelity term have also been proposed in the literature. Bayesian approaches, modeling the corrupting noise using a heavy-tailed probability distribution, are proposed in [53, 54]. Robust PCA approaches resilient to outliers are proposed in [55].

The purpose of this article is to provide an overview of robust reconstruction strategies for CS when the measurements are corrupted by outliers. We approach the problem first from a statistical point of view and then review nonlinear methods that have been proposed in the literature that are based on robust statistics, specifically methods that are based on M-estimators. The organization of this paper is as follows. A general overview of CS is introduced in Section 2, and a collection of robust estimators, known as M-estimators, is discussed in Section 3. We then present a review of nonlinear methods based on robust estimation in Section 4. Section 5 is devoted to illustrating the performance of the reviewed methods in the reconstruction of sparse signals from contaminated compressive samples. Concluding remarks are provided in Section 6.

## 2 Compressive sensing review

Let \(\mathbf {x}\in \mathbb {R}^{n}\) be a signal that is either *s*-sparse or compressible in some representation basis Ψ such that **x**=Ψ**α**, where \(\mathbf {\alpha } \in \mathbb {R}^{n} \) is the vector of coefficients having at most *s* nonzero values, i.e., ∥**α**∥_{0}≤*s*. Recall that the *ℓ*_{p} norm of a vector \(\mathbf {u}\in \mathbb {R}^{n}\) is defined as \(\|\mathbf {u}\|_{p}=\left (\sum _{i=1}^{n} |u_{i}|^{p}\right)^{1/p}\). The *ℓ*_{0} “norm” is not a norm, since it does not meet the positive homogeneity and sub-additivity properties, but in practice simply counts the number of nonzero elements of a vector. Let Φ be an *m*×*n* sensing matrix that represents a dimensionality reduction operation since *m* is taken to be smaller than *n*, with rows that form a set of vectors incoherent with the sparsity representation basis.

The signal **x** is measured by **y**=Φ**x**. Setting Θ=ΦΨ, the measurement vector becomes **y**=Θ**α**. In the following, we assume, without loss of generality, that Ψ=I, the canonical basis for \(\mathbb {R}^{n}\), such that **x**=**α**. It has been shown that a convex program (basis pursuit) can recover the original sparse signal, **x**, from a small set of measurements, **y**, if the sensing matrix obeys the restricted isometry property (RIP) [56], defined as follows.

### Definition 1.

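A matrix Φ satisfies the restricted isometry property of order *s* if there exists a constant *δ*_{s}, defined as the smallest positive quantity such that

\[(1-\delta_{s})\|\mathbf{x}\|_{2}^{2}\leq \|\mathsf{\Phi}\mathbf{x}\|_{2}^{2}\leq (1+\delta_{s})\|\mathbf{x}\|_{2}^{2}\]

holds for all **x**∈*Ω*_{s}, where \(\Omega _{s}=\{ \mathbf {x}\in \mathbb {R}^{n} |~\|\mathbf {x}\|_{0}\leq s\}\). A matrix Φ is said to satisfy the RIP of order *s* if *δ*_{s}∈(0,1).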

Basically, the RIP dictates that every set of columns of Φ with cardinality smaller than *s* approximately behaves like an orthonormal system, such that it approximately preserves the *ℓ*_{2}-distance between any pair of *s*-sparse vectors. It has also been shown that random matrices with Gaussian or sub-Gaussian entries meet the RIP with high probability provided that *m*=*O*(*s* log(*n*)) [5, 57].
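The near-isometry behavior described above can be probed numerically. The following is a minimal sketch, assuming a Gaussian sensing matrix with entries of variance 1/*m*; the function name `sparse_norm_ratios` and all parameter values are illustrative choices, not taken from the article. It only samples random sparse vectors, so it illustrates the concentration of ∥Φ**x**∥_{2}^{2}/∥**x**∥_{2}^{2} around 1 rather than certifying the RIP, which is a statement about all *s*-sparse vectors.

```python
import numpy as np

def sparse_norm_ratios(m=400, n=1000, s=10, trials=100, seed=0):
    """Return ||Phi x||_2^2 / ||x||_2^2 for random s-sparse vectors x."""
    rng = np.random.default_rng(seed)
    # Gaussian sensing matrix scaled so that E||Phi x||^2 = ||x||^2.
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    ratios = np.empty(trials)
    for t in range(trials):
        x = np.zeros(n)
        support = rng.choice(n, size=s, replace=False)
        x[support] = rng.standard_normal(s)
        ratios[t] = np.sum((Phi @ x) ** 2) / np.sum(x ** 2)
    return ratios

ratios = sparse_norm_ratios()
print("min ratio:", ratios.min(), "max ratio:", ratios.max())
```

For these (assumed) dimensions the ratios should stay close to 1, which is the empirical counterpart of a small isometry constant on the sampled vectors.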

Consider now noisy measurements of the form **y**=Φ**x**+**z**, where **z** is zero-mean additive white noise. It has been shown that under some characteristics of the noise, notably finite second order statistics or bounded noise in the *ℓ*_{2} sense, and if the measurement matrix Φ satisfies the RIP condition, then there exists a variety of algorithms that are able to stably recover the sparse signal from noisy measurements [10]. Among those, basis pursuit denoising (BPD) relaxes the requirement that the reconstructed signal exactly explain the measurements, yielding the convex problem

\[\min_{\mathbf{x}}\|\mathbf{x}\|_{1}\quad \text{subject to}\quad \|\mathbf{y}-\mathsf{\Phi}\mathbf{x}\|_{2}\leq \epsilon, \tag{1}\]

for some small *ε*>0. Candès shows in [56] that if ∥**y**−Φ**x**∥_{2}≤*ε* and \(\delta _{2s}<\sqrt {2}-1\), then the solution of (1), \(\hat {\mathbf {x}}\), is guaranteed to obey \(\|\mathbf {x}-\hat {\mathbf {x}}\|_{2}\leq C\epsilon \), where the constant *C* depends on *δ*_{2s}.

A closely related formulation is the *ℓ*_{1}-regularized least squares (*ℓ*_{1}-LS) problem, also known as the least absolute shrinkage and selection operator (LASSO) [58],

\[\min_{\mathbf{x}}\frac{1}{2}\|\mathbf{y}-\mathsf{\Phi}\mathbf{x}\|_{2}^{2}+\lambda\|\mathbf{x}\|_{1}, \tag{2}\]

where *λ* is a regularization parameter that balances the weight between the data fidelity term and the *ℓ*_{1} regularization term. The *ℓ*_{1}-LS problem is sometimes preferred over BPD because of the availability of efficient methods to solve (2) [59]. Other sparse reconstruction approaches including greedy algorithms, which iteratively construct sparse approximations, can be found in the literature. Orthogonal matching pursuit (OMP) [60, 61], regularized OMP [62], and iterative hard thresholding (IHT) [63] are examples of this class.
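As an illustration of how an *ℓ*_{1}-regularized least squares problem can be solved in practice, the sketch below uses iterative soft thresholding (ISTA), a simple proximal gradient scheme chosen here for brevity; it is not necessarily the method of [59]. The objective is assumed to be ½∥**y**−Φ**x**∥_{2}^{2}+*λ*∥**x**∥_{1}, and the problem sizes, noise level, and value of `lam` are arbitrary illustrative choices.

```python
import numpy as np

def soft_threshold(a, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def ista(Phi, y, lam, n_iter=2000):
    """Minimize 0.5 * ||y - Phi x||_2^2 + lam * ||x||_1 by proximal gradient."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)           # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Small synthetic test problem (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m, n, s = 80, 200, 5
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.choice([-1.0, 1.0], s)
y = Phi @ x_true + 0.01 * rng.standard_normal(m)   # light Gaussian noise

x_hat = ista(Phi, y, lam=0.02)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative error:", rel_err)
```

With light-tailed noise this kind of solver behaves well; the same code applied to measurements with a few gross outliers is a quick way to observe the sensitivity of the *ℓ*_{2} data term.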

The aforementioned methods use the *ℓ*_{2} norm as the *data-fitting* term, and they perform adequately under the assumption that the contaminating noise has finite second order statistics. However, just as in classical least squares and mean-square error estimation methods, *ℓ*_{2}-based sparse reconstruction methods tend to be very sensitive to outliers or gross errors present in the measurements. Thus, it is natural to draw on the rich theory of robust linear regression [17–19, 64, 65] as a plausible approach to address the CS reconstruction problem when the measurements are contaminated with heavy-tailed noise. Several key robust estimators are reviewed in the following section.

## 3 M-estimators overview

The presence of outliers in CS measurements leads to the study of robust estimators since the recovered sparse signal is highly affected by the presence of large errors in the data. Robust M-estimators bring substantial benefits in this scenario because, rather than relying on classical Gaussian ML estimation, they are based on modeling the contamination noise of the measurements as a heavy-tailed process.

M-estimators are a generalization of ML estimators and are described by a cost function-defined optimization problem where properties of the cost function determine the estimator robustness [17–19]. In robust estimation theory, two important concepts characterize the robustness of an estimator: the breakdown point and the influence function.

The breakdown point is used to characterize the quantitative robustness of an estimator. It indicates the maximal fraction of outliers (highly deviating samples) in the observations that an estimator can handle without breaking down. The influence function describes the bias impact of infinitesimal contamination at an arbitrary point on the estimator, standardized by the fraction of contamination. For M-estimators, the influence function is proportional to the first derivative of the cost function [18]. Desirable properties of the influence function are boundedness and continuity. Boundedness ensures that a small fraction of contamination or outliers can have only a limited effect on the estimate, whereas continuity means that small changes in the data lead to small changes in the estimate.

Several robust M-estimators have been studied in the literature, and the most commonly used methods in CS are reviewed in the following. For simplicity of the exposition, the reviewed M-estimators are presented in the location estimation setting, i.e., the one-dimensional case, though the cost functions, and their properties, can be extended to the multidimensional case [14].

### 3.1 Median estimator

Consider a set of observations {*y*_{1},*y*_{2},…,*y*_{n}}, where each observation follows the linear model *y*_{i}=*α*+*z*_{i}, and the elements *z*_{i} are independent samples obeying a zero-mean Laplacian distribution. This is the classical location parameter estimation problem, which seeks the best estimate of *α* from a set of observations {*y*_{1},*y*_{2},…,*y*_{n}}, where each observation has a scale parameter *σ*_{i}. The resulting ML estimate of *α* is given by

\[\hat{\alpha}=\arg\min_{\alpha}\sum_{i=1}^{n}\frac{1}{\sigma_{i}}|y_{i}-\alpha|. \tag{3}\]

The cost function in (3) is the *ℓ*_{1} norm and its influence function, IF(*x*)=sign(*x*), is bounded but discontinuous at the origin. The solution to (3) is the well-known *weighted median* (WM). The WM operator is defined by [66]

\[\hat{\alpha}=\text{MEDIAN}\left(w_{1}\diamond y_{1},w_{2}\diamond y_{2},\ldots,w_{n}\diamond y_{n}\right), \tag{4}\]

where *w*_{i}=1/*σ*_{i} denotes the weight associated with the *i*-th observation sample and the symbol ◇ represents an operator that replicates *w*_{i} times the value *y*_{i}; i.e. \(w_{i} \diamond y_{i} = \overbrace {y_{i},y_{i},\cdots,y_{i}}^{w_{i} \text {times}}\). Thus, the WM operator consists of replicating the *i*-th sample *w*_{i} times and sorting all the samples to then find the median value of the entire set. If the weights are real numbers instead of integers, the threshold decomposition framework can be applied to compute the weighted median [67].
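For positive real-valued weights, a weighted median can also be computed directly by sorting the samples and accumulating weights until half of the total weight is reached. The sketch below takes this route instead of the threshold decomposition framework of [67]; the function name `weighted_median` is an illustrative choice. When the minimizer of the weighted absolute-deviation cost is not unique, it returns the smallest sample that attains the minimum.

```python
def weighted_median(samples, weights):
    """Return a minimizer of sum_i w_i * |y_i - a| over a, for positive weights w_i."""
    if len(samples) == 0 or len(samples) != len(weights):
        raise ValueError("samples and weights must be non-empty and of equal length")
    pairs = sorted(zip(samples, weights))      # sort samples, carrying their weights
    half = 0.5 * sum(w for _, w in pairs)      # half of the total weight
    running = 0.0
    for y, w in pairs:
        running += w
        if running >= half:                    # first sample reaching half the weight
            return y
    return pairs[-1][0]                        # only reached through rounding effects

# Equal weights: an ordinary median of the samples (lower of the two middle values).
print(weighted_median([-1.0, 0.0, 1.0, 10.0], [1.0, 1.0, 1.0, 1.0]))
# A large weight (small scale parameter) on the last sample pulls the estimate to it.
print(weighted_median([-1.0, 0.0, 1.0, 10.0], [1.0, 1.0, 1.0, 5.0]))
```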

### 3.2 *ℓ*_{p} estimator

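Consider a set of observations {*y*_{1},*y*_{2},…,*y*_{n}}, where each observation follows the linear model *y*_{i}=*α*+*z*_{i} and the elements *z*_{i} are independent and follow the zero-centered generalized Gaussian distribution (GGD). The probability density function of the GGD is given by

\[f(z)=\frac{p}{2\sigma\Gamma(1/p)}\exp\left(-\frac{|z|^{p}}{\sigma^{p}}\right), \tag{5}\]

where *Γ*(·) is the gamma function, *σ* is a scale parameter and *p*>0, the so-called shape parameter, controls the tail decay rate. If each observation has a different scale parameter *σ*_{i}, the ML estimate of *α* is given by

\[\hat{\alpha}=\arg\min_{\alpha}\sum_{i=1}^{n}\frac{1}{\sigma_{i}^{p}}|y_{i}-\alpha|^{p}. \tag{6}\]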

There are two special cases of the GGD family that are well studied: the Gaussian (*p*=2) and Laplacian (*p*=1) distributions, which yield the well-known *weighted mean* and *weighted median* estimators, respectively. Conceptually, the lower the value of *p*, the more heavy tailed is the distribution leading to more impulsive samples. When *p*<2, the GGD exhibits heavier than Gaussian tails (super-Gaussian) and when 0<*p*<1, the model is very impulsive. The *p*≠{1,2} cases yield the fractional lower order moment (FLOM) estimation framework [68].

Recall that the *ℓ*_{p} norm of a vector \(\mathbf {u}\in \mathbb {R}^{m}\) is defined as \(\|\mathbf {u}\|_{p}=\left (\sum _{i=1}^{m}|u_{i}|^{p}\right)^{1/p}\). Note that the *ℓ*_{p} norms are convex and everywhere continuous functions when *p*>1. The special case *p*=1 is the *ℓ*_{1} norm that is convex but piece-wise continuous. When 0<*p*<1, the *ℓ*_{p} norms are nonconvex and piece-wise continuous. In the latter case, the *ℓ*_{p} norms are not really norms in the strict sense, but quasi-norms, since the sub-additivity property is not satisfied. The influence function for the *ℓ*_{p} norms, IF(*x*)=sign(*x*)*p*|*x*|^{p−1}, is bounded but discontinuous at the origin for 0<*p*≤1 and continuous everywhere but not bounded for *p*>1. Also note that the influence function is asymptotically redescending, i.e., IF(*x*)→0 as *x*→±*∞* when 0<*p*<1. Having a redescending influence function is a desirable property in a robust estimator since large outliers do not influence the output of the estimate. Thus, the *ℓ*_{p} norms are optimal under GGD noise and offer a powerful framework for impulsive noise applications when 0<*p*<2 [69–71].

### 3.3 Huber estimator

Consider now a set of observations {*y*_{1},*y*_{2},…,*y*_{n}}, where each observation follows the linear model *y*_{i}=*α*+*z*_{i} and the elements *z*_{i} are i.i.d. random variables from a continuous GGP symmetric distribution, with scale parameter *σ*>0. A robust estimator that combines the *ℓ*_{2} and *ℓ*_{1} norms as cost function is defined as [17]

\[\hat{\alpha}=\arg\min_{\alpha}\sum_{i=1}^{n}\rho\left(\frac{y_{i}-\alpha}{\sigma}\right), \tag{7}\]

where *ρ* is a convex and piece-wise continuous function, named the Huber’s cost function, and it is given by

\[\rho(e)=\begin{cases}\frac{1}{2}e^{2}, & |e|\leq c,\\ c|e|-\frac{1}{2}c^{2}, & |e|>c.\end{cases} \tag{8}\]

In Eq. (8), the parameter *c* is a tuning constant that influences the degree of robustness of the estimator [50]. The Huber cost function is one of the most popular cost functions in M-estimators since it combines the sensitivity properties of the *ℓ*_{2} norm and the robustness to outliers of the *ℓ*_{1} norm [17]. Robustness properties of the Huber estimator are dictated by the scale parameter *σ* and the tuning constant *c*. Since the Huber cost function is a combination of the *ℓ*_{2} and *ℓ*_{1} norms, its influence function is also a combination of the two related influence functions. Thus, its influence function is bounded and piece-wise continuous.
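The behavior of the Huber estimator can be illustrated with a short sketch. It assumes the standard Huber cost (quadratic for residuals of magnitude at most *c*, linear beyond), whose influence function is the residual clipped to [−*c*, *c*]. The location estimate is computed by iteratively reweighted averaging, a simple scheme chosen for illustration rather than one prescribed by the article; function names and default values are hypothetical.

```python
def huber_cost(e, c=0.75):
    """Huber cost: quadratic for |e| <= c, linear beyond."""
    a = abs(e)
    return 0.5 * e * e if a <= c else c * a - 0.5 * c * c

def huber_influence(e, c=0.75):
    """Derivative of the Huber cost: the residual clipped to [-c, c]."""
    return max(-c, min(c, e))

def huber_location(samples, sigma=1.0, c=0.75, n_iter=200):
    """Location estimate by iteratively reweighted averaging (illustrative scheme)."""
    a = sum(samples) / len(samples)            # start from the sample mean
    for _ in range(n_iter):
        weights = []
        for y in samples:
            e = abs(y - a) / sigma
            # Weight 1 in the quadratic zone, c / |e| in the linear zone.
            weights.append(1.0 if e <= c else c / e)
        a = sum(w * y for w, y in zip(weights, samples)) / sum(weights)
    return a

samples = [-1.0, 0.0, 1.0, 10.0]               # one gross outlier at 10
estimate = huber_location(samples)
print("mean:", sum(samples) / len(samples), "Huber estimate:", estimate)
```

For these samples the mean is 2.5, while the Huber estimate settles at 0.5: the outlier at 10 contributes only the bounded amount *c* to the estimating equation.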

### 3.4 Myriad estimator

Now consider a set of observations {*y*_{1},*y*_{2},…,*y*_{n}}, where each observation again follows the linear model *y*_{i}=*α*+*z*_{i}, as described in the previous sub-section. However, the elements *z*_{i} are now i.i.d. samples obeying the standard Cauchy distribution. The Cauchy probability density function is given by

\[f(z)=\frac{\sigma}{\pi\left(\sigma^{2}+z^{2}\right)}, \tag{9}\]

where *σ* is the scale parameter. The ML best estimate of *α* from a set of observations {*y*_{1},*y*_{2},⋯,*y*_{n}}, where each observation has a scale parameter *σ*, is given by

\[\hat{\alpha}=\arg\min_{\alpha}\sum_{i=1}^{n}\log\left[\sigma^{2}+(y_{i}-\alpha)^{2}\right]. \tag{10}\]

The solution to (10) is the *myriad* estimate. In this case, instead of using the sample mean or the sample median, the optimal solution minimizes the sum of logarithmic square deviations, referred to as the Least Lorentzian Squares (LLS) criterion [16]. The influence function of the myriad estimator is given by IF(*x*)=2*x*/(*σ*^{2}+*x*^{2}). Note that this influence function is everywhere continuous, bounded, and asymptotically redescending. The myriad estimate is denoted as

\[\hat{\alpha}=\text{MYRIAD}\left(\sigma;y_{1},y_{2},\ldots,y_{n}\right). \tag{11}\]

Note that the myriad estimate is also the ML estimator when *z*_{i} follow the Student’s *t* distribution with 1 degree of freedom. The sample myriad has different modes of operation that depend on the tuning of the scale parameter *σ*, the so-called linearity parameter [72]. When the noise is Gaussian, for example, values of *σ* larger than the sample range, i.e., *σ*≥*y*_{(1)}−*y*_{(0)}, where *y*_{(q)} denotes the sample *q*-th quantile, can provide the optimal performance associated with the sample mean. On the other hand, setting *σ* as half the interquartile range, i.e., *σ*=(*y*_{(0.75)}−*y*_{(0.25)})/2, considers implicitly half the samples unreliable, giving resilience to gross errors. For highly impulsive noise statistics, mode-type estimators can be achieved by using small values of *σ* [72]. Different approaches to automatically adapt *σ* under different noise scenarios [73] and to efficiently compute the myriad estimate [74, 75] have been proposed.
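The modes of operation of the sample myriad can be observed with a small numerical sketch. It assumes the myriad cost is the sum of log(*σ*^{2}+(*y*_{i}−*α*)^{2}) over the samples and minimizes it with a fixed-point reweighting iteration started from every sample, keeping the best local solution. This is an illustrative procedure, not necessarily one of the algorithms of [74, 75], and the function names are hypothetical.

```python
import math

def myriad_cost(a, samples, sigma):
    """Sum of log(sigma^2 + (y_i - a)^2) over the samples."""
    return sum(math.log(sigma ** 2 + (y - a) ** 2) for y in samples)

def myriad(samples, sigma, n_iter=200):
    """Approximate sample myriad: fixed-point search from each sample, best one kept."""
    best = None
    for a in samples:
        for _ in range(n_iter):
            # Weighted mean whose weights decay for samples far from the current estimate.
            w = [1.0 / (sigma ** 2 + (y - a) ** 2) for y in samples]
            a = sum(wi * yi for wi, yi in zip(w, samples)) / sum(w)
        if best is None or myriad_cost(a, samples, sigma) < myriad_cost(best, samples, sigma):
            best = a
    return best

samples = [-1.0, 0.0, 1.0, 10.0]
large_sigma = myriad(samples, sigma=1000.0)   # behaves like the sample mean (2.5)
small_sigma = myriad(samples, sigma=0.1)      # mode-type behavior, ignores the outlier
print(large_sigma, small_sigma)
```

With a very large *σ* the estimate is essentially the sample mean, whereas a small *σ* locks onto the cluster of samples around zero and disregards the sample at 10.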

The Lorentzian norm (or *LL*_{2} norm) is not a norm in the strict sense, since it does not meet the positive homogeneity and sub-additivity properties. However, it defines a robust metric that does not heavily penalize large deviations, with the robustness depending on the scale parameter *γ*, thus making it an appropriate metric for impulsive environments (optimal in ML sense under the Cauchy model) [16, 72, 76, 77]. Further justification for the use of the Lorentzian norm is the existence of logarithmic moments for algebraic-tailed distributions, as second moments are infinite or not defined for such distributions and therefore not an appropriate measure of the process strength [13, 16].
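For reference, the Lorentzian norm of a vector \(\mathbf{u}\in\mathbb{R}^{m}\) is usually written in the following form. This expression is stated here as an assumption based on common usage in the robust CS literature, since the definition does not appear in the text above; the subscript notation is likewise a conventional choice.

```latex
\[
\|\mathbf{u}\|_{LL_{2},\gamma}
  = \sum_{i=1}^{m} \log\!\left(1 + \frac{u_{i}^{2}}{\gamma^{2}}\right),
  \qquad \gamma > 0 .
\]
```

With *γ*=*σ*, this differs from the myriad cost, the sum of log(*σ*^{2}+*u*_{i}^{2}), only by the constant *m* log *σ*^{2}, so both have the same minimizers. For |*u*_{i}| much smaller than *γ*, each term is approximately *u*_{i}^{2}/*γ*^{2}, which is the quadratic behavior near the origin.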

### 3.5 M-generalized Cauchy estimator

Consider a set of observations {*y*_{1},*y*_{2},…,*y*_{n}}, where each observation follows the linear model *y*_{i}=*α*+*z*_{i}, and the elements *z*_{i} are i.i.d. samples obeying a generalized Cauchy distribution (GCD). The probability density function of the GCD is given by

\[f(z)=a\sigma\left(\sigma^{p}+|z|^{p}\right)^{-2/p}, \tag{13}\]

where *a*=*p* *Γ*(2/*p*)/(2(*Γ*(1/*p*))^{2}). In (13), the scale parameter is given by *σ*, and the tail decay of the distribution is given by *p*. For the particular case *p*=2, we have the Cauchy distribution. The ML estimate of the location parameter for GCD distributed samples is given by [77]:

\[\hat{\alpha}=\arg\min_{\alpha}\sum_{i=1}^{n}\log\left[\sigma^{p}+|y_{i}-\alpha|^{p}\right]. \tag{14}\]

The particular cases of *p*=1 and *p*=2 yield the *meridian* [78] and *myriad* [76] estimators, respectively.

### 3.6 A few comments on the cost functions

This comparison considers the *ℓ*_{1} norm, the Huber cost function, with *σ*=1 and *c*=0.75, and the Lorentzian norm with two different values of *γ* (*γ*=1 and *γ*=0.1) for the one-dimensional case. The squared *ℓ*_{2} norm is plotted as reference. Compared to the squared *ℓ*_{2} norm, the *ℓ*_{1}, Huber and Lorentzian functions do not over penalize large deviations, leading to more robust error metrics when outliers are present. Notably, the Lorentzian norm and the Huber cost function, for *c*<1, are more robust to outliers since they do not increase their value as fast as the *ℓ*_{1} norm when *u*→*∞*.

In the same manner as the myriad estimator, robustness properties of the Lorentzian norm are defined by the scale parameter *γ*. The Lorentzian norm is convex in the interval −*γ*≤*u*≤*γ* behaving as an *ℓ*_{2} cost function for small variations compared to *γ* and log-concave outside this interval. Thus, small values of *γ* make the Lorentzian more resilient to gross errors and large values of *γ* make the Lorentzian similar to the squared *ℓ*_{2} norm. Robustness properties of the Huber cost function also depend on the scale parameter *σ* and on the parameter *c*.

Although the Lorentzian norm is a nonconvex function, it is everywhere continuous and differentiable, which are desirable properties when used as a cost function in optimization problems. On the other hand, the *ℓ*_{1} and Huber functions are convex and continuous functions, thus enjoying strong theoretical guarantees when used in optimization problems. However, the *ℓ*_{1} norm is piece-wise continuous and not differentiable, which rules out traditional smooth optimization methods based on derivative information, whereas the Huber function is everywhere differentiable.

A second comparison shows the cost functions obtained with the Lorentzian norm for different values of *γ* and for the *ℓ*_{1} and Huber cost functions, in the location estimation problem. The observation samples are located in *x*={−1,0,1,10}. Note that for *γ*=0.1, the Lorentzian cost function exhibits four local minima, whereas for *γ*=1, the cost function is smoothed and only two local minima are present.
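The number of local minima quoted above can be checked numerically. The sketch below assumes the Lorentzian location cost is the sum of log(1+(*y*_{i}−*α*)^{2}/*γ*^{2}) over the samples and counts local minima on a fine grid; the grid limits, step size, and function names are illustrative choices.

```python
import math

def lorentzian_location_cost(a, samples, gamma):
    """Sum of log(1 + ((y_i - a) / gamma)^2) over the samples."""
    return sum(math.log(1.0 + ((y - a) / gamma) ** 2) for y in samples)

def count_local_minima(samples, gamma, lo=-3.0, hi=12.0, step=0.001):
    """Count grid points whose cost is lower than that of both neighbors."""
    n = int(round((hi - lo) / step)) + 1
    cost = [lorentzian_location_cost(lo + i * step, samples, gamma) for i in range(n)]
    return sum(
        1
        for i in range(1, n - 1)
        if cost[i] < cost[i - 1] and cost[i] <= cost[i + 1]
    )

samples = [-1.0, 0.0, 1.0, 10.0]
minima_small_gamma = count_local_minima(samples, gamma=0.1)
minima_large_gamma = count_local_minima(samples, gamma=1.0)
print(minima_small_gamma, minima_large_gamma)
```

The small value of *γ* keeps one basin per sample, while the larger value merges the three nearby samples into a single basin and leaves a separate, shallower one near the outlier.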

## 4 Review of robust sparse signal reconstruction methods

The goal is to recover an *s*-sparse signal \(\mathbf {x}_{0}\in \mathbb {R}^{n}\) from a reduced set of noisy linear projections \(\mathbf {y}\in \mathbb {R}^{m}\) given by

\[\mathbf{y}=\mathsf{\Phi}\mathbf{x}_{0}+\mathbf{z},\]

where \( \mathbf {z}\in \mathbb {R}^{m}\) is the noise vector with i.i.d. components following a common distribution *f*_{z}(*z*). If the noise contains outliers, or is of impulsive nature, then it is better characterized by a distribution with heavier-than-Gaussian tails. A common model for the noise is to assume that **z**=**r**_{0}+**w**, where **r**_{0} is modeled as a sparse error whose locations of nonzero entries are unknown and whose magnitudes can be arbitrarily large and **w** is a small *ℓ*_{2}-bounded noise (possibly Gaussian). Another common model is to assume that **z** follows a heavy-tailed distribution such as the Laplace distribution or the alpha-stable distribution. In order to mitigate the effect of the impulsive noise in the compressive measurements, a robust *data-fitting* term should be used.

In this section, we present a set of formulations and methods for robust sparse signal reconstruction when the signals are acquired in the presence of impulsive noise. The approaches described herein are based on replacing the *ℓ*_{2} norm by the previously described robust metrics for the data fidelity term.

### 4.1 *ℓ*_{1}-based methods

If the *ℓ*_{2} norm is replaced by the *ℓ*_{1} norm in the data-fitting term, the CS reconstruction problem reduces to solving a constrained LAD regression problem given by

\[\min_{\mathbf{x}}\|\mathbf{y}-\mathsf{\Phi}\mathbf{x}\|_{1}\quad \text{subject to}\quad \|\mathbf{x}\|_{0}\leq s. \tag{17}\]

Convex relaxations of the *ℓ*_{0} constraint have been proposed. For instance, Wang et al. proposed the following convex problem [20]

\[\min_{\mathbf{x}}\|\mathbf{y}-\mathsf{\Phi}\mathbf{x}\|_{1}+\tau\|\mathbf{x}\|_{1}. \tag{18}\]

This is an *ℓ*_{1} regularized problem (*ℓ*_{1}-LAD) whose theoretical properties for statistical regression are studied in [20]. The works in [23–28] study theoretical recovery conditions for the following equivalent convex problem

\[\min_{\mathbf{x},\mathbf{r}}\tau\|\mathbf{x}\|_{1}+\|\mathbf{r}\|_{1}\quad \text{subject to}\quad \mathbf{y}=\mathsf{\Phi}\mathbf{x}+\mathbf{r}, \tag{19}\]

where **r** is a slack variable that represents the corrupting vector, i.e., **z**=**r**_{0}, where **r**_{0} is a sparse vector with unknown nonzero locations and possibly large magnitudes. The parameter *τ* controls the balance between the two *ℓ*_{1} terms in (19). If a large value of *τ* is used, then the problem can recover a dense error for a sufficiently sparse signal. On the other hand, if a small value of *τ* is chosen, then only a small fraction of corrupted measurements can be corrected.

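Approaches that consider the noise model **z**=**r**_{0}+**w**, where **r**_{0} is assumed sparse and **w** is a small *ℓ*_{2}-bounded noise, are also studied [26–30]. These works study theoretical recovery conditions of the following convex program

\[\min_{\mathbf{x},\mathbf{r}}\tau\|\mathbf{x}\|_{1}+\|\mathbf{r}\|_{1}\quad \text{subject to}\quad \|\mathbf{y}-\mathsf{\Phi}\mathbf{x}-\mathbf{r}\|_{2}\leq \epsilon, \tag{20}\]

where *ε* is a bound on the *ℓ*_{2} norm of **w**.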

Recovery guarantees based on the RIP of the extended matrix [Φ I] were reported in [23]. These recovery guarantees are particularly useful when Φ is, for example, i.i.d. Gaussian. Nguyen and Tran reported results based on a structured model of the matrix Φ [28]. They assume that Φ is formed by selecting rows from an orthogonal matrix with a low incoherence parameter *μ*, which is the minimum value such that |*Φ*_{ij}|^{2}≤*μ*/*n* for any *i,j*. Under these assumptions, they showed that (20) can recover both **x**_{0} and **r**_{0} with high probability if *m*≥*C* *μ*^{2}∥**x**_{0}∥_{0}(log*n*)^{2} and ∥**r**_{0}∥_{0}≤*γ* *m*, *γ*∈(0,1), which are nearly optimal conditions for the number of measurements and the sparsity of the error vector, i.e., the number of gross errors that can be corrected. The following theorem, shown by Li [27], presents stable recovery guarantees under a probabilistic model.

### Theorem 1.

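Let \(\mathsf{\Phi}\in\mathbb{R}^{m\times n}\) be a sensing matrix whose entries are i.i.d. Gaussian random variables with zero mean and variance 1/*m* and set \(\tau = \sqrt {\log (n/m)+1}\). Then, if ∥**w**∥_{2}≤*ε*, ∥**x**_{0}∥_{0}≤*γ* *m*/(log(*n*/*m*)+1) and ∥**r**_{0}∥_{0}≤*γ* *m*, *γ*∈(0,1), the solution to the convex problem in (20), \((\hat {\mathbf {x}},\hat {\mathbf {r}})\), satisfies

\[\|\hat{\mathbf{x}}-\mathbf{x}_{0}\|_{2}+\|\hat{\mathbf{r}}-\mathbf{r}_{0}\|_{2}\leq K\epsilon \tag{21}\]

with probability at least 1−*C* exp(−*c* *m*), where *K*, *C*, and *c* are numerical constants.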

The results in Theorem 1 show that the signal can be stably recovered if the number of gross errors is up to a fixed fraction of the number of measurements. The bounds on the number of measurements are nearly optimal compared to the standard CS problem. Deterministic recovery conditions based on the coherence of the matrix Φ and the number of nonzero entries of **x** and **r** were reported in [26, 30]. These coherence-based results do not assume any particular model for the matrix Φ.

The regularized counterpart of (20) is

\[\min_{\mathbf{x},\mathbf{r}}\frac{1}{2}\|\mathbf{y}-\mathsf{\Phi}\mathbf{x}-\mathbf{r}\|_{2}^{2}+\tau_{x}\|\mathbf{x}\|_{1}+\tau_{r}\|\mathbf{r}\|_{1}, \tag{22}\]

where *τ*_{x} and *τ*_{r} are regularization parameters. Recovery guarantees based on an extended restricted eigenvalue condition of the matrix Φ and bounds for the regularization parameters *τ*_{x} and *τ*_{r} are studied in [29]. Note that the problems in (20) and (22) are convex and can be efficiently solved using standard optimization algorithms to solve the *ℓ*_{1}-LS and BPD problems (see [59, 79]) by using the extended model \(\tilde {\mathsf {\Phi }} = [\mathsf {\Phi }~ \mathsf {I} ]\) and \(\tilde {\mathbf {x}}=[\mathbf {x}^{T}, \mathbf {r}^{T}]^{T}\). In the following, we describe several approaches for solving (17) and (18).

***ℓ*_{1}-based coordinate descent algorithm** The problem in (17) is combinatorial and nonsmooth, thus a greedy strategy based on the coordinate descent algorithm and the weighted median estimator is proposed in [44]. In this scheme, each element of the sparse vector **x** is estimated at each step, while keeping the other elements of the vector fixed. The solution for the one-dimensional problem is then given by

\[\tilde{x}_{j}=\arg\min_{x_{j}}\|\mathbf{r}_{j}-\mathbf{\phi}_{j}x_{j}\|_{1}=\text{MEDIAN}\left(|\phi_{ij}|\diamond \frac{r_{j,i}}{\phi_{ij}}\,\Big|_{i=1}^{m}\right),\]

in which *r*_{j,i} and *ϕ*_{ij} denote the *i*-th entries of **r**_{j} and **ϕ**_{j}, respectively. The *ℓ*_{0}-regularization norm is included in the solution by computing the hard thresholding operator after computing the weighted median estimate. Thus, the solution is

where \(\mathbf {r}_{j}=\mathbf {y}-\sum _{k=1, k\neq j}^{n}\mathbf {\phi }_{k} x_{k}\) is the *j*-th residual term that remains after removing the contribution of all the components of the estimated vector except the *j*-th component, and **ϕ**_{k} denotes the *k*-th column vector of the measurement matrix. The coordinate-descent approach is computationally expensive because the estimation of the sparse vector requires cycling through all the components at each iteration of the algorithm.
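The coordinate descent scheme described above can be sketched as follows. Each coordinate update is a weighted median of the ratios between residual entries and column entries, weighted by the magnitudes of the column entries, followed by a hard-thresholding decision. The specific rule used here (keep the update only if it lowers the *ℓ*_{1} residual by more than a threshold `tau`) and the fixed number of sweeps are assumptions made for illustration and may differ from the algorithm in [44]; all function names are hypothetical.

```python
import numpy as np

def weighted_median(samples, weights):
    """Smallest sample at which the cumulative sorted weight reaches half the total."""
    order = np.argsort(samples)
    s, w = samples[order], weights[order]
    cum = np.cumsum(w)
    return s[np.searchsorted(cum, 0.5 * cum[-1])]

def wm_coordinate_descent(Phi, y, tau, n_sweeps=20):
    """Greedy l1 data-fit coordinate descent with an assumed hard-threshold rule."""
    m, n = Phi.shape
    x = np.zeros(n)
    residual = y.astype(float).copy()          # tracks y - Phi @ x
    for _ in range(n_sweeps):
        for j in range(n):
            col = Phi[:, j]
            nz = col != 0
            if not nz.any():
                continue                       # a zero column carries no information
            r_j = residual + col * x[j]        # residual without the j-th component
            cand = weighted_median(r_j[nz] / col[nz], np.abs(col[nz]))
            # Assumed rule: keep the update only if it pays for the sparsity cost tau.
            keep = np.abs(r_j).sum() > np.abs(r_j - col * cand).sum() + tau
            x[j] = cand if keep else 0.0
            residual = r_j - col * x[j]
    return x

# With an identity sensing matrix the scheme reduces to hard thresholding of y.
x_identity = wm_coordinate_descent(np.eye(4), np.array([3.0, 0.1, -2.0, 0.0]), tau=0.5)
print(x_identity)
```

The inner loop over all *n* coordinates in every sweep is what makes this approach costly for large problems, as noted in the text.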

***ℓ*_{1}-based alternating direction method** The problems posed in (18) or (19) are convex but nonsmooth. However, they can be solved using the alternating direction method of multipliers (ADMM) [79, 80]. ADMM solves the extended *ℓ*_{1} problem (19) by finding a saddle point of the augmented Lagrangian function

where **z** is the vector of Lagrange multipliers and *β*>0 is a penalty constant.

where **g**^{(k)}=Φ^{T}(**z**^{(k)}/*β*+Φ**x**^{(k)}+**r**^{(k+1)}−**y**) is the gradient of the differentiable part of the augmented Lagrangian function with respect to **x** and Shrink(·,*ρ*) denotes the shrinkage operator defined as Shrink(**a**,*ρ*)_{i}=sgn(*a*_{i}) max(|*a*_{i}|−*ρ*,0). The parameters *μ* and *ν* are step sizes. Convergence conditions for *μ* and *ν* and strategies to select *β* are detailed in [80].
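The shrinkage operator defined above is elementwise soft thresholding, which is also the proximal operator of *ρ* times the *ℓ*_{1} norm. A minimal NumPy sketch follows; the function name `shrink` is an illustrative choice.

```python
import numpy as np

def shrink(a, rho):
    """Soft thresholding: sgn(a_i) * max(|a_i| - rho, 0), applied elementwise."""
    a = np.asarray(a, dtype=float)
    return np.sign(a) * np.maximum(np.abs(a) - rho, 0.0)

# Entries with magnitude below rho are set to zero; the rest move toward zero by rho.
shrunk = shrink([3.0, -0.5, 1.0, -2.0], 1.0)
```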

### 4.2 *ℓ*_{p}-based methods

The *ℓ*_{p} norm, with 0<*p*<1, can be used as the data-fitting term, yielding the following recovery optimization problem:

The problem in (26) is optimal under the ML criterion for GGD noise and robust to very impulsive noise. Numerical methods have been proposed to efficiently solve (26) for the 0<*p*<2 case [47]. The algorithm is based on incorporating the proximity operator of the *ℓ*_{p} norm into the framework of ADMM. For the nonconvex case (0<*p*<1), a smoothing strategy has been employed to derive a convergent algorithm. Stability results similar to those in [41] are derived in [47] based on the RIP of Φ.

where the *ℓ*_{1} norm is replaced by the *ℓ*_{p} norm as sparsity promoting function and **r** is a slack variable that represents the corrupting sparse vector. The following theorem presents theoretical recovery conditions based on the RIP of the extended matrix [Φ I].

### Theorem 2.

*K*_{1}=∥**x**_{0}∥_{0} and *K*_{2}=∥**r**_{0}∥_{0}. Let *a*_{1}≤1 and *a*_{2}≤1 be constants such that *a*_{1}*K*_{1} and *a*_{2}*K*_{2} are integers and define *a*= min(*a*_{1},*a*_{2}). Let *c*≤1 and *b* be constants such that

**x**_{0} and **r**_{0}.

*ℓ*_{p} norm as data fidelity term are also studied. Zeng et al. proposed robust versions of MP and OMP, coined *ℓ*_{p}-MP and *ℓ*_{p}-OMP, respectively, based on the notion of *ℓ*_{p}-space correlation, with 0<*p*<2, which is robust to outliers [48]. The *ℓ*_{p}-correlation is defined as follows. Let \(\mathbf {a},\mathbf {b} \in \mathbb {R}^{m}\) with finite *ℓ*_{p} norm. Then the *ℓ*_{p}-correlation, with 0<*p*<2, is defined as

The function \(\| \mathbf {b} -\alpha \mathbf {a}\|_{p}^{p}\) is the *ℓ*_{p} norm of the fitting error of the univariate linear regression model **b**=*α***a**+**z**, where **z** denotes the error vector. If there exists an *α* such that *c*_{p}(**a**,**b**)=1, then **a** and **b** are collinear. On the other hand, if *c*_{p}(**a**,**b**)=0, then **a** and **b** are said to be orthogonal [48].

### 4.3 Huber loss-based methods

*ℓ*_{2} and *ℓ*_{1} norms. Then, a sparse signal can be estimated by solving the following constrained problem [50]

where *ρ* is the piece-wise continuous and convex function defined in Eq. (8), **ϕ**_{i} denotes the column vector obtained by transposing the *i*-th row of Φ, and *α*>0 is a scaling factor. Note that the problem in (29) is combinatorial and that both **x** and the scale parameter *σ* are simultaneously estimated. Ollila et al. [50] derived an iterative hard thresholding algorithm coined Huber iterative hard thresholding (HIHT) to solve the problem (29). A detailed analysis of the selection of the parameters *α* and *c* is presented in [50]. Also note that this framework can be extended to any robust cost function that meets some regularity conditions, e.g., Tukey's biweight function [49].
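As a hedged illustration of the data-fitting term, the sketch below assumes that Eq. (8) is the standard Huber function: quadratic for residuals of magnitude at most *c* and linear beyond. The function names and the default *c*=1.345 (a conventional choice in robust statistics) are assumptions for illustration, and the joint estimation of *σ* performed by HIHT is not implemented.

```python
import numpy as np

def huber(e, c=1.345):
    """Standard Huber function (assumed form of Eq. (8)), applied elementwise."""
    e = np.asarray(e, dtype=float)
    quadratic = 0.5 * e**2                      # used where |e| <= c
    linear = c * np.abs(e) - 0.5 * c**2         # used where |e| > c
    return np.where(np.abs(e) <= c, quadratic, linear)

def huber_cost(y, Phi, x, sigma, c=1.345):
    """Sum of Huber losses of the residuals scaled by a fixed sigma."""
    return float(np.sum(huber((y - Phi @ x) / sigma, c)))
```

Because the loss grows only linearly for large residuals, a single gross error contributes far less to the cost than it would under the *ℓ*_{2} norm.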

where *σ* is estimated beforehand and *λ* is a regularization parameter that controls the sparsity level of the solution. Efficient algorithms to solve (30), based on the fast iterative shrinkage-thresholding algorithm (FISTA) and ADMM, together with guidance on selecting the parameter *λ*, are presented in [51, 52].

### 4.4 Lorentzian-based methods

For a more general type of heavy-tailed noise, the reconstruction of sparse signals can be formulated using the Lorentzian norm as a fitting term. The formulations and algorithms described next are based on the Lorentzian norm as a robust error metric, which is appropriate for many impulsive environments.

**Lorentzian-based basis pursuit** Using the strong theoretical guarantees of *ℓ*_{1} minimization for sparse recovery in CS, Carrillo et al. studied the following nonconvex constrained optimization problem to estimate a sparse signal from the noisy measurements [41]

The following theorem presents an upper bound for the reconstruction error of the proposed estimator in (31).

### Theorem 3.

**x**_{0} such that |supp(**x**_{0})|≤*s*, and observation noise **z** with \(\|\mathbf {z}\|_{LL_{2},\gamma }\leq \rho \), the solution to (31), **x**^{∗}, obeys the following bound

where the constant *C*_{s} depends only on *δ*_{2s}.

Theorem 3 shows that the solution to (31) is a sparse signal with an *ℓ*_{2} error that is dependent on logarithmic moments. Note that the dependence on the noise logarithmic moment, rather than its second order moment, makes the formulation in (31) robust and stable to algebraic-tailed and impulsively corrupted samples. The optimization problem in (31) is referred to as Lorentzian BP (LBP). The scale parameter *γ* controls the robustness of the norm and *ρ* the radius of the *LL*_{2} ball, thus defining the feasible set. The scale parameter is estimated as *γ*=(*y*_{(0.875)}−*y*_{(0.125)})/2, where *y*_{(q)} denotes the *q*-th quantile of the corrupted measurement vector **y** [41]. The reader is referred to [41] for further details on strategies to estimate *γ* and *ρ* based on the Cauchy model.
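The quantile-based estimate of *γ* can be computed directly from the measurements. The sketch below uses NumPy's default (linearly interpolated) empirical quantiles, which is an assumption since the article does not state an interpolation rule; the function name is illustrative.

```python
import numpy as np

def lorentzian_scale(y):
    """gamma = (y_(0.875) - y_(0.125)) / 2, using empirical quantiles of y."""
    q_hi, q_lo = np.quantile(y, [0.875, 0.125])
    return 0.5 * (q_hi - q_lo)

y = np.linspace(0.0, 1.0, 9)       # toy measurement vector
gamma_clean = lorentzian_scale(y)

y_outlier = y.copy()
y_outlier[-1] = 1e6                # one gross error in the largest sample
gamma_outlier = lorentzian_scale(y_outlier)
```

In this toy example both estimates coincide, because a single extreme sample does not move the 0.125 and 0.875 quantiles.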

*ℓ*_{1} norm is used in [41] to numerically solve the problem in (31). However, a less expensive approach is to solve a sequence of unconstrained problems of the form

where *λ* is a regularization parameter that is decreased at every iteration following a homotopy approach. The solution of the previous problem is used as the starting point for the next problem. Since the Lorentzian norm is differentiable (though not Lipschitz differentiable), a nonconvex proximal-gradient algorithm [81] can be used to efficiently solve (33).

**Lorentzian-based iterative hard thresholding algorithm** Even though Lorentzian BP provides a robust CS framework in heavy-tailed environments, as explained above, numerical algorithms to solve the proposed optimization problem are not efficient [41]. Therefore, Carrillo and Barner proposed a Lorentzian-based iterative hard thresholding (IHT) algorithm [42]. In order to estimate **x**_{0} from **y**, the following optimization problem is proposed:

**x**_{0} based on the gradient projection algorithm [42]. The proposed strategy is formulated as follows. Let **x**^{(t)} denote the solution at iteration time *t* and set **x**^{(0)} to the zero vector. At each iteration *t*, the algorithm makes the update

where *H*_{s}(**a**) is the nonlinear operator that sets all but the largest (in magnitude) *s* elements of **a** to zero, *μ*_{t} is a step size, and W_{t} is an *m*×*m* diagonal matrix with each element defined as

where **ϕ**_{i} denotes the column vector obtained by transposing the *i*-th row of Φ.

The algorithm defined by the update in (35) is coined Lorentzian iterative hard thresholding (LIHT). Note that W_{t}(*i,i*)≤1, thus, the weights diminish the effect of gross errors by assigning a small weight (close to zero) for large deviations compared to *γ*, and a weight near one for deviations close to zero. In fact, if W_{t} is the identity matrix, the algorithm reduces to the *ℓ*_{2}-based IHT [63]. The algorithm is a fast and simple method that only requires the application of Φ and Φ^{T} at each iteration.
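A rough sketch of such an iteration is given below. It assumes a weighted gradient step followed by hard thresholding, with diagonal weights of the form *γ*^{2}/(*γ*^{2}+*e*_{i}^{2}), where *e*_{i} is the *i*-th residual. This form is an assumption chosen to match the properties stated above (weights at most one, near zero for large deviations, and plain IHT when all weights equal one); it is not copied from [42]. The function names and the fixed step size are also illustrative.

```python
import numpy as np

def hard_threshold(a, s):
    """H_s(a): keep the s largest-magnitude entries of a, zero the rest."""
    out = np.zeros_like(a)
    if s > 0:
        idx = np.argpartition(np.abs(a), -s)[-s:]
        out[idx] = a[idx]
    return out

def lorentzian_weights(residual, gamma):
    """Diagonal of W_t (assumed form): gamma^2 / (gamma^2 + residual_i^2)."""
    return gamma**2 / (gamma**2 + residual**2)

def liht(y, Phi, s, gamma, mu=1.0, n_iter=100):
    """Sketch of LIHT with a fixed step size mu (no line search)."""
    x = np.zeros(Phi.shape[1])                 # x^(0) is the zero vector
    for _ in range(n_iter):
        residual = y - Phi @ x
        w = lorentzian_weights(residual, gamma)
        # Weighted gradient step, then projection onto s-sparse vectors.
        x = hard_threshold(x + mu * (Phi.T @ (w * residual)), s)
    return x
```

With a fixed step, Φ should be scaled so that its spectral norm is at most one; an adaptive step size chosen by line search would replace the constant `mu` in a fuller implementation.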

Although the algorithm is not guaranteed to converge to a global minimum of (34), it can be shown that LIHT converges to a local minimum [42]. In the following, we show that LIHT has theoretical stability guarantees similar to those of the *ℓ*_{2}-based IHT. For simplicity of the analysis, we set *μ*_{t}=1 and assume that ∥Φ∥≤1, where ∥·∥ denotes the spectral norm of a matrix.

### Theorem 4.

*S*=supp(**x**_{0}), |*S*|≤*s*. Suppose \(\mathsf {\Phi } \in \mathbb {R}^{m\times n}\) meets the RIP of order 3*s* with \(\delta _{3s}<1/\sqrt {32}\). Assume **x**^{(0)}=0. Then, if \(\|\mathbf {z}\|_{LL_{2},\gamma }\leq \tau \), the reconstruction error of the LIHT algorithm at iteration *t* is bounded by

where \(\alpha =\sqrt {8}\delta _{3s}\) and \(\beta =\sqrt {1+\delta _{2s}}(1-\alpha ^{t})(1-\alpha)^{-1}\).

The results in Theorem 4 can be easily extended to compressible signals using Lemma 6.1 in [82]. The scale parameter *γ* is estimated from **y** in the same manner described previously for LBP. The step size *μ*_{t} is adapted at every iteration using a line search scheme with backtracking. See [42] for details.

**Lorentzian-based coordinate descent algorithm** In the context of CS random projections contaminated with Cauchy distributed noise, a suitable formulation for the reconstruction of sparse signals is

where *τ* is a regularization parameter that balances the influence of the Lorentzian norm as fitting-term and the sparsity-inducing term (*ℓ*_{0}-term) on the optimal solution. The coordinate-descent approach updates the estimate of each element of the sparse vector **x**, while keeping the others fixed. Without loss of generality, the solution for the one-dimensional version of (37) is given by the following theorem.

### Theorem 5.

Let *Q*(**z**_{j};*x*_{j}), with **z**_{j}=[*z*_{1,j},…,*z*_{m,j}], be the Lorentzian norm, for the one-dimensional case, defined as

where *κ* is a linearity parameter and \(W_{i,j}=\frac {\kappa ^{2}}{\eta ^{2}_{i,j}}\) are the weights having the parameter *η*_{i,j} given by \(\eta _{i,j}=\frac {\sum _{i=1,i \neq j}^{m} |y_{i}|}{\phi _{i,j}}\). The elements *z*_{i,j} correspond to the *i*-th observation sample weighted by the element (*i,j*) of the sampling matrix, *i.e.*, \(z_{i,j}=\frac {y_{i}}{\phi _{i,j}}\). The solution to the *ℓ*_{0}-regularized Lorentzian problem in (37) is given by

where \(\tilde {x}_{j}=\arg \min _{x_{j}} Q(\mathbf {z}_{j};x_{j})\) and *τ* is the regularization parameter that governs the sparsity of the solution.

Since this method requires the estimation of one coordinate at a time per iteration, the method is computationally expensive. A modified version, that accelerates the reconstruction of sparse signals by determining which coordinates are allowed to be estimated at each iteration, was proposed in [83].

## 5 Illustrative numerical examples

In this section, we present numerical experiments that illustrate the robustness and effectiveness of the reviewed methods for the recovery of a sparse signal from noisy compressive samples. In particular, we compare the performance of the following robust methods: *ℓ*_{1}-based coordinate descent (*ℓ*_{1}-CD) [44], the *ℓ*_{1}-LAD method solved by ADMM, Lorentzian-based basis pursuit (LBP) [41], Lorentzian-based iterative hard thresholding (LIHT) [42], Lorentzian-based coordinate descent (L-CD) [43], the robust lasso (R-Lasso) method [29], the Huber iterative hard thresholding (HIHT) method [50], and the *ℓ*_{1}-OMP method [48]. In order to evaluate the susceptibility to outliers of traditional CS methods, we also include the performance of the *ℓ*_{1}-LS method [58].

*n*=400. The nonzero coefficients have equal amplitude, equiprobable sign, and randomly chosen position. Gaussian sensing matrices are employed with *m*=100, and the measurements are then contaminated with *α*-stable noise (with *α*=1). Figure 3a shows the true signal and Fig. 3b, c show the clean and contaminated measurements, respectively.

*ℓ*_{2}-based *ℓ*_{1}-LS method. It is clear that the traditional *ℓ*_{1}-LS method for CS fails at estimating the sparse signal when gross errors are present in the compressed measurements. R-Lasso and *ℓ*_{1}-LAD are slightly more robust than *ℓ*_{1}-LS because the true support is correctly estimated although some components outside the true support also have strong amplitude. On the other hand, the coordinate descent approaches *ℓ*_{1}-CD and L-CD are greedy methods that correctly identify the true support with correct amplitudes. A few components appear at wrong coordinates but with small amplitude values. LIHT, LBP, HIHT, and *ℓ*_{1}-OMP methods can also correctly identify the components but the amplitudes are not completely correct. A summary of the reconstruction of the sparse one-dimensional signal for all methods is given in the third column of Table 1.

Table 1 Summary of sparse reconstruction methods

| Method | Optimization problem | SER for signal [dB] | SER for image [dB] | Time [s] |
|---|---|---|---|---|
| LBP | \(\Vert \mathbf {y}-\mathsf {\Phi } \mathbf {x}\Vert _{\text {LL}_{2},\gamma }+\lambda \Vert \mathbf {x}\Vert _{1}\) | 24.0 | 20.7 | 10.58 |
| LIHT | \(\Vert \mathbf {y}-\mathsf {\Phi } \mathbf {x}\Vert _{\text {LL}_{2},\gamma }~~\text {s.t.}~~\Vert \mathbf {x}\Vert _{0}\leq s \) | 24.0 | 19.4 | 2.13 |
| R-Lasso | \(\frac {1}{2}\Vert \mathbf {y}-\mathsf {\Phi } \mathbf {x}-\mathbf {r}\Vert _{2}^{2} + \tau _{x}\Vert \mathbf {x}\Vert _{1} + \tau _{r}\Vert \mathbf {r}\Vert _{1}\) | 8.9 | 18.1 | 7.23 |
| L-CD | \(\Vert \mathbf {y}- \mathsf {\Phi } \mathbf {x} \Vert _{\text {LL}_{2},\gamma } + \tau \Vert \mathbf {x}\Vert _{0} \) | 25.1 | 19.2 | 6522.7 |
| *ℓ*_{1}-CD | ∥ | 28.2 | 20.3 | 3814.2 |
| *ℓ*_{1}-LS | \(\frac {1}{2}\Vert \mathbf {y}-\mathsf {\Phi } \mathbf {x}\Vert _{2}^{2}+\lambda \Vert \mathbf {x}\Vert _{1}\) | –6.5 | 7.3 | 4.73 |
| *ℓ*_{1}-LAD | ∥ | 16.9 | 19.6 | 7.05 |
| HIHT | \(\sum _{i=1}^{M}\rho \big (\frac {y_{i}-\mathbf {\phi }_{i}^{T}\mathbf {x}}{\sigma }\big)~~\text {s.t.}~~\Vert \mathbf {x}\Vert _{0}\leq s\) | 25.1 | 19.4 | 90.78 |
| *ℓ*_{1}-OMP | ∥ | 24.1 | – | – |

*s*=10, *m*=100, *n*=400. However, now the random projections are contaminated with *α*-stable noise, with the tail parameter, *α*, varying from 0.2 to 2, i.e., from very impulsive to the Gaussian case. The scale parameter of the noise is set to *σ*=0.01 for all cases. The results are depicted in Fig. 4. All results are averaged over 100 realizations of the sensing matrix, noise, and the sparse signals.
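One realization of this setup could be generated along the following lines. The sketch draws symmetric *α*-stable noise with the Chambers-Mallows-Stuck formula, which reduces to Cauchy noise for *α*=1 and to Gaussian noise for *α*=2. The unit amplitude of the nonzero coefficients, the 1/√*m* scaling of the Gaussian matrix, the random seed, and all function names are assumptions made for illustration; they are not specified in the text.

```python
import numpy as np

def sparse_signal(n, s, rng, amplitude=1.0):
    """s-sparse vector: equal amplitude, equiprobable sign, random support."""
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = amplitude * rng.choice([-1.0, 1.0], size=s)
    return x

def sym_alpha_stable(alpha, scale, size, rng):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck formula."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    sample = (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
              * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return scale * sample

rng = np.random.default_rng(0)
n, m, s, sigma = 400, 100, 10, 0.01
x0 = sparse_signal(n, s, rng)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # scaling is an assumption
y_clean = Phi @ x0
y = y_clean + sym_alpha_stable(alpha=1.0, scale=sigma, size=m, rng=rng)
```

Sweeping `alpha` over the range 0.2 to 2 and averaging a reconstruction metric over repeated draws would mimic the structure of the experiment, although the exact protocol of the article is not reproduced here.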

It can again be noticed that the *ℓ*_{1}-LS and *ℓ*_{1}-OMP methods fail at reconstructing the signals in very impulsive noise (for *α*<1). As the tail parameter *α* increases, these methods improve the average SER, giving the best result for the Gaussian case. The *ℓ*_{1}-OMP method yields faithful reconstructions for *α*>1.2. The robust lasso (R-Lasso) is able to reconstruct sparse signals when the noise tail parameter is larger than 0.8. For very impulsive noise (*α*<0.8), the reconstruction SER is highly degraded. All other robust methods are able to reconstruct the sparse signals, even in noise environments with tail parameters of *α*>0.4. Figure 4 shows that the robust methods work well not only in impulsive environments but also when the noise is Gaussian.

The last experiment shows the performance of the reviewed methods in the reconstruction of the cameraman image of size 256×256 from a set of contaminated measurements. We take *m*=32,768 measurements, i.e., 50% undersampling, acquired using a random DCT ensemble, and we use the Daubechies db4 wavelet as sparsity representation basis. The measurements are contaminated with *α*-stable noise, having tail parameter *α*=1 and scale parameter *σ*=0.01. The *ℓ*_{1}-OMP method is not included in this experiment due to its high computational cost when dealing with compressible high dimensional signals.

*ℓ*_{1}-LS, *ℓ*_{1}-LAD via ADMM, R-Lasso, LBP, LIHT, *ℓ*_{1}-CD, L-CD, and HIHT. Note that the *ℓ*_{1}-LS generates several artifacts, and the image is not correctly reconstructed. The *ℓ*_{1}-LAD via ADMM, *ℓ*_{1}-CD, and LBP methods generate images with better quality than R-Lasso, LIHT, L-CD, and HIHT such that even small details are preserved. A summary of the performance of the methods for this experiment is given in terms of SER (in dB) and execution times (in s), in columns 4 and 5 of Table 1, respectively.

Note that the convex methods, namely, *ℓ*_{1}-LAD and R-Lasso, are fast and offer good computational efficiency, owing to the many recent efforts on solving large-scale convex problems [84]. Also, these methods enjoy the rich theoretical guarantees for convex problems. The rest of the methods are based either on nonconvex cost functions or nonconvex constraint sets, thus only convergence to a local minimum can be guaranteed. Also note that all methods, except the coordinate descent methods, *ℓ*_{1}-CD and L-CD, and the *ℓ*_{1}-OMP method, do not need to explicitly form the sensing matrix Φ but only need functions that implement the matrix-vector multiplication by Φ and Φ^{T} at each iteration. Thus, if fast implementations are available for such functions, the computational complexity of the algorithms can be largely reduced. On the other hand, the coordinate descent methods are not computationally efficient because only one coordinate is estimated at each iteration and an explicit representation of the matrix Φ is needed. However, these methods offer scalability when the sensing matrix is very large and can only be accessed one row per iteration. Also, fast methods have been proposed where only those coordinates with larger influence in the residuals are estimated at each iteration [83]. The *ℓ*_{1}-OMP method is not computationally efficient for high dimensional signals because it needs an explicit representation of the matrix Φ in order to perform the *ℓ*_{1} correlation with every column of the sensing matrix at each iteration of the algorithm. Recall that computing an *ℓ*_{1} correlation between two vectors involves solving a scalar regression problem.

Regarding implementation issues, most methods have free parameters that must be tuned to yield good performance. The greedy methods, such as LIHT, HIHT, and *ℓ*_{1}-OMP, require knowing the correct sparsity level a priori. The other methods do not require prior assumptions on the degree of sparsity. The Lorentzian-based methods, namely LBP and LIHT, are sensitive to finding a good initial estimate of the scale parameter, whereas the HIHT method estimates both the signal and the scale parameter of the cost function. The *ℓ*_{1}-CD method depends strongly on the number of iterations and a rate decay parameter that has to be fixed beforehand. The HIHT method relies on a good tuning of the constant *c* to achieve good performance.

## 6 Conclusions

We presented a review of robust sparse reconstruction strategies in CS when the compressive measurements are corrupted by outliers or impulsive noise. The reviewed methods are based on employing M-estimators as data fitting terms and include greedy and optimization-based approaches to solve the inverse problems. The robust methods are shown to outperform existing CS techniques (that traditionally use *ℓ*_{2} norms for data fitting) when the measurements have gross errors, while having similar performance in light-tailed environments.

## Declarations

### Competing interests

The authors declare that they have no competing interests.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

- DL Donoho, Compressed sensing. IEEE Trans. Inf. Theory.
**52**(4), 1289–1306 (2006).MathSciNetMATHView ArticleGoogle Scholar - EJ Candès, T Tao, Near-optimal signal recovery from random projections: universal encoding strategies?IEEE Trans. Inf. Theory.
**52**(12), 5406–5425 (2006).MathSciNetMATHView ArticleGoogle Scholar - EJ Candès, in
*Proceedings, Int. Congress of Mathematics*. Compressive sampling (European Mathematical SocietyMadrid, 2006), pp. 1433–1452.Google Scholar - EJ Candès, MB Wakin, An introduction to compressive sampling. IEEE Signal Proc. Mag.
**25**(2), 21–30 (2008).View ArticleGoogle Scholar - M Fornasier, H Rauhut, in
*Handbook of Mathematical Methods in Imaging*, ed. by O Scherzer. Compressed sensing (SpringerNew York, 2011).Google Scholar - CG Graff, EY Sidky, Compressive sensing in medical imaging. Appl. Opt.
**54**(8), 23–44 (2015).View ArticleGoogle Scholar - RE Carrillo, JD McEwen, Y Wiaux, PURIFY: a new approach to radio-interferometric imaging. Mon. Not. R. Astron. Soc.
**439**(4), 3591–3604 (2014).View ArticleGoogle Scholar - LC Potter, E Ertin, JT Parker, M Cetin, Sparsity and compressed sensing in radar imaging. Proc. IEEE.
**98**(6), 1006–1020 (2010).View ArticleGoogle Scholar - GR Arce, DJ Brady, L Carin, H Arguello, DS Kittle, Compressive coded aperture spectral imaging: an introduction. IEEE Signal Proc. Mag.
**31**(1), 105–115 (2014).View ArticleGoogle Scholar - YC Eldar, G Kutyniok,
*Compressed sensing: theory and applications*(Cambridge University Press, Cambridge, 2012).View ArticleGoogle Scholar - SA Kassam, HV Poor, Robust techniques for signal processing: a survey. Proc. IEEE.
**73**(3), 433–481 (1985).MATHView ArticleGoogle Scholar - A Swami, B Sadler, On some detection and estimation problems in heavy-tailed noise. Signal Process.
**82**(12), 1829–1846 (2002).MATHView ArticleGoogle Scholar - JG Gonzales, J-L Paredes, GR Arce, Zero-order statistics: a mathematical framework for the processing and characterization of very impulsive signals. IEEE Trans. Signal Process.
**54**(10), 3839–3851 (2006).View ArticleGoogle Scholar - A Zoubir, V Koivunen, Y Chakhchoukh, M Muma, Robust estimation in signal processing. IEEE Signal Proc. Mag.
**29**(4), 61–80 (2012).View ArticleGoogle Scholar - KE Barner, GR Arce (eds.), Nonlinear signal and image processing: theory, methods, and applications (CRC Press, Boca Raton, 2004).Google Scholar
- GR Arce,
*Nonlinear signal processing: a statistical approach*(Wiley, New York, 2005).MATHGoogle Scholar - PJ Huber,
*Robust statistics*(Wiley, New York, 1981).MATHView ArticleGoogle Scholar - F Hampel, E Ronchetti, P Rousseeuw, W Stahel,
*Robust statistics: the approach based on influence functions*(Wiley, New York, 1986).MATHGoogle Scholar - RA Maronna, RD Martin, VJ Yohai, WA Stahel,
*Robust statistics: theory and methods*(Wiley, New York, 2006).MATHView ArticleGoogle Scholar - H Wang, G Li, G Jiang, Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J. Bus. Econ. Stat.
**25**(3), 347–355 (2007).MathSciNetView ArticleGoogle Scholar - B Popilka, S Setzer, G Steidl, Signal recovery from incomplete measurements in the presence of outliers. Inverse Probl. Imaging.
**1**(4), 661–672 (2007).MathSciNetMATHView ArticleGoogle Scholar - J Wright, AY Yang, A Ganesh, S Sastryand, Y Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell.
**31**(2), 210–227 (2009).View ArticleGoogle Scholar - J Laska, M Davenport, RG Baraniuk, in
*Proc. IEEE Asilomar Conference on Signals, Systems and Computers*. Exact signal recovery from sparsely corrupted measurements through the pursuit of justice (IEEEPacific Grove, 2009).Google Scholar - J Wright, Y Ma, Dense error correction via
*ℓ*_{1}-minimization. IEEE Trans. on Inf. Theory.**56**(7), 3540–3560 (2010).MathSciNetView ArticleGoogle Scholar - Z Li, F Wu, J Wright, in
*Proc. Data Compression Conference*. On the systematic measurement matrix for compressed sensing in the presence of gross errors (IEEESnowbird, 2010).Google Scholar - C Studer, P Kuppinger, G Pope, H Bolcskei, Recovery of sparsely corrupted signals. IEEE Trans. Inf. Theory.
**58**(5), 3115–3130 (2012).MathSciNetView ArticleGoogle Scholar - X Li, Compressed sensing and matrix completion with constant proportion of corruptions. Constr. Approx.
**37**(1), 73–99 (2013).MathSciNetMATHView ArticleGoogle Scholar - NH Nguyen, TD Tran, Exact recoverability from dense corrupted observations via
*ℓ*_{1}-minimization. IEEE Trans. Inf. Theory.**59**(4), 3540–3560 (2013).MathSciNetGoogle Scholar - NH Nguyen, TD Tran, Robust lasso with missing and grossly corrupted observations. IEEE Trans. Inf. Theory.
**59**(4), 2036–2058 (2013).MathSciNetView ArticleGoogle Scholar - C Studer, R Baraniuk, Stable restoration and separation of approximately sparse signals. Appl. Comput. Harmon. Anal.
**37**(1), 12–35 (2014).MathSciNetMATHView ArticleGoogle Scholar - R Foygel, L Mackey, Corrupted sensing: novel guarantees for separating structured signals. IEEE Trans. Inf. Theory.
**60**(2), 1223–1247 (2014).MathSciNetView ArticleGoogle Scholar - M McCoy, J Tropp, Sharp recovery bounds for convex demixing, with applications. Found. Comput. Math.
**14**(3), 503–567 (2014).MathSciNetMATHView ArticleGoogle Scholar - EJ Candès, X Li, Y Ma, J Wright, Robust principal component analysis?J. ACM.
**58**(3), 11–11137 (2011).MathSciNetMATHView ArticleGoogle Scholar - Y Jin, BD Rao, in
*Proc. IEEE Int. Conf. Acoust. Speech Signal Process.*Algorithms for robust linear regression by exploiting the connection to sparse signal recovery (IEEEDallas, TX, 2010), pp. 3830–3833.Google Scholar - G Mateos, G Giannakis, Robust nonparametric regression via sparsity control with application to load curve data cleansing. IEEE Trans. Signal Process.
**60**(4), 1571–1584 (2012).MathSciNetView ArticleGoogle Scholar - K Mitra, A Veeraraghavan, R Chellappa, Analysis of sparse regularization based robust regression approaches. IEEE Trans. Signal Process.
**61**(5), 1249–1257 (2013).MathSciNetView ArticleGoogle Scholar - G Papageorgiou, P Bouboulis, S Theodoridis, K Themelis, in
*Proc. Eur. Signal Processing Conf. (EUSIPCO)*. Robust linear regression analysis: the greedy way (IEEELisbon, 2014).Google Scholar - G Papageorgiou, P Bouboulis, S Theodoridis, Robust linear regression analysis—a greedy approach. IEEE Trans. Signal Process.
**63**(15), 3872–3887 (2015).MathSciNetView ArticleGoogle Scholar - EJ Candès, T Tao, Decoding by linear programming. IEEE Trans. Inf. Theory.
**51**(12), 4203–4215 (2005).MathSciNetMATHView ArticleGoogle Scholar - EJ Candès, P Randall, Highly robust error correction by convex programming. IEEE Trans. Inf. Theory.
**54**(7), 2829–2840 (2008).MathSciNetMATHView ArticleGoogle Scholar - RE Carrillo, KE Barner, TC Aysal, Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise. IEEE J. Sel. Topic Signal Process.
**4**(2), 392–408 (2010).View ArticleGoogle Scholar - RE Carrillo, KE Barner, Lorentzian iterative hard thresholding: robust compressed sensing with prior information. IEEE Trans. Signal Process.
**61**(19), 4822–4833 (2013).MathSciNetView ArticleGoogle Scholar - A Ramirez, GR Arce, D Otero, J-L Paredes, B Sadler, Reconstruction of sparse signals from l1 dimensionality-reduced Cauchy random projections. IEEE Trans. Signal Process.
**60**(11), 5725–5737 (2012).MathSciNetView ArticleGoogle Scholar - J-L Paredes, GR Arce, Compressive sensing signal reconstruction by weighted median regression estimates. IEEE Trans. Signal Process.
**59**(3), 2585–2601 (2011).MathSciNetView ArticleGoogle Scholar - H Zhang, Y Chi, Y Liang, in
*Proc. International Conference on Machine Learning*. Provable non-convex phase retrieval with outliers: median truncated Wirtinger flow (International Machine Learning SocietyNew York, 2016). http://jmlr.org/proceedings/papers/v48/. - M Filipovic, in
*Proc. IEEE Int. Conf. Acoust., Speech, Signal Process*. Reconstruction of sparse signals from highly corrupted measurements by nonconvex minimization (IEEEFlorence, 2014).Google Scholar - F Wen, Y Liu, RC Qui, W Yu, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Robust sparse recovery for compressive sensing in impulsive noise using ℓ p -norm model fitting (Shangai, 2016).Google Scholar
- W-J Zeng, HC So, X Jiang, Outlier-robust greedy pursuit algorithms in
*ℓ*_{ p }-space for sparse approximation. IEEE Trans. Signal Process.**64**(1), 60–75 (2016).MathSciNetView ArticleGoogle Scholar - SA Razavi, E Ollila, V Koivunen, in
*Proc. Eur. Signal Processing Conf. (EUSIPCO)*. Robust greedy algorithms for compressed sensing (IEEEBucharest, 2012).Google Scholar - E Ollila, HJ Kim, V Koivunen, in
*Proc. Int. Symp. Comm., Control and Signal Processing*. Robust iterative hard thresholding for compressed sensing (IEEEAthens, 2014).Google Scholar - DS Pham, S Venkatesh, Improved image recovery from compressed data contaminated with impulsive noise. IEEE Trans. Image Process.
**21**(1), 397–405 (2012).MathSciNetView ArticleGoogle Scholar - DS Pham, S Venkatesh, Efficient algorithms for robust recovery of images from compressed data. IEEE Trans. Image Process.
**22**(12), 4724–4737 (2013).MathSciNetView ArticleGoogle Scholar - RE Carrillo, TC Aysal, KE Barner, in
*Proc. IEEE Int. Conf. Acoust., Speech, Signal Process*. Bayesian compressed sensing using generalized Cauchy priors (IEEEDallas, 2010).Google Scholar - J Shang, Z Wang, Q Huang, A robust algorithm for joint sparse recovery in presence of impulsive noise. IEEE Signal Proc. Lett.
**22**(8), 1166–1170 (2015).View ArticleGoogle Scholar - G Mateos, G Giannakis, Robust PCA as bilinear decomposition with outlier-sparsity regularization. IEEE Trans. on Signal Process.
**60**(10), 5176–5190 (2012).MathSciNetView ArticleGoogle Scholar - EJ Candès, The restricted isometry property and its implications for compressed sensing. Compte Rendus de l’Academie des Sciences, Paris, Series I.
**346:**, 589–592 (2008).MathSciNetMATHGoogle Scholar - R Baraniuk, M Davenport, R DeVore, M Walkin, A simple proof of the restricted isometry property for random matrices. Constr. Approx.
**28**(3), 253–263 (2008).MathSciNetMATHView ArticleGoogle Scholar - R Tibshirani, Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol.
**58**(1), 267–288 (1996).MathSciNetMATHGoogle Scholar - A Beck, M Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci.
**2**(1), 183–202 (2009).MathSciNetMATHView ArticleGoogle Scholar - JA Tropp, AC Gilbert, Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory.
**53**(12), 4655–4666 (2007).MathSciNetMATHView ArticleGoogle Scholar - T Tony, L Wang, Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory.
**57**(7), 4680–4688 (2011).MathSciNetView ArticleGoogle Scholar - D Needell, R Vershynin, Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Found. Computat. Math.
**09**(3), 317–334 (2009).MathSciNetMATHView ArticleGoogle Scholar - T Blumensath, ME Davies, Iterative hard thresholding for compressed sensing. Appl. Comp. Harm. Analys.
**27**(3), 265–274 (2009).MathSciNetMATHView ArticleGoogle Scholar - W Li, JJ Swetits, The linear l1-estimator and the Huber M-estimator. SIAM J. Optimiz.
**8**(2), 457–475 (1998).MathSciNetMATHView ArticleGoogle Scholar - TE Dielman, Least absolute value regression: recent contributions. J. Statist. Computat. Simul.
**75**(4), 263–286 (2005).MathSciNetMATHView ArticleGoogle Scholar - DRK Brownrigg, The weighted median filter. Commun. ACM.
**27**(8), 807–818 (1984).View ArticleGoogle Scholar - GR Arce, A general weighted median filter structure admitting negative weights. IEEE Trans. Signal Process.
**46**(12), 3195–3205 (1999).View ArticleGoogle Scholar - M Shao, CL Nikias, Signal processing with fractional lower order moments: stable processes and their applications. Proc. IEEE.
**81**(7), 986–1010 (1993).View ArticleGoogle Scholar - MA Arcones, Lp-estimators as estimates of a parameter of location for a sharp-pointed symmetric density. Scand. J. Stat.
**25**(4), 693–715 (1998).
- W-J Zeng, H-C So, L Huang,
*ℓ*_{ p }-MUSIC: Robust direction-of-arrival estimator for impulsive noise environments. IEEE Trans. Signal Process. **61**(17), 4296–4308 (2013).
- W-J Zeng, H-C So, AM Zoubir, An
*ℓ*_{ p }-norm minimization approach to time delay estimation in impulsive noise. Digital Signal Process. **23**(4), 1247–1254 (2013).
- JG Gonzalez, GR Arce, Statistically-efficient filtering in impulsive environments: weighted myriad filters. EURASIP J. Appl. Sign. Proc.
**1**, 4–20 (2002).
- S Kalluri, GR Arce, Adaptive weighted myriad filter algorithms for robust signal processing in
*α*-stable noise environments. IEEE Trans. Signal Process. **46**(2), 322–334 (1998).
- S Kalluri, GR Arce, Fast algorithms for weighted myriad computation by fixed-point search. IEEE Trans. Signal Process.
**48**(1), 159–171 (2000).
- RC Nunez, JG Gonzalez, GR Arce, JP Nolan, Fast and accurate computation of the myriad filter via branch-and-bound search. IEEE Trans. Signal Process.
**56**(7), 3340–3346 (2008).
- JG Gonzalez, GR Arce, Optimality of the myriad filter in practical impulsive-noise environments. IEEE Trans. Signal Process.
**49**(2), 438–441 (2001).
- RE Carrillo, TC Aysal, KE Barner, A generalized Cauchy distribution framework for problems requiring robust behavior. EURASIP J. Adv. Signal Process.
**2010**(Article ID 312989), 19 (2010).
- TC Aysal, KE Barner, Meridian filtering for robust signal processing. IEEE Trans. Signal Process.
**55**(8), 3949–3962 (2007).
- J Yang, Y Zhang, Alternating direction algorithms for l1-problems in compressive sensing. SIAM J. Sci. Comput.
**33**(1), 250–278 (2011).
- Y Xiao, H Zhu, S-Y Wu, Primal and dual alternating direction algorithms for
*ℓ*_{1}-*ℓ*_{1} norm minimization problems in compressive sensing. Comput. Optim. Appl. **54**(2), 441–459 (2013).
- P Gong, C Zhang, Z Lu, J Huang, J Ye, in
*Proc. International Conference on Machine Learning*. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems (International Machine Learning Society, Atlanta, 2013).
- D Needell, JA Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal.
**26**(3), 301–321 (2008).
- AB Ramirez, GR Arce, BM Sadler, in
*Proc. 18th European Signal Process. Conf*. Fast algorithms for reconstruction of sparse signals from Cauchy random projections (IEEE, Aalborg, 2010), pp. 432–436.
- V Cevher, S Becker, M Schmidt, Convex optimization for big data: scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Proc. Mag.
**31**(5), 32–43 (2014).