 Research Article
 Open Access
A Generalized Cauchy Distribution Framework for Problems Requiring Robust Behavior
 Rafael E. Carrillo^{1},
 Tuncer C. Aysal^{1} and
 Kenneth E. Barner^{1}
https://doi.org/10.1155/2010/312989
© Rafael E. Carrillo et al. 2010
 Received: 8 February 2010
 Accepted: 7 August 2010
 Published: 10 August 2010
Abstract
Statistical modeling is at the heart of many engineering problems. The importance of statistical modeling emanates not only from the desire to accurately characterize stochastic events, but also from the fact that distributions are the central models utilized to derive sample processing theories and methods. The generalized Cauchy distribution (GCD) family has a closed-form pdf expression across the whole family as well as algebraic tails, which makes it suitable for modeling many real-life impulsive processes. This paper develops a GCD theory-based approach that allows challenging problems to be formulated in a robust fashion. Notably, the proposed framework subsumes generalized Gaussian distribution (GGD) family-based developments, thereby guaranteeing performance improvements over traditional GGD-based problem formulation techniques. This robust framework can be adapted to a variety of applications in signal processing. As examples, we formulate four practical applications under this framework: (1) filtering for power line communications, (2) estimation in sensor networks with noisy channels, (3) reconstruction methods for compressed sensing, and (4) fuzzy clustering.
Keywords
 Fusion Center
 Influence Function
 Stable Distribution
 Sparse Signal
 Cauchy Distribution
1. Introduction
Traditional signal processing and communications methods are dominated by three simplifying assumptions: the systems under consideration are linear, and the signal and noise processes are stationary and Gaussian distributed. Although these assumptions are valid in some applications and have significantly reduced the complexity of the techniques developed, over the last three decades practitioners in various branches of statistics, signal processing, and communications have become increasingly aware of the limitations these assumptions pose in addressing many real-world applications. In particular, it has been observed that the Gaussian distribution is too light-tailed to model signals and noise that exhibit impulsive and nonsymmetric characteristics [1]. A broad spectrum of applications exists in which such processes emerge, including wireless communications, teletraffic, hydrology, geology, atmospheric noise compensation, economics, and image and video processing (see [2, 3] and references therein). The need to describe impulsive data, coupled with computational advances that enable processing of models more complicated than the Gaussian distribution, has thus led to the recent dynamic interest in heavy-tailed models.
Robust statistics, the stability theory of statistical procedures, systematically investigates the effects of deviations from modeling assumptions [4]. Maximum likelihood (ML) type estimators (or more generally, M-estimators) developed in the theory of robust statistics are of great importance in robust signal processing techniques [5]. M-estimators can be described by a cost function-defined optimization problem or by its first derivative, the latter yielding an implicit equation (or set of equations) that is proportional to the influence function. In the location estimation case, properties of the influence function describe the estimator robustness [4]. Notably, ML location estimation forms a special case of M-estimation, with the observations taken to be independent and identically distributed and the cost function set proportional to the logarithm of the common density function.
To address as wide an array of problems as possible, modeling and processing theories tend to be based on density families that exhibit a broad range of characteristics. Signal processing methods derived from the generalized Gaussian distribution (GGD), for instance, are popular in the literature and include works addressing heavy-tailed processes [2, 3, 6–8]. The GGD is a family of closed-form densities, with varying tail parameter, that effectively characterizes many signal environments. Moreover, the closed-form nature of the GGD yields a rich set of distribution-optimal error norms ($L_1$, $L_2$, and $L_p$) and estimation and filtering theories, for example, linear filtering, weighted median filtering, and fractional low order moment (FLOM) operators [3, 6, 9–11]. However, a limitation of the GGD model is the tail decay rate: GGD tails decay exponentially rather than algebraically. Such light tails do not accurately model the prevalence of outliers and impulsive samples common in many of today's most challenging statistical signal processing and communications problems [3, 12, 13].
As an alternative to the GGD, the $\alpha$-stable density family has gained recent popularity in addressing heavy-tailed problems. Indeed, symmetric $\alpha$-stable processes exhibit algebraic tails and, in some cases, can be justified from first principles (Generalized Central Limit Theorem) [14–16]. The index of stability parameter, $\alpha \in (0, 2]$, provides flexibility in impulsiveness modeling, with distributions ranging from light-tailed Gaussian ($\alpha = 2$) to extremely impulsive ($\alpha \to 0$). With the exception of the limiting Gaussian case, $\alpha$-stable distributions are heavy-tailed with infinite variance and algebraic tails. Unfortunately, the Cauchy distribution ($\alpha = 1$) is the only algebraic-tailed $\alpha$-stable distribution that possesses a closed-form expression, limiting the flexibility and performance of methods derived from this family of distributions. That is, the single-distribution Cauchy methods (Lorentzian norm, weighted myriad) are the most commonly employed $\alpha$-stable family operators [12, 17–19].
The Cauchy distribution, while intersecting the $\alpha$-stable family at a single point, is generalized by the introduction of a varying tail parameter, thereby forming the Generalized Cauchy density (GCD) family. The GCD has a closed-form pdf across the whole family, as well as algebraic tails that make it suitable for modeling real-life impulsive processes [20, 21]. Thus the GCD combines the advantages of the GGD and $\alpha$-stable distributions in that it possesses heavy, algebraic tails (like $\alpha$-stable distributions) and closed-form expressions (like the GGD) across a flexible family of densities defined by a tail parameter, $p$. Previous GCD family development focused on the particular $p = 2$ (Cauchy distribution) and $p = 1$ (meridian distribution) cases, which lead to the myriad and meridian [13, 22] estimators, respectively. (It should be noted that the original authors derived the myriad filter starting from $\alpha$-stable distributions, noting that there are only two closed-form expressions for $\alpha$-stable distributions [12, 17, 18].) These estimators provide a robust framework for heavy-tail signal processing problems.
In yet another approach, the generalized-$t$ model is shown to provide excellent fits to different types of atmospheric noise [23]. Indeed, Hall introduced the family of generalized-$t$ distributions in 1966 as an empirical model for atmospheric radio noise [24]. The distribution possesses algebraic tails and a closed-form pdf. Like the $\alpha$-stable family, the generalized-$t$ model contains the Gaussian and the Cauchy distributions as special cases, depending on the degrees of freedom parameter. It is shown in [18] that the myriad estimator is also optimal for the generalized-$t$ family of distributions. Thus we focus on the GCD family of operators, as their performance also subsumes that of generalized-$t$ approaches.
In this paper, we develop a GCD-based theoretical approach that allows challenging problems to be formulated in a robust fashion. Within this framework, we establish a statistical relationship between the GGD and GCD families. The proposed framework subsumes GGD-based developments (e.g., least squares, least absolute deviation, FLOM, $L_p$ norms, $k$-means clustering, etc.), thereby guaranteeing performance improvements over traditional problem formulation techniques. The developed theoretical framework includes robust estimation and filtering methods, as well as robust error metrics. A wide array of applications can be addressed through the proposed framework, including, among others, robust regression, robust detection and estimation, clustering in impulsive environments, spectrum sensing when signals are corrupted by heavy-tailed noise, and robust compressed sensing (CS) and reconstruction methods. As illustrative and evaluation examples, we formulate four particular applications under this framework: filtering for power line communications, estimation in sensor networks with noisy channels, reconstruction methods for compressed sensing, and fuzzy clustering.
The organization of the paper is as follows. In Section 2, we present a brief review of M-estimation theory and the generalized Gaussian and generalized Cauchy density families. A statistical relationship between the GGD and GCD is established, and the ML location estimate from GCD statistics is derived. An M-type estimator, coined the MGC estimator, is derived in Section 3 from the cost function emerging in GCD-based ML estimation. Properties of the proposed estimator are analyzed, and a weighted filter structure is developed. Numerical algorithms for multiparameter estimation are also presented. A family of robust metrics derived from the GCD is detailed in Section 4, and their properties are analyzed. Four illustrative applications of the proposed framework are presented in Section 5. Finally, we conclude in Section 6 with closing thoughts and future directions.
2. Distributions, Optimal Filtering, and Estimation
This section presents M-estimates, a generalization of maximum likelihood (ML) estimates, and discusses optimal filtering from an ML perspective. Specifically, it discusses statistical models of observed samples obeying generalized Gaussian statistics and relates the filtering problem to maximum likelihood estimation. Then, we present the generalized Cauchy distribution, and a relation between GGD and GCD random variables is introduced. The ML estimators for GCD statistics are also derived.
2.1. M-Estimation
Consider $N$ observations $x_1, \ldots, x_N$ that depend on an unknown parameter $\theta$. An estimate $\hat\theta$ that solves the minimization problem $\hat\theta = \arg\min_\theta \sum_{i=1}^{N} \rho(x_i; \theta)$, or the implicit equation $\sum_{i=1}^{N} \psi(x_i; \hat\theta) = 0$, is called an M-estimate (or maximum likelihood type estimate). Here $\rho(x; \theta)$ is an arbitrary cost function to be designed, and $\psi(x; \theta) = \partial \rho(x; \theta) / \partial \theta$. Note that ML estimators are a special case of M-estimators with $\rho(x; \theta) = -\log f(x; \theta)$, where $f(\cdot)$ is the probability density function of the observations. In general, M-estimators do not necessarily relate to probability density functions.
For M-estimates it can be shown that the influence function is proportional to $\psi$ [4, 25], meaning that we can derive the robustness properties of an M-estimator, namely, efficiency and bias in the presence of outliers, if $\psi$ is known.
2.2. Generalized Gaussian Distribution
The GGD pdf is given by $f(x) = \frac{k\alpha}{2\Gamma(1/k)} \exp(-(\alpha|x - \theta|)^k)$, where $\Gamma(\cdot)$ is the gamma function, $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$, $\theta$ is the location parameter, and $\alpha$ is a constant related to the standard deviation $\sigma$, defined as $\alpha = \sigma^{-1}\sqrt{\Gamma(3/k)/\Gamma(1/k)}$. In this form, $\alpha$ is an inverse scale parameter, and $k > 0$, sometimes called the shape parameter, controls the tail decay rate. The GGD model contains the Laplacian and Gaussian distributions as special cases, that is, for $k = 1$ and $k = 2$, respectively. Conceptually, the lower the value of $k$, the more impulsive the distribution. The ML location estimate for GGD statistics is reviewed in the following. Detailed derivations of these results are given in [3].
There are two special cases of the GGD family that are well studied: the Gaussian ($k = 2$) and the Laplacian ($k = 1$) distributions, which yield the well-known weighted mean and weighted median estimators, respectively. When all samples are identically distributed for the special cases, the mean and median estimators are the resulting operators. These estimators are formally defined in the following.
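Definition 1.
Consider a set of $N$ independent observations $x_1, \ldots, x_N$, each obeying a Gaussian distribution with common location $\theta$ and variance $\sigma_i^2$. The ML estimate of location is the weighted mean $\hat\theta = \arg\min_\theta \sum_{i=1}^{N} h_i (x_i - \theta)^2 = \sum_{i=1}^{N} h_i x_i / \sum_{i=1}^{N} h_i \triangleq \mathrm{mean}(h_i \cdot x_i)$, where $h_i = 1/\sigma_i^2 > 0$ and $\cdot$ denotes the (multiplicative) weighting operation.
Definition 2.
Consider a set of $N$ independent observations $x_1, \ldots, x_N$, each obeying a Laplacian distribution with common location $\theta$ and scale $\sigma_i$. The ML estimate of location is the weighted median $\hat\theta = \arg\min_\theta \sum_{i=1}^{N} h_i |x_i - \theta|$, where $h_i = 1/\sigma_i > 0$.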
Through arguments similar to those above, the $k \neq 1, 2$ cases yield the fractional lower order moment (FLOM) estimation framework [9]. For $k < 1$, the resulting estimators are selection type. A drawback of FLOM estimators for $1 < k < 2$ is that their computation is, in general, nontrivial, although suboptimal (for $1 < k < 2$) selection-type FLOM estimators have been introduced to reduce computational costs [6].
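The two well-studied GGD cases can be illustrated with a short sketch. It assumes the standard forms of the weighted mean (minimizer of the weighted squared error) and the weighted median (minimizer of the weighted absolute error); the function names and the toy data are illustrative and do not come from the paper.

```python
def weighted_mean(x, h):
    """Minimizer of sum_i h_i * (x_i - theta)**2 (Gaussian ML location)."""
    return sum(hi * xi for hi, xi in zip(h, x)) / sum(h)


def weighted_median(x, h):
    """A minimizer of sum_i h_i * |x_i - theta| (Laplacian ML location).

    Sorts the samples and returns the first one at which the cumulative
    weight reaches half of the total weight.
    """
    pairs = sorted(zip(x, h))
    half = 0.5 * sum(h)
    acc = 0.0
    for xi, hi in pairs:
        acc += hi
        if acc >= half:
            return xi
    return pairs[-1][0]


samples = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
print(weighted_mean(samples, weights))    # pulled toward the outlier
print(weighted_median(samples, weights))  # stays inside the bulk of the data
```

The median output is always one of the input samples, which is the selection-type behavior mentioned above, whereas the mean is dragged far from the bulk by a single outlier.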
2.3. Generalized Cauchy Distribution
The GCD pdf is given by $f_{GC}(z) = a\sigma(\sigma^p + |z - \theta|^p)^{-2/p}$ with $a = p\Gamma(2/p) / (2(\Gamma(1/p))^2)$. In this representation, $\theta$ is the location parameter, $\sigma$ is the scale parameter, and $p$ is the tail constant. The GCD family contains the Meridian [13] and Cauchy distributions as special cases, that is, for $p = 1$ and $p = 2$, respectively. For $p < 2$, the tail of the pdf decays slower than in the Cauchy distribution case, resulting in a heavier-tailed distribution.
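As a quick numerical check of the special cases, the sketch below evaluates a GCD density under the assumption that it has the form $a\sigma(\sigma^p + |z - \theta|^p)^{-2/p}$ with $a = p\Gamma(2/p)/(2\Gamma(1/p)^2)$, and compares it with the usual Cauchy and meridian densities. All function names are illustrative.

```python
import math


def gcd_pdf(z, theta=0.0, sigma=1.0, p=2.0):
    """Assumed GCD density: a * sigma * (sigma**p + |z - theta|**p)**(-2/p)."""
    a = p * math.gamma(2.0 / p) / (2.0 * math.gamma(1.0 / p) ** 2)
    return a * sigma * (sigma ** p + abs(z - theta) ** p) ** (-2.0 / p)


def cauchy_pdf(z, theta=0.0, sigma=1.0):
    """Standard Cauchy density with location theta and scale sigma."""
    return sigma / (math.pi * (sigma ** 2 + (z - theta) ** 2))


def meridian_pdf(z, theta=0.0, sigma=1.0):
    """Meridian density with location theta and scale sigma."""
    return sigma / (2.0 * (sigma + abs(z - theta)) ** 2)


for z in (-3.0, 0.0, 0.5, 10.0, 100.0):
    # Columns: z, GCD(p=2), Cauchy, GCD(p=1), meridian
    print(z, gcd_pdf(z, p=2.0), cauchy_pdf(z), gcd_pdf(z, p=1.0), meridian_pdf(z))
```

Far from the location parameter the $p = 1$ density exceeds the $p = 2$ density, which is the heavier-tail behavior described above.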
with .
Since the estimator defined in (12) is a special case of that defined in (13), we only provide a detailed derivation for the latter. The estimator defined in (13) can be used to extend the GCD-based estimator to a robust weighted filter structure. Furthermore, the derived filter can be extended to admit real-valued weights using the sign-coupling approach [8].
2.4. Statistical Relationship between the Generalized Cauchy and Gaussian Distributions
Before closing this section, we bring to light an interesting relationship between the Generalized Cauchy and Generalized Gaussian distributions. It is well known that a Cauchy distributed random variable (GCD, $p = 2$) is generated by the ratio of two independent Gaussian distributed random variables (GGD, $k = 2$). Recently, Aysal and Barner showed that this relationship also holds for the Laplacian and Meridian distributions [13], that is, the ratio of two independent Laplacian (GGD, $k = 1$) random variables yields a Meridian (GCD, $p = 1$) random variable. In the following, we extend this finding to the complete set of GGD and GCD families.
Lemma 1.
The random variable formed as the ratio of two independent zero-mean GGD distributed random variables $U$ and $V$, with tail constant $\beta$ and scale parameters $\alpha_U$ and $\alpha_V$, respectively, is a GCD random variable with tail parameter $\lambda = \beta$ and scale parameter $\nu = \alpha_U / \alpha_V$.
Proof.
See Appendix A.
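The ratio relationship can be checked empirically. The following Monte Carlo sketch is only an illustration, not the proof in Appendix A: it assumes a GGD parameterized by a scale $s$ with density proportional to $\exp(-|x/s|^k)$, draws samples through a gamma transformation, and compares the empirical distribution of $|U/V|$ with the distribution implied by a zero-location GCD whose tail parameter is $k$ and whose scale is the ratio of the two GGD scales. Function names, sample size, and seed are arbitrary choices.

```python
import math
import random


def sample_ggd(rng, k, scale):
    """Draw from a zero-mean GGD with density proportional to exp(-|x/scale|**k)."""
    g = rng.gammavariate(1.0 / k, 1.0)          # |x/scale|**k is Gamma(1/k, 1)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * scale * g ** (1.0 / k)


def gcd_abs_cdf(z, sigma, p, steps=20000):
    """P(|Z| <= z) for a zero-location GCD, by midpoint integration of the pdf."""
    a = p * math.gamma(2.0 / p) / (2.0 * math.gamma(1.0 / p) ** 2)
    h = z / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += a * sigma * (sigma ** p + t ** p) ** (-2.0 / p)
    return 2.0 * total * h


def ratio_check(k, scale_u, scale_v, z, n=20000, seed=0):
    """Return (empirical, theoretical) values of P(|U/V| <= z)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = sample_ggd(rng, k, scale_u)
        v = sample_ggd(rng, k, scale_v)
        if abs(u / v) <= z:
            hits += 1
    return hits / n, gcd_abs_cdf(z, scale_u / scale_v, k)


for k in (1.0, 1.5, 2.0):
    print(k, ratio_check(k, 2.0, 0.5, 4.0))
```

With scales 2.0 and 0.5 the implied GCD scale is 4.0, so the threshold $z = 4$ should capture about half of the probability mass for every $k$.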
3. Generalized Cauchy-Based Robust Estimation and Filtering
In this section we use the GCD ML location estimate cost function to define an M-type estimator. First, robustness and properties of the derived estimator are analyzed, and the filtering problem is then related to M-estimation. The proposed estimator is extended to a weighted filtering structure. Finally, practical algorithms for the multiparameter case are developed.
3.1. Generalized Cauchy-Based M-Estimation
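The cost function emerging from GCD-based ML location estimation is $\rho(x) = \log(\sigma^p + |x|^p)$. The flexibility of this cost function, provided by parameters $\sigma$ and $p$, and its robust characteristics make it well suited to define an M-type estimator, which we coin the MGC estimator. To define the form of this estimator, denote $\mathbf{x} = [x_1, \ldots, x_N]$ as a vector of observations and $\theta$ as the common location parameter of the observations.
Definition 3.
The MGC estimate is defined as $\hat\theta = \arg\min_\theta \sum_{i=1}^{N} \log(\sigma^p + |x_i - \theta|^p)$.
The special $p = 2$ and $p = 1$ cases yield the myriad [18] and meridian [13] estimators, respectively. The generalization of the MGC estimator, for $0 < p \le 2$, is analogous to the GGD-based FLOM estimators and thereby provides a rich and robust framework for signal processing applications.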
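A brute-force sketch of the estimator helps make the definition concrete. It assumes the objective $\sum_i \log(\sigma^p + |x_i - \theta|^p)$ and simply searches the input samples plus a uniform grid between the smallest and largest sample; this exhaustive search is a stand-in chosen for clarity, not the numerical routine used in the paper, and the names are illustrative.

```python
import math


def mgc_cost(theta, x, sigma, p):
    """Assumed MGC objective: sum_i log(sigma**p + |x_i - theta|**p)."""
    return sum(math.log(sigma ** p + abs(xi - theta) ** p) for xi in x)


def mgc_estimate(x, sigma, p, grid=2001):
    """Approximate the MGC location estimate by exhaustive search.

    Candidates are the samples themselves plus a uniform grid over
    [min(x), max(x)].
    """
    lo, hi = min(x), max(x)
    candidates = list(x)
    if hi > lo:
        step = (hi - lo) / (grid - 1)
        candidates += [lo + j * step for j in range(grid)]
    return min(candidates, key=lambda t: mgc_cost(t, x, sigma, p))


data = [0.9, 1.0, 1.1, 1.2, 50.0]           # bulk near 1 plus one outlier
print(mgc_estimate(data, sigma=1.0, p=2.0))  # myriad-like: stays near the bulk
print(mgc_estimate(data, sigma=1.0, p=1.0))  # meridian-like: one of the samples
print(mgc_estimate(data, sigma=1e4, p=2.0))  # very large sigma: close to the sample mean
```

The last call illustrates how a very large scale parameter makes the estimator behave like a least-squares (mean) estimator, while small scale values keep the output inside the bulk of the data.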
As the performance of an estimator depends on the defining objective function, the properties of the objective function at hand are analyzed in the following.
Proposition 1.
Let $Q(\theta)$ denote the objective function (for fixed $\sigma$ and $p$) and $x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$ the order statistics of $\mathbf{x}$. Then the following statements hold.
$Q(\theta)$ is strictly decreasing for $\theta < x_{(1)}$ and strictly increasing for $\theta > x_{(N)}$.
All local extrema of $Q(\theta)$ lie in the interval $[x_{(1)}, x_{(N)}]$.
If $0 < p \le 1$, the solution is one of the input samples (selection-type filter).
If $1 < p \le 2$, then the objective function has at most $2N - 1$ local extrema points and therefore a finite set of local minima.
Proof.
See Appendix B.
The following properties detail the MGC estimator behavior as $\sigma$ goes to either $0$ or $\infty$. Importantly, the results show that the MGC estimator subsumes other classical estimator families.
Property 1.
Proof.
See Appendix C.
Intuitively, this result is explained by the fact that $|x_i - \theta|^p / \sigma^p$ becomes negligible as $\sigma$ grows large compared to 1. This, combined with the fact that $\log(1 + x) \approx x$ when $x \ll 1$, which is an equality in the limit, yields the resulting cost function behavior. The importance of this result is that MGC estimators include M-estimators with $L_p$ norm ($0 < p \le 2$) cost functions. Thus MGC (GCD-based) estimators should be at least as powerful as GGD-based estimators (linear FIR, median, FLOM) in light-tailed applications, while the untapped algebraic tail potential of GCD methods should allow them to substantially outperform GGD-based estimators in heavy-tailed applications.
In contrast to the equivalence with $L_p$ norm approaches for large $\sigma$, MGC estimators become more resistant to impulsive noise as $\sigma$ decreases. In fact, as $\sigma \to 0$ the MGC yields a mode-type estimator with particularly strong impulse rejection.
Property 2.
where $\mathcal{M}$ is the set of most repeated values.
Proof.
See Appendix D.
This mode-type estimator treats every observation as a possible outlier, assigning greater influence to the most repeated values in the observation set. This property makes the MGC a suitable framework for applications such as image processing, where selection-type filters yield good results [7, 13, 18].
3.2. Robustness and Analysis of MGC Estimators
To further characterize M-estimates, it is useful to list the desirable features of a robust influence function [4, 25].
 (i) Robustness. An estimator is robust if the supremum of the absolute value of the influence function is finite.
 (ii) Rejection Point. The rejection point, defined as the distance from the center of the influence function to the point where the influence function becomes negligible, should be finite. The rejection point measures whether the estimator rejects outliers and, if so, at what distance.
The MGC estimate is robust and has a finite rejection point that depends on the scale parameter $\sigma$ and the tail parameter $p$. As $p$ decreases, the influence function has a higher decay rate, that is, the MGC estimator becomes more robust to outliers. Also of note is that $\lim_{x \to \pm\infty} \psi(x) = 0$, that is, the influence function is asymptotically redescending, and the effect of outliers monotonically decreases with an increase in magnitude [25].
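The redescending behavior can be seen by evaluating the score function directly. The sketch below assumes the cost $\log(\sigma^p + |x|^p)$, whose derivative is $\psi(x) = p|x|^{p-1}\,\mathrm{sgn}(x)/(\sigma^p + |x|^p)$; the function name is illustrative, and the value at the origin is set to zero by convention because the derivative is not defined there for $p \le 1$.

```python
import math


def mgc_influence(x, sigma, p):
    """psi(x) = d/dx log(sigma**p + |x|**p), the assumed MGC score function."""
    if x == 0.0:
        return 0.0  # convention at the origin, where psi is discontinuous for p <= 1
    return math.copysign(p * abs(x) ** (p - 1.0), x) / (sigma ** p + abs(x) ** p)


for err in (0.1, 1.0, 10.0, 100.0, 1000.0):
    # Columns: error magnitude, psi for p = 2, psi for p = 1 (sigma = 1 in both)
    print(err, mgc_influence(err, 1.0, 2.0), mgc_influence(err, 1.0, 1.0))
```

For $p = 2$ the influence peaks at $|x| = \sigma$ and then decays toward zero, so increasingly large errors have progressively less effect on the estimate.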
The MGC also possesses the following important properties.
Property 3 (outlier rejection).
Property 4 (no undershoot/overshoot).
The output of the MGC estimator is always bounded by $x_{(1)} \le \hat\theta \le x_{(N)}$, where $x_{(1)} = \min\{x_i\}_{i=1}^{N}$ and $x_{(N)} = \max\{x_i\}_{i=1}^{N}$.
According to Property 3, large errors are efficiently eliminated by an MGC estimator with finite $\sigma$. Note that this property can be applied recursively, indicating that MGC estimators eliminate multiple outliers. The proof of this statement follows the same steps used in the proof of the corresponding meridian estimator property [13] and is thus omitted. Property 4 states that the MGC estimator is BIBO stable, that is, the output is bounded for bounded inputs. Proof of Property 4 follows directly from statements 1 and 2 of Proposition 1 and is thus omitted.
Since MGC estimates are M-estimates, they have desirable asymptotic behavior, as noted in the following property and discussion.
Property 5 (asymptotic consistency).
Proof of Property 5 follows from the fact that the MGC estimator influence function is odd, bounded, and continuous (except at the origin, which is a set of measure zero); argument details parallel those in [4].
The expectation is taken with respect to $F$, the underlying distribution of the data. The last expression is the asymptotic variance of the estimator. Hence, the variance of $\hat\theta$ decreases as $N$ increases, meaning that MGC estimates are asymptotically efficient.
3.3. Weighted MGC Estimators
The filtering structure defined in (24) is an M-smoother estimator, which is in essence a lowpass-type filter. Utilizing the sign coupling technique [8], the MGC estimator can be extended to accept real-valued weights. This yields the general structure detailed in the following definition.
Definition 4.
where $\mathbf{h} = [h_1, \ldots, h_N]$ denotes a vector of real-valued weights.
The WMGC estimators inherit all the robustness and convergence properties of the unweighted MGC estimators. Thus, as in the unweighted case, WMGC estimators subsume GGD-based (weighted) estimators, indicating that WMGC estimators are at least as powerful as GGD-based estimators (linear FIR, weighted median, weighted FLOM) in light-tailed environments, while WMGC estimator characteristics enable them to substantially outperform GGD-based estimators in heavy-tailed impulsive environments.
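To illustrate how sign coupling admits real-valued weights, the sketch below assumes a weighted objective of the form $\sum_i \log(\sigma^p + |h_i|\,|\mathrm{sgn}(h_i)x_i - \theta|^p)$, that is, the sign of each weight is attached to its sample and the magnitude scales the error term. This form, the brute-force search, and all names are assumptions made for illustration.

```python
import math


def wmgc_cost(theta, x, h, sigma, p):
    """Assumed weighted objective with sign coupling."""
    total = 0.0
    for xi, hi in zip(x, h):
        signed = math.copysign(1.0, hi) * xi        # sgn(h_i) * x_i
        total += math.log(sigma ** p + abs(hi) * abs(signed - theta) ** p)
    return total


def wmgc_filter(x, h, sigma, p, grid=2001):
    """Brute-force minimizer over the sign-coupled samples plus a uniform grid."""
    signed = [math.copysign(1.0, hi) * xi for xi, hi in zip(x, h)]
    lo, hi_ = min(signed), max(signed)
    candidates = list(signed)
    if hi_ > lo:
        step = (hi_ - lo) / (grid - 1)
        candidates += [lo + j * step for j in range(grid)]
    return min(candidates, key=lambda t: wmgc_cost(t, x, h, sigma, p))


window = [0.9, 1.0, 1.1, 1.2, 50.0]
print(wmgc_filter(window, [1.0] * 5, 1.0, 2.0))    # reduces to the unweighted case
print(wmgc_filter(window, [-1.0] * 5, 1.0, 2.0))   # negative weights flip the sample signs
```

Negative weights let the structure realize filters that are not purely lowpass, while the logarithmic cost still limits the influence of the outlier.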
3.4. Multiparameter Estimation
The location estimation problem defined by the MGC filter depends on the parameters $\sigma$ and $p$. Thus, to solve the optimal filtering problem, we consider multiparameter M-estimates [26]. The applied approach utilizes a small set of signal samples to estimate $\sigma$ and $p$ and then uses these values in the filtering process (although a fully adaptive filter can also be implemented using this scheme).
where $\Psi(\cdot)$ is the digamma function. (The digamma function is defined as $\Psi(x) = \frac{d}{dx} \log \Gamma(x)$, where $\Gamma(x)$ is the Gamma function.) It can be noticed that (28) is the implicit equation for the MGC estimator with the cost function as defined in (18), implying that the location estimate has the same properties derived above.
Of note is that the log-likelihood has a unique maximum in $\sigma$ for fixed $\theta$ and $p$, and also a unique maximum in $p$ for fixed $\theta$ and $\sigma$ and $p \in (0, 2]$. In the following, we provide an algorithm to iteratively solve the above set of equations.
Multiparameter Estimation Algorithm
For a given set of data $\{x_i\}_{i=1}^{N}$, we propose to find the optimal joint parameter estimates by the iterative algorithm detailed in Algorithm 1, with the superscript denoting the iteration number.
Algorithm 1: Multiparameter estimation algorithm.
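Require: Data set $\{x_i\}_{i=1}^{N}$ and tolerances $\epsilon_1$, $\epsilon_2$, $\epsilon_3$.
Initialize $\hat\sigma^{(0)}$ and $\hat\theta^{(0)}$.
while $|\hat\theta^{(m)} - \hat\theta^{(m-1)}| > \epsilon_1$, $|\hat\sigma^{(m)} - \hat\sigma^{(m-1)}| > \epsilon_2$, and $|\hat p^{(m)} - \hat p^{(m-1)}| > \epsilon_3$ do
Estimate $\hat p^{(m)}$ as the solution of (30).
Estimate $\hat\theta^{(m)}$ as the solution of (28).
Estimate $\hat\sigma^{(m)}$ as the solution of (29).
end while
return $\hat\theta$, $\hat\sigma$, and $\hat p$.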
The algorithm is essentially an iterated conditional mode (ICM) algorithm [27]. Additionally, it resembles the expectation maximization (EM) algorithm [28] in the sense that, instead of optimizing all parameters at once, it finds the optimal value of one parameter given that the other two are fixed; it then iterates. While the algorithm converges to a local optimum, experimental results show that initializing $\hat\theta$ as the sample median and $\hat\sigma$ as the median absolute deviation (MAD), and then computing $\hat p$ as a solution to (30), accelerates the convergence and most often yields globally optimal results. In the classical literature, fixed-point algorithms are successfully used in the computation of M-estimates [3, 4]. Hence, in the following, we solve items 3–5 in Algorithm 1 using fixed-point search routines.
Fixed-Point Search Algorithms
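The alternating structure of the procedure can be sketched as a coordinate-wise search on the GCD log-likelihood. The sketch assumes the log-likelihood $N\log a(p) + N\log\sigma - (2/p)\sum_i \log(\sigma^p + |x_i - \theta|^p)$ with $a(p) = p\Gamma(2/p)/(2\Gamma(1/p)^2)$, and it replaces the implicit equations (28)–(30) with simple candidate searches (a grid for $p$, the input samples for $\theta$, and a multiplicative grid for $\sigma$). Those searches, the stopping rule, the function names, and the synthetic Cauchy data are all illustrative choices rather than the paper's routines.

```python
import math
import random
import statistics


def gcd_loglik(x, theta, sigma, p):
    """Assumed log-likelihood of i.i.d. GCD samples (location, scale, tail)."""
    log_a = math.log(p) + math.lgamma(2.0 / p) - math.log(2.0) - 2.0 * math.lgamma(1.0 / p)
    s = sum(math.log(sigma ** p + abs(xi - theta) ** p) for xi in x)
    return len(x) * (log_a + math.log(sigma)) - (2.0 / p) * s


def fit_gcd(x, tol=1e-6, max_iter=30):
    theta = statistics.median(x)                                  # location: sample median
    sigma = statistics.median([abs(xi - theta) for xi in x]) or 1.0  # scale: MAD, guarded against 0
    p_grid = [0.1 * j for j in range(1, 21)]                      # candidate tails in (0, 2]
    p = None
    history = []
    for _ in range(max_iter):
        old = (theta, sigma, p)
        # Each step keeps the current value among its candidates, so the
        # log-likelihood cannot decrease from one iteration to the next.
        p = max(p_grid, key=lambda q: gcd_loglik(x, theta, sigma, q))
        theta = max(list(x) + [theta], key=lambda t: gcd_loglik(x, t, sigma, p))
        scales = [sigma * 2.0 ** (j / 8.0) for j in range(-16, 17)]
        sigma = max(scales, key=lambda s: gcd_loglik(x, theta, s, p))
        history.append(gcd_loglik(x, theta, sigma, p))
        if old[2] is not None and all(abs(a - b) <= tol for a, b in zip(old, (theta, sigma, p))):
            break
    return theta, sigma, p, history


rng = random.Random(1)
data = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100)]  # standard Cauchy draws
theta_hat, sigma_hat, p_hat, history = fit_gcd(data)
print(theta_hat, sigma_hat, p_hat)
```

Because every update only accepts a candidate that does not lower the log-likelihood, the sequence of likelihood values is nondecreasing, which mirrors the conditional-mode character of the procedure; the result is still only a local optimum.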
In the fixed-point recursion for the location estimate, the subscript denotes the iteration number. The algorithm is taken as convergent when the difference between successive iterates falls below a small positive value. The median is used as the initial estimate, which typically results in convergence to a (local) minimum within a few iterations.
The scale estimate is obtained through a similar fixed-point recursion, which terminates when the difference between successive iterates falls below a small positive number. Since the objective function has only one minimum for fixed $\theta$ and $p$, the recursion converges to the global result.
Noting that the search space for $p$ is the interval $(0, 2]$, the function (27) can be evaluated for a finite set of points in this interval, keeping the value that maximizes (27) and setting it as the initial point for the search.
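A fixed-point recursion for the location parameter can be sketched as an iteratively reweighted mean. Setting the derivative of $\sum_i \log(\sigma^p + |x_i - \theta|^p)$ to zero gives $\sum_i w_i (x_i - \theta) = 0$ with $w_i = p|x_i - \theta|^{p-2}/(\sigma^p + |x_i - \theta|^p)$, which suggests the update $\theta \leftarrow \sum_i w_i x_i / \sum_i w_i$. The exact recursion used in the paper is not reproduced here; the weights, the guard against zero distances, and the names are assumptions.

```python
import statistics


def mgc_fixed_point(x, sigma, p, tol=1e-12, max_iter=500):
    """Reweighted-mean recursion for a stationary point of the assumed MGC objective."""
    theta = statistics.median(x)                    # median as the initial estimate
    for _ in range(max_iter):
        weights = []
        for xi in x:
            d = max(abs(xi - theta), 1e-12)         # guard: d**(p-2) blows up at d = 0 when p < 2
            weights.append(p * d ** (p - 2.0) / (sigma ** p + d ** p))
        new_theta = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
        if abs(new_theta - theta) < tol:            # successive iterates agree
            return new_theta
        theta = new_theta
    return theta


obs = [0.9, 1.0, 1.1, 100.0]
theta_fp = mgc_fixed_point(obs, sigma=1.0, p=2.0)
print(theta_fp)
```

Samples far from the current estimate receive very small weights, so the outlier at 100 barely moves the result away from the cluster near 1.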
Multiparameter estimation results for a GCD process of length $N$.
 $N$  10  100  1000 
 $\hat\theta$  0.0035  0.0009  0.0002 
 MSE  0.0302 
 $\hat\sigma$  0.9563  1.0224  1.0186 
 MSE  0.0016 
 $\hat p$  1.5816  1.8273  1.9569 
 MSE  0.0519  0.0109 
To conclude this section, we consider the computational complexity of the proposed multiparameter estimation algorithm. The algorithm in total has a higher computational complexity than the FLOM, median, meridian, and myriad operators, since Algorithm 1 requires initial estimates of the location and the scale parameters. However, it should be noted that the proposed method estimates all the parameters of the model, thus providing an advantage over the aforementioned methods, which require a priori parameter tuning. It is straightforward to show that the computational complexity of the proposed method is $O(N^2)$, assuming the practical case in which the number of fixed-point iterations is much smaller than $N$. The dominating term is the cost of selecting the input sample that minimizes the objective function, that is, the cost of evaluating the objective function $N$ times. However, if faster methods that avoid evaluation of the objective function for all samples (e.g., subsampling methods) are employed, the computational cost is lowered.
4. Robust Distance Metrics
This section presents a family of robust GCD-based error metrics. Specifically, the cost function of the MGC estimator defined in Section 3.1 is extended to define a quasinorm over $\mathbb{R}^m$ and a semimetric for the same space; the development is analogous to the $L_p$ norms emanating from the GGD family. We denote these semimetrics as the log-$L_p$ ($LL_p$) norms. (Note that for the $\sigma = 1$ and $p = 1$ case, this metric defines the log-$L$ space in Banach space theory.)
Definition 5.
Let $\mathbf{u} \in \mathbb{R}^m$. The $LL_p$ norm of $\mathbf{u}$ is defined as $\|\mathbf{u}\|_{LL_p,\sigma} = \sum_{i=1}^{m} \log(1 + |u_i|^p / \sigma^p)$, with $\sigma > 0$.
The $LL_p$ norm is not a norm in the strictest sense, since it does not meet the positive homogeneity and subadditivity properties. However, it satisfies the positive definiteness and scale invariance properties.
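Proposition 2.
Let $c \neq 0$ be a real constant and $\mathbf{u} \in \mathbb{R}^m$. The following statements hold:
 (i) $\|\mathbf{u}\|_{LL_p,\sigma} \ge 0$, with $\|\mathbf{u}\|_{LL_p,\sigma} = 0$ if and only if $\mathbf{u} = \mathbf{0}$;
 (ii) $\|c\mathbf{u}\|_{LL_p,\sigma} = \|\mathbf{u}\|_{LL_p,\delta}$, where $\delta = \sigma / |c|$;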
 (iii)
;
 (iv)
Proof.
The $LL_p$ norm defines a robust metric that does not heavily penalize large deviations, with the robustness depending on the scale parameter $\sigma$ and the exponent $p$. The following lemma constructs a relationship between the $LL_p$ norms and the $L_p$ norms.
Lemma 2.
Proof.
Noting that and for all gives the desired result.
The $LL_2$ norm ($p = 2$), also known as the Lorentzian norm, has the following desirable properties:
 (i) It is an everywhere continuous function.
 (ii) It is convex near the origin ($0 \le |u| \le \sigma$), behaving similarly to an $L_2$ cost function for small variations.
 (iii) Large deviations are not heavily penalized as in the $L_1$ or $L_2$ norm cases, leading to a more robust error metric when the deviations contain gross errors.
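The behavior of these metrics is easy to probe numerically. The sketch below assumes the form $\|\mathbf{u}\|_{LL_p,\sigma} = \sum_i \log(1 + |u_i|^p/\sigma^p)$ and compares it with the $p$-th power of the $L_p$ norm on an error vector with and without a gross error; the function names and test vectors are illustrative.

```python
import math


def llp_norm(u, sigma, p):
    """Assumed LL_p 'norm': sum_i log(1 + |u_i|**p / sigma**p), with sigma > 0."""
    return sum(math.log1p(abs(ui) ** p / sigma ** p) for ui in u)


def lp_norm_p(u, p):
    """p-th power of the L_p norm: sum_i |u_i|**p."""
    return sum(abs(ui) ** p for ui in u)


clean = [0.1, -0.2, 0.05, 0.0]
gross = [0.1, -0.2, 0.05, 100.0]      # same vector with one gross error
for name, u in (("clean", clean), ("gross", gross)):
    # Columns: label, LL_2 value (sigma = 1), squared L_2 norm
    print(name, llp_norm(u, 1.0, 2.0), lp_norm_p(u, 2.0))
```

For small deviations the two measures nearly coincide, while the single gross error inflates the squared $L_2$ norm by four orders of magnitude and the $LL_2$ value only logarithmically. Since $\log(1 + x) \le x$ for $x \ge 0$, the quantity $\sigma^p \|\mathbf{u}\|_{LL_p,\sigma}$ never exceeds $\sum_i |u_i|^p$.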
5. Illustrative Application Areas
This section presents four practical problems developed under the proposed framework: robust filtering for power line communications, robust estimation in sensor networks with noisy channels, robust reconstruction methods for compressed sensing, and robust fuzzy clustering. Each problem serves to illustrate the capabilities and performance of the proposed methods.
5.1. Robust Filtering
The use of existing power lines for transmitting data and voice has been receiving recent interest [30, 31]. The advantages of power line communications (PLCs) are obvious due to the ubiquity of power lines and power outlets. The potential of power lines to deliver broadband services, such as fast internet access, telephone, fax services, and home networking, is emerging in new communications industry technology. However, there remain considerable challenges for PLCs, such as communications channels that are hampered by the presence of large-amplitude noise superimposed on top of traditional white Gaussian noise. The overall interference is appropriately modeled as an algebraic-tailed process, with $\alpha$-stable often chosen as the parent distribution [31].
While the MGC filter is optimal for GCD noise, it is also robust in general impulsive environments. To compare the robustness of the MGC filter with other robust filtering schemes, experiments for PLCs corrupted by symmetric $\alpha$-stable noise are presented. Specifically, signal enhancement for the power line communication problem with 4-ASK signaling and an equiprobable alphabet is considered. The noise is taken to be white, zero-location, $\alpha$-stable distributed, with the characteristic exponent $\alpha$ ranging from 0.2 to 2 (very impulsive to Gaussian noise). The filtering process employed utilizes length-nine sliding windows to remove the noise and enhance the signal. The MGC parameters $p$ and $\sigma$ were determined using the multiparameter estimation algorithm described in Section 3.4, with the optimization applied to the first 50 samples. The MGC filter is compared to the FLOM, median, myriad, and meridian operators. The meridian tunable parameter was also set using the multiparameter optimization procedure, but without estimating $p$. The myriad filter tuning parameter was set according to the curve established in [18].
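The sliding-window operation can be sketched as follows. The example assumes the objective $\sum_i \log(\sigma^p + |w_i - \theta|^p)$ over each window and restricts the search to the window samples, which is exact for $p \le 1$ and only an approximation otherwise. The two-level test signal, the parameter values, and the edge handling (replicating the end samples) are illustrative choices and do not reproduce the 4-ASK experiment.

```python
import math


def mgc_window_output(window, sigma, p):
    """Selection-type output: the window sample with the smallest assumed MGC cost."""
    def cost(theta):
        return sum(math.log(sigma ** p + abs(w - theta) ** p) for w in window)
    return min(window, key=cost)


def mgc_filter(signal, sigma, p, length=9):
    """Slide a window of odd length over the signal, replicating the edge samples."""
    half = length // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [mgc_window_output(padded[i:i + length], sigma, p) for i in range(len(signal))]


noisy = [1.0] * 10 + [3.0] * 10   # two-level signal
noisy[4] = 50.0                   # positive impulse
noisy[14] = -40.0                 # negative impulse
filtered = mgc_filter(noisy, sigma=0.5, p=1.0)
print(filtered)
```

Both impulses are replaced by the surrounding signal level, while the step between the two levels is preserved because the output is always one of the samples inside the window.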
5.2. Robust Blind Decentralized Estimation
where the channel noise samples are zero-mean and independent, and the transformation is made to adopt a binary phase shift keying (BPSK) scheme.
The channel noise density function is denoted by $f_w$. When this noise is impulsive (e.g., atmospheric noise or underwater acoustic noise), traditional Gaussian-based methods (e.g., least squares) do not perform well. We extend the blind decentralized estimation method proposed in [33], modeling the channel corruption as GCD noise and deriving a robust estimation method for impulsive channel noise scenarios. The sensor noise is modeled as zero-mean additive white Gaussian noise with variance $\sigma_n^2$, while the channel noise is modeled as zero-location additive white GCD noise with scale parameter $\sigma$ and tail constant $p$. A realistic approach to the estimation problem in sensor networks assumes that the noise pdf is known but that the values of some parameters are unknown [33]. In the following, we consider the estimation problem when the sensor noise parameter $\sigma_n$ is known and the channel noise tail constant $p$ and scale parameter $\sigma$ are unknown.
Note that the resulting pdf is a GCD mixture. To simplify the problem, we first estimate the mixing parameter and then utilize the invariance of the ML estimate to determine $\theta$ using (42).
The unknown parameter set for the estimation problem consists of the mixing parameter and the channel noise scale and tail parameters. We address this problem utilizing the well-known EM algorithm [28] and a variation of Algorithm 1 in Section 3.4. The following are the E- and M-steps for the considered sensor network application.
E-Step
M-Step
We use a suboptimal estimate of $p$ in this case, choosing the value from a finite candidate set that maximizes (46).
A sensor network with fixed parameters is used, and the results are averaged over 200 independent realizations. For the channel noise we use two models: contaminated Gaussian and $\alpha$-stable distributions. Figure 7(a) shows results for contaminated Gaussian noise as the percentage of contamination is varied. The results show a gain of at least an order of magnitude over the Gaussian-derived method. Results for $\alpha$-stable distributed noise are shown in Figure 7(b), with the scale parameter held fixed and the tail parameter, $\alpha$, varying from 0.2 to 2 (very impulsive to Gaussian noise). It can be observed that the GCD-derived method has a gain of at least an order of magnitude in the impulsive cases. Furthermore, the MLUGC method has a nearly constant MSE for the entire range. It is of note that the MSE of the MLUGC method is comparable to that obtained by the MLUG (Gaussian-derived) method for the special case when $\alpha = 2$ (Gaussian case), meaning that the GCD-derived method is robust under both heavy-tailed and light-tailed environments.
5.3. Robust Reconstruction Methods for Compressed Sensing
As a third example, consider compressed sensing (CS), a recently introduced framework that departs from the traditional data acquisition paradigm [34]. Take a set of $m$ sensors making observations of a signal $\mathbf{x}_0 \in \mathbb{R}^n$. Suppose that $\mathbf{x}_0$ is $s$-sparse in some orthogonal basis $\Psi$, and let $\{\phi_i\}_{i=1}^{m}$ be a set of measurement vectors that are incoherent with the sparsity basis. Each sensor takes measurements projecting $\mathbf{x}_0$ onto $\phi_i$ and communicates its observation to the fusion center over a noisy channel. The measurement process can be modeled as $\mathbf{y} = \Phi\mathbf{x}_0 + \mathbf{z}$, where $\Phi$ is an $m \times n$ matrix with the vectors $\phi_i$ as rows and $\mathbf{z}$ is white additive noise (with possibly impulsive behavior). The problem is how to estimate $\mathbf{x}_0$ from the noisy measurements $\mathbf{y}$.
A range of different algorithms and methods have been developed that enable approximate reconstruction of sparse signals from noisy compressive measurements [35–39]. Most such algorithms provide bounds for the reconstruction error based on the assumption that the corrupting noise is bounded, Gaussian, or, at a minimum, has finite variance. Recent works have begun to address the reconstruction of sparse signals from measurements corrupted by outliers, for example, due to missing data in the measurement process or transmission problems [40, 41]. These works are based on the sparsity of the measurement error pattern to first estimate the error and then estimate the true signal, in an iterative process. A drawback of this approach is that the reconstruction relies on the error sparsity to first estimate the error, but if the sparsity condition is not met, the performance of the algorithm degrades.
The following result presents an upper bound for the reconstruction error of the proposed estimator and is based on restricted isometry properties (RIPs) of the matrix (see [34, 42] and references therein for more details on RIPs).
Theorem 1 (see [42]).
where is a small constant.
Notably, controls the robustness of the employed norm and the radius of the feasibility set ball. Let be a Cauchy random variable with scale parameter and location parameter zero. Assuming a Cauchy model for the noise vector yields . We use this value for and set as MAD .
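The Cauchy-model choices above can be checked numerically. The sketch below assumes the Lorentzian (LL2) norm has the form Σ log(1 + u_i²/γ²), as in [42]; that form, the function names, and the numerical values are assumptions for illustration, since the corresponding expressions are missing from the text. It relies on two standard facts about a zero-location Cauchy variable z with scale γ: the median of |z| equals γ, and E[log(1 + z²/γ²)] = 2 log 2.

```python
import numpy as np

def lorentzian_norm(u, gamma):
    """Assumed LL2 (Lorentzian) norm: sum_i log(1 + u_i^2 / gamma^2)."""
    return np.sum(np.log1p((u / gamma) ** 2))

def mad(x):
    """Median absolute deviation about the sample median."""
    return np.median(np.abs(x - np.median(x)))

rng = np.random.default_rng(1)
gamma_true = 0.5          # illustrative scale parameter
m = 200_000               # large sample so that averages settle
z = gamma_true * rng.standard_cauchy(m)

# For zero-location Cauchy noise the MAD estimates the scale parameter.
gamma_hat = mad(z)

# Average Lorentzian norm per sample; its expectation is 2 * log(2).
per_sample = lorentzian_norm(z, gamma_true) / m
```

Under these assumptions, a feasibility radius proportional to the number of measurements times 2 log 2 is what a Cauchy noise model would suggest, and the MAD provides a scale estimate that does not require finite variance.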
where . The final reconstruction after the regression ( ) is defined as for indexes in the subset and zero outside . The reconstruction algorithm composed of solving (48) followed by the debiasing step is referred to as Lorentzian basis pursuit (BP) [42].
Experiments evaluating the robustness of Lorentzian BP in different impulsive sampling noises are presented, comparing performance with traditional CS reconstruction algorithms orthogonal matching pursuit (OMP) [38] and basis pursuit denoising (BPD) [34]. The signals are synthetic sparse signals with and length . The number of measurements is . For OMP and BPD, the noise bound is set as , where is the scale parameter of the corrupting distributions. The results are averaged over 200 independent realizations.
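For orientation, the following sketch sets up a synthetic experiment of this kind with the OMP baseline only. It is not the Lorentzian BP solver, and the signal length, number of measurements, sparsity, and noise scale are hypothetical values chosen for the example, since the ones used in the experiments are missing from the text above.

```python
import numpy as np

def omp(Phi, y, sparsity):
    """Orthogonal matching pursuit: greedily pick the column most correlated
    with the residual, then refit all selected columns by least squares."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(sparsity):
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        if idx in support:          # no new atom found; stop early
            break
        support.append(idx)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

def snr_db(x_true, x_hat):
    """Reconstruction SNR in dB."""
    return 10.0 * np.log10(np.sum(x_true**2) / np.sum((x_true - x_hat) ** 2))

rng = np.random.default_rng(0)
n, m, s = 200, 120, 4               # hypothetical length, measurements, sparsity
x0 = np.zeros(n)
x0[rng.choice(n, size=s, replace=False)] = rng.choice([-1.0, 1.0], size=s)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

# Noiseless sanity check: OMP should recover the sparse signal.
x_clean = omp(Phi, Phi @ x0, s)

# Same measurements corrupted by Cauchy (impulsive) noise of hypothetical scale.
y_cauchy = Phi @ x0 + 0.01 * rng.standard_cauchy(m)
snr_cauchy = snr_db(x0, omp(Phi, y_cauchy, s))
```

Repeating the noisy run over many realizations and averaging the SNR gives the kind of curve against which a robust method can be compared.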
The second experiment explores the behavior of Lorentzian BP in stable environments. The stable noise scale parameter is set as ( in the traditional characterization) for all cases, and the tail parameter, α, is varied from 0.2 to 2, that is, from very impulsive to the Gaussian case. The results are summarized in Figure 8(b), which shows that all methods perform poorly for small values of α, with Lorentzian BP yielding the most acceptable results. Beyond , Lorentzian BP produces faithful reconstructions with an SNR greater than 20 dB, often 30 dB higher than the BPD and OMP results. Also of importance is that when α = 2 (Gaussian case), the performance of Lorentzian BP is comparable with that of BPD and OMP, which are Gaussian-derived methods. This result shows the robustness of Lorentzian BP under a broad range of noise models, from very impulsive heavy-tailed to light-tailed environments.
5.4. Robust Clustering
As a final example, we present a robust fuzzy clustering procedure based on the metrics defined in Section 4, which is suitable for clustering data points involving heavy-tailed non-Gaussian processes. Dave proposed the noise clustering (NC) algorithm to address noisy data in [43, 44]. The NC approach is successful in improving the robustness of a variety of prototype-based clustering methods. This method considers the noise as a separate class and represents it by a prototype that has a constant distance .
Compared with the basic fuzzy C-means (FCM), the membership constraint is relaxed to . The second term in the denominator of (53) becomes large for outliers, thus yielding small membership values and improving the robustness of prototype-based clustering algorithms.
where denotes the iteration number. The recursion is terminated when for some given . This method is used to find the update of the cluster centers. Alternation of (53) and (55) gives an algorithm to find the cluster centers that converge to a local minimum of the cost function.
In the NC approach, corresponds to crisp memberships, and increasing represents increased fuzziness and soft rejection of outliers. When is too large, spurious clusters may exist. The choice of the constant distance also influences the fuzzy membership; if it is too small, then we cannot distinguish good clusters from outliers, and if it is too large, the result diverges from the basic FCM. Based on [43], we set , where is a scale parameter. To reduce the risk of poor local minima caused by the initialization of the NC approach, we use classical k-means on a small subset of the data to initialize a set of cluster centers. The proposed algorithm is summarized in Algorithm 2 and is coined the based Noise Clustering ( NC) algorithm.
Algorithm 2: based noise clustering algorithm.
Require: cluster number , weighting parameter , , maximum number of iterations or termination parameter .
Initialize cluster centers.
while or a maximum number of iterations is not reached do
    Compute the fuzzy set using (53) and
    Update cluster centers using (55).
end while
return cluster centroids .
Experimental results show that, for multigroup heavy-tailed processes, the results of the proposed method generally converge to the global minimum. However, to address the problem of local minima, the clustering algorithm is run multiple times with different random initializations (randomly sampled subsets) and a fixed small number of iterations. The best result is selected as the final solution.
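The alternation described in Algorithm 2 can be sketched as follows. Because equations (53) and (55) are not available in this text, the sketch makes several assumptions: a log-type distance Σ_k log(1 + (x_k − c_k)²/σ²), the standard noise-clustering membership form with the noise prototype at constant cost `delta`, and a fixed-point (weighted-myriad-style) center update obtained by setting the gradient of the weighted log cost to zero. All function names and parameter values are hypothetical, and a perturbed initialization stands in for the k-means step.

```python
import numpy as np

def log_distance(X, C, sigma):
    """Assumed log-type distance between each center (rows of C) and each point."""
    diff = X[None, :, :] - C[:, None, :]                  # (clusters, points, dims)
    return np.sum(np.log1p((diff / sigma) ** 2), axis=2)  # (clusters, points)

def nc_memberships(D, delta, m):
    """Noise-clustering memberships; delta is the constant cost of the noise class."""
    inv = (D + 1e-12) ** (-1.0 / (m - 1.0))
    noise = delta ** (-1.0 / (m - 1.0))
    return inv / (inv.sum(axis=0, keepdims=True) + noise)

def update_centers(X, C, U, sigma, m, inner_iters=20):
    """Fixed-point recursion: each center is a reweighted mean of the data."""
    C = C.copy()
    Um = U ** m
    for _ in range(inner_iters):
        diff = X[None, :, :] - C[:, None, :]
        W = Um[:, :, None] / (sigma**2 + diff**2)          # small weight for outliers
        C = (W * X[None, :, :]).sum(axis=1) / W.sum(axis=1)
    return C

def log_noise_clustering(X, C0, sigma, delta, m=2.0, max_iter=50, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    C = np.asarray(C0, dtype=float)
    for _ in range(max_iter):
        U = nc_memberships(log_distance(X, C, sigma), delta, m)
        C_new = update_centers(X, C, U, sigma, m)
        converged = np.max(np.abs(C_new - C)) < tol
        C = C_new
        if converged:
            break
    return C, U

# Two clusters corrupted by Cauchy (heavy-tailed) noise; values are illustrative.
rng = np.random.default_rng(0)
true_centers = np.array([[-5.0, 0.0], [5.0, 0.0]])
labels = np.repeat([0, 1], 200)
X = true_centers[labels] + 0.5 * rng.standard_cauchy((400, 2))
init = true_centers + np.array([[2.0, 1.0], [-2.0, -1.0]])  # stand-in for k-means init
centers, U = log_noise_clustering(X, init, sigma=1.0, delta=3.0)
```

The weights in the center update decay with the squared distance to the current center, so gross outliers contribute almost nothing, which is the mechanism behind the robustness of this kind of update.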
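Clustering results for GCD processes and stable process (AD: average distance between all points in the set).
Process | AD | Method | MSE | MAD | Distance to true centers
Cauchy | 15.39 | NC (proposed) | 0.34987 | 0.62897 | 0.0968
Cauchy | 15.39 | NC (classical) | 1.8186 | 1.8361 | 0.1262
Cauchy | 15.39 | Similarity-based | 1.6513 | 1.136 | 0.18236
Meridian | 50.363 | NC (proposed) | 0.85197 | 0.9283 | 0.1521
Meridian | 50.363 | NC (classical) | 5.887 | 2.7311 | 0.5573
Meridian | 50.363 | Similarity-based | 5.2309 | 2.4627 | 1.8416
stable | 44.435 | NC (proposed) | 0.50408 | 0.73618 | 0.1896
stable | 44.435 | NC (classical) | 3.2105 | 2.7684 | 0.2174
stable | 44.435 | Similarity-based | 1.7578 | 1.6322 | 1.0112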
To evaluate the results, we calculate the MSE, the mean absolute deviation (MAD), and the distance between the solutions and the true cluster centers, averaging the results over 200 trials. The NC approach is compared with classical NC employing the distance and the similarity-based method in [45]. The average distance between all points in the set (AD) is shown as a reference for each sample set. As the results show, GCD-based clustering outperforms both traditional NC and similarity-based methods in heavy-tailed environments. Of note is the meridian case, which is a very impulsive distribution: the GCD clustering results are significantly more accurate than those obtained by the other approaches.
6. Concluding Remarks
This paper presents a GCD-based theoretical approach that allows the formulation of challenging problems in a robust fashion. Within this framework, we establish a statistical relationship between the GGD and GCD families. The proposed framework, due to its flexibility, subsumes GGD-based developments, thereby guaranteeing performance improvements over the traditional problem formulation techniques. Properties of the derived techniques are analyzed. Four particular applications are developed under this framework: robust filtering for power line communications, robust estimation in sensor networks with noisy channels, robust reconstruction methods for compressed sensing, and robust fuzzy clustering. Results from the applications show that the proposed GCD-derived methods provide a robust framework in impulsive heavy-tailed environments, with performance comparable to existing methods in less demanding light-tailed environments.
Appendices
A. Proof of Lemma 1
gives the desired result after substituting the corresponding expressions and letting and .
B. Proof of Proposition 1
 (1)
For , . Then , which implies that is strictly decreasing in that interval. Similarly for , and , showing that the function is strictly increasing in that interval.
 (2)
From we see that if , then all local extrema of lie in the interval .
 (3)
From (B.3) it can be seen that if , then for , therefore is concave in the intervals , . If all the extrema points lie in , the function is concave in , and since the function is not differentiable at the input samples (critical points), the only possible local minima of the objective function are the input samples.
 (4)
Clearly for each there exists a unique minimum in . Also, it can be easily shown that is convex in the interval , where , and concave outside this interval (for ). The proof of this statement is divided into two parts. First we consider the case when and show that there exist at most local extrema for this case. Then by induction we generalize this result for any .
Let . If the cost function is convex in the interval since it is the sum of two convex functions (in that interval). Thus, has a unique minimizer. Now if , the cost function has at most one inflection point (local maximum) between and at most two local minima in the neighborhood of and since , , are concave outside the interval . Then, for we have at most local extrema points.
Suppose that we have samples. If , the cost function is convex in the interval since it is the sum of convex functions (in that interval) and it has only one global minimum. Now suppose that , and also suppose that there are at most local extrema points. Let be a new sample in the data set, and without loss of generality assume that .
If , the new sample will not add a new extremum to the cost function, due to the convexity of for the interval and the fact that is strictly increasing for . If , the new sample will add at most two local extrema (one local maximum and one local minimum) in the interval . The local maximum is an inflection point between , and the local minimum is in the neighborhood of . Therefore, the total number of extrema for is at most , which is the claim of the statement. This concludes the proof.
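The qualitative behavior argued in parts (3) and (4) can be observed numerically. The sketch below assumes the cost function has the form Q(θ) = Σ_i log(σ^p + |x_i − θ|^p); this form, the function names, and the sample values are assumptions for illustration, since the expressions are missing from the text. Local minima are located by a simple grid search rather than by the analysis in the proof.

```python
import numpy as np

def gcd_cost(theta, samples, sigma, p):
    """Assumed cost: Q(theta) = sum_i log(sigma^p + |x_i - theta|^p)."""
    dist = np.abs(samples[None, :] - theta[:, None])
    return np.sum(np.log(sigma**p + dist**p), axis=1)

def local_minima(samples, sigma, p, grid_size=20001):
    """Return grid points whose cost is strictly below both neighbors."""
    samples = np.asarray(samples, dtype=float)
    theta = np.linspace(samples.min() - 1.0, samples.max() + 1.0, grid_size)
    q = gcd_cost(theta, samples, sigma, p)
    is_min = (q[1:-1] < q[:-2]) & (q[1:-1] < q[2:])
    return theta[1:-1][is_min]

# p = 2, two samples closer than 2 * sigma: the cost has a single minimum.
close_minima = local_minima([-0.5, 0.5], sigma=1.0, p=2)

# p = 2, two samples far apart: two local minima, one near each sample.
far_minima = local_minima([-3.0, 3.0], sigma=1.0, p=2)

# p = 1: the cost is concave between samples, so minima sit at the samples.
p1_minima = local_minima([0.0, 1.0, 5.0], sigma=0.1, p=1)
```

For the symmetric two-sample case with p = 2 and samples at ±a, setting the derivative to zero gives θ(σ² + θ² − a²) = 0, so the minima move to ±√(a² − σ²) once a exceeds σ, with a local maximum at the midpoint. That is three extrema for two samples, and the grid search reproduces it.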
C. Proof of Property 1
D. Proof of Property 2
Declarations
Acknowledgment
This paper was supported in part by NSF under Grant no. 0728904.
Authors’ Affiliations
References
[1] Kuruoglu EE: Signal processing with heavy-tailed distributions. Signal Processing 2002, 82(12):1805–1806. 10.1016/S0165-1684(02)00312-2
[2] Barner KE, Arce GR: Nonlinear Signal and Image Processing: Theory, Methods and Applications. CRC Press, Boca Raton, Fla, USA; 2003.
[3] Arce GR: Nonlinear Signal Processing: A Statistical Approach. John Wiley & Sons, New York, NY, USA; 2005.
[4] Huber P: Robust Statistics. John Wiley & Sons, New York, NY, USA; 1981.
[5] Kassam SA, Poor HV: Robust techniques for signal processing: a survey. Proceedings of the IEEE 1985, 73(3):433–481.
[6] Astola J, Neuvo Y: Optimal median type filters for exponential noise distributions. Signal Processing 1989, 17(2):95–104. 10.1016/0165-1684(89)90013-3
[7] Yin L, Yang R, Gabbouj M, Neuvo Y: Weighted median filters: a tutorial. IEEE Transactions on Circuits and Systems II 1996, 43(3):157–192. 10.1109/82.486465
[8] Arce GR: A general weighted median filter structure admitting negative weights. IEEE Transactions on Signal Processing 1998, 46(12):3195–3205. 10.1109/78.735296
[9] Shao M, Nikias CL: Signal processing with fractional lower order moments: stable processes and their applications. Proceedings of the IEEE 1993, 81(7):986–1010. 10.1109/5.231338
[10] Barner KE, Aysal TC: Polynomial weighted median filtering. IEEE Transactions on Signal Processing 2006, 54(2):636–650.
[11] Aysal TC, Barner KE: Hybrid polynomial filters for Gaussian and non-Gaussian noise environments. IEEE Transactions on Signal Processing 2006, 54(12):4644–4661.
[12] Gonzalez JG: Robust techniques for wireless communications in non-Gaussian environments, Ph.D. dissertation. ECE Department, University of Delaware; 1997.
[13] Aysal TC, Barner KE: Meridian filtering for robust signal processing. IEEE Transactions on Signal Processing 2007, 55(8):3949–3962.
[14] Zolotarev V: One-Dimensional Stable Distributions. American Mathematical Society, Providence, RI, USA; 1986.
[15] Nolan JP: Stable Distributions: Models for Heavy Tailed Data. Birkhäuser, Boston, Mass, USA; 2005.
[16] Brcich RF, Iskander DR, Zoubir AM: The stability test for symmetric alpha-stable distributions. IEEE Transactions on Signal Processing 2005, 53(3):977–986.
[17] Gonzalez JG, Arce GR: Optimality of the myriad filter in practical impulsive-noise environments. IEEE Transactions on Signal Processing 2001, 49(2):438–441. 10.1109/78.902126
[18] Gonzalez JG, Arce GR: Statistically-efficient filtering in impulsive environments: weighted myriad filters. Eurasip Journal on Applied Signal Processing 2002, 2002(1):4–20. 10.1155/S1110865702000483
[19] Aysal TC, Barner KE: Myriad-type polynomial filtering. IEEE Transactions on Signal Processing 2007, 55(2):747–753.
[20] Rider PR: Generalized Cauchy distributions. Annals of the Institute of Statistical Mathematics 1957, 9(1):215–223. 10.1007/BF02892507
[21] Miller J, Thomas J: Detectors for discrete-time signals in non-Gaussian noise. IEEE Transactions on Information Theory 1972, 18(2):241–250. 10.1109/TIT.1972.1054787
[22] Aysal TC: Filtering and estimation theory: first-order, polynomial and decentralized signal processing, Ph.D. dissertation. ECE Department, University of Delaware; 2007.
[23] Middleton D: Statistical-physical models of electromagnetic interference. IEEE Transactions on Electromagnetic Compatibility 1977, 19(3):106–127.
[24] Hall HM: A new model for impulsive phenomena: application to atmospheric-noise communication channels. Stanford Electronics Laboratories, Stanford University, Stanford, Calif, USA; 1966.
[25] Hampel F, Ronchetti E, Rousseeuw P, Stahel W: Robust Statistics: The Approach Based on Influence Functions. Wiley, New York, NY, USA; 1986.
[26] Carrillo RE, Aysal TC, Barner KE: Generalized Cauchy distribution based robust estimation. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '08), April 2008, 3389–3392.
[27] Besag J: On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society. Series B 1986, 48(3):259–302.
[28] McLachlan G, Krishnan T: The EM Algorithm and Extensions. John Wiley & Sons, New York, NY, USA; 1997.
[29] Hardy GH, Littlewood JE, Pólya G: Inequalities, Cambridge Mathematical Library. Cambridge University Press, Cambridge, UK; 1988.
[30] Zimmermann M, Dostert K: Analysis and modeling of impulsive noise in broadband powerline communications. IEEE Transactions on Electromagnetic Compatibility 2002, 44(1):249–258. 10.1109/15.990732
[31] Ma YH, So PL, Gunawan E: Performance analysis of OFDM systems for broadband power line communications under impulsive noise and multipath effects. IEEE Transactions on Power Delivery 2005, 20(2):674–682. 10.1109/TPWRD.2005.844320
[32] Aysal TC, Barner KE: Constrained decentralized estimation over noisy channels for sensor networks. IEEE Transactions on Signal Processing 2008, 56(4):1398–1410.
[33] Aysal TC, Barner KE: Blind decentralized estimation for bandwidth constrained wireless sensor networks. IEEE Transactions on Wireless Communications 2008, 7(5):1466–1471.
[34] Candès EJ, Wakin MB: An introduction to compressive sampling: a sensing/sampling paradigm that goes against the common knowledge in data acquisition. IEEE Signal Processing Magazine 2008, 25(2):21–30.
[35] Donoho DL, Elad M, Temlyakov VN: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory 2006, 52(1):6–18.
[36] Haupt J, Nowak R: Signal reconstruction from noisy random projections. IEEE Transactions on Information Theory 2006, 52(9):4036–4048.
[37] Candès EJ, Romberg JK, Tao T: Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics 2006, 59(8):1207–1223. 10.1002/cpa.20124
[38] Tropp JA, Gilbert AC: Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory 2007, 53(12):4655–4666.
[39] Needell D, Tropp JA: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis 2009, 26(3):301–321. 10.1016/j.acha.2008.07.002
[40] Candès EJ, Randall PA: Highly robust error correction by convex programming. IEEE Transactions on Information Theory 2008, 54(7):2829–2840.
[41] Popilka B, Setzer S, Steidl G: Signal recovery from incomplete measurements in the presence of outliers. Inverse Problems and Imaging 2007, 1(4):661–672.
[42] Carrillo RE, Barner KE, Aysal TC: Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise. IEEE Journal on Selected Topics in Signal Processing 2010, 4(2):392–408.
[43] Dave RN: Characterization and detection of noise in clustering. Pattern Recognition Letters 1991, 12(11):657–664. 10.1016/0167-8655(91)90002-4
[44] Dave RN, Krishnapuram R: Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 1997, 5(2):270–293. 10.1109/91.580801
[45] Yang MS, Wu KL: A similarity-based robust clustering method. IEEE Transactions on Pattern Analysis and Machine Intelligence 2004, 26(4):434–448. 10.1109/TPAMI.2004.1265860
[46] Papoulis A: Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, NY, USA; 1984.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.