M-estimators of roughness and scale for GA0-modelled SAR imagery

The GA0 distribution is assumed as the universal model for multilook amplitude SAR imagery data under the multiplicative model. This distribution has two unknown parameters related to the roughness and the scale of the signal, that can be used in image analysis and processing. It can be seen that maximum likelihood and moment estimators for its parameters can be influenced by small percentages of "outliers"; hence, it is of outmost importance to find robust estimators for these parameters. One of the best-known classes of robust techniques is that of M-estimators, which are an extension of the maximum likelihood estimation method. In this work we derive the M-estimators for the parameters of the distribution, and compare them with maximum likelihood estimators with a Monte-Carlo experience. It is checked that this robust technique is superior to the classical approach under the presence of corner reflectors, a common source of contamination in SAR images. Numerical issues are addressed, and a practical example is provided.


INTRODUCTION
The statistical modeling of synthetic aperture radar (SAR) imagery has provided some of the best tools for the processing and understanding of this kind of data. Among the statistical approaches the most successful is the multiplicative model. This model offers a set of distributions that, with a few parameters, are able to characterize most of SAR data. This model is presented, for instance, in [1], and extended in [2].
This extension is a general and tractable set of distributions within the multiplicative model, used to describe every kind of SAR return. It was then called a universal model, and its properties are studied in [3,4,5].
In this paper, the problem of estimating the parameters of this extension, namely of the G 0 A distribution, is studied for the single look (the noisiest) situation. Two typical estimation situations arise in image processing and analysis, namely large and small samples, being the latter considered in this work. The small samples problem arises in, for instance, im-age filtering where with a few observations within a window a new value is computed.
Statistical inference with small samples is subjected to many problems, mainly bias, large variance, and sensitivity to deviations from the hypothesized model. This last issue is also a problem when dealing with large samples.
Robustness is a quite desirable property for estimators, since it allows their use even in situations where the quality of the input data is not perfect [6,7,8]. Most image processing and analysis procedures (filtering, classification, segmentation, etc.) use spatial data. Even experienced users are unable to guarantee that all the input data are free of spurious values and/or structures.
A situation where this occurs is in the presence of corner reflectors. These devices, which are essential for data calibration, produce a return which is quite higher than the rest of the image. If data from a corner reflector enter a nonrobust estimation procedure, the results may be completely unreliable. Figure 1 shows man-made corner reflectors, the white spot regularly placed horizontally along the middle of the image, possibly buildings, on a SAR image where crops predominate.
This paper presents classical estimators, namely those based on sample moments and the maximum likelihood ones, and derives robust M-estimators for the single look G 0 A model. Once these estimators are derived, several situations are compared by means of a Monte-Carlo experience. These estimators are then applied to SAR imagery, and it is shown that the robust approach exhibits a good performance even in the presence of corner reflectors.
M-estimators have been mainly used for symmetric data, being [7,8] two very relevant exceptions. In this application they are computed, implemented, and assessed for speckled imagery that, as will be seen in the next section, follow laws whose densities can be highly asymmetric. Since speckle noise also appears in ultrasound B-scan, sonar and laser images, the procedures here presented have potential application in all these techniques.

NOTATION AND MODEL DEFINITION
In the following, 1 A will denote the indicator function of the set A, that is, The single-look G 0 A (α, γ) distribution is characterized by the following probability density function: (2) Multilook, intensity, and complex versions of this distribution can be seen in [2]. The polarimetric (multivariate) extension of the G 0 distribution is presented in [9].
The parameter α in (2) describes the roughness of the target, being small values (say α < −15) usually associated to homogeneous targets, like pasture, values ranging in the [−15, −5] interval usually observed in heterogeneous clutter, like forests, and big values (−5 < α < 0 for instance) commonly seen when extremely heterogeneous areas, as urban spots, are imaged. The γ parameter is related to the scale, in the sense that if Z A is a G 0 A (α, 1)-distributed random vari-able then Z = √ γZ obeys a G 0 A (α, γ) law and, therefore, it is related to the brightness of the scene. Both parameters are essential when characterizing targets, and in image processing techniques involving statistical modeling.
In the following it will be assumed that α < 0 and γ > 0. Calling the score function of the G 0 A (α, γ) distribution is given by The cumulative distribution function of such random variable is given by This will be used to compute an estimator based on order statistics. In [5] a relation between a more general version of the G 0 A (α, γ) law and Snedekor's F distribution is shown to allow writing (5) as a function of the cumulative distribution function of the latter.
The moments of a G 0 A (α, γ) distribution are given by This distribution can be derived as the square root of the ratio of two independent random variables: one obeying a unitary-mean exponential distribution (which conveys the speckle noise in one look) and one following a Γ (α, γ) distribution, related to the unobserved ground truth, the backscatter. Densities of the G 0 A (α, γ) distribution are shown in Figure 2, for α ∈ {−1, −3, −9} and the scale parameter γ = γ α adjusted to have unitary mean, that is, using (6) and k = 1, one derives Following Barndorff-Nielsen and Blaesild [10], it is interesting to see these densities as log probability functions, particularly because the G 0 A law is closely related to the class of hyperbolic distributions [11]. Figure 3 shows the densities of the G 0 A (−2.5, 7.0686/π ) and the Gaussian distribution N (1, 4(1.1781 − π/4)/π ) distributions in semilogarithmic scale, along with their mean value. The parameters were  Normalized gray scale Densities Figure 3: Densities of the G 0 A (−2.5, 7.0686/π ) (solid line) and the N (1, 4(1.1781 − π/4)/π ) (dashes) distributions in semilogarithmic scale, with mean value in dash-dot. chosen, using (6), so that these distributions have equal mean and variance. The different decays of their tails in the logarithmic plot are evident: the former behaves logarithmically, while the latter decays quadratically. This behavior ensures the ability of the G 0 A distribution to model data with extreme variability.

INFERENCE FOR THE G 0
A MODEL It can be seen that the maximum likelihood estimators of α and γ based in X 1 , . . . , X N , represented here byα ML,N and γ ML,N , respectively, are given bŷ and bŷ Assume α < −1/2 in order to have random variables with finite mean. For each j = 1/2, 1, define the jth order moment as m j,N = N −1 N k=1 X j k . Using (6) it can be seen that the half and first order moments estimators of α and γ based on X 1 , . . . , X N , denoted here byα MOM,N and byγ MOM,N , respectively, are given by the solution of Moment estimation is extensively used in remote sensing applications [2], mainly because it is inexpensive from the computational point of view and analytically tractable in most situations. In the presence of corner reflectors, though, severe numerical instabilities were observed.
A comparison among different estimators for roughness parameter of the G 0 A distribution was performed in [4] through a Monte-Carlo experience. No contamination was considered, and theα ML estimator was the best one in almost all cases with respect to the mean square error (mse) and the closeness to the true value (bias) criteria, assuming γ is known. No analytical comparison is possible, due to the complexity of the estimators.
In previous works (see [12], for instance) it is shown that moment and maximum likelihood estimators have many optimal properties when the observations, x 1 , . . . , x N are the outcome from independent, identically distributed random variables, with common density f (·, (α, γ)). Among these properties, maximum likelihood estimators are asymptotically unbiased, that is, if X 1 , . . . , X N is a sequence of independent, identically distributed random variables with common density f (·, (α, γ)) then, when N → ∞, it holds that Nevertheless, good performance is neither warranted with finite samples nor if the sample does not obey precisely the hypothesis. A common departure from classical hypothesis is the contamination by a percentage of "outliers," that is, when some of the observed data come from a different distribution.
"Corner reflectors" can be considered outliers in SAR imagery. These are physical artifacts in the sensed area that return most of the power they receive. The image in these areas is dominated by the biggest possible values admitted by the storage characteristics, and their effect is typically limited to a few pixels. Corner reflectors are either placed on purpose, for image calibration, due to man-made objects, such as highly reflective urban areas, or the result of double-bounce reflection [1].
Since the routines that compute both ML-and Mestimators require initial guesses, the unstable (and, therefore, unacceptable) behavior of moment estimators demanded the use of another technique. A procedure based on analogy using order statistics and moments [13], defined in the following for α < −1, was used.
This scaled median does not depend on γ, soα can be estimated using the sample median q 2 (y), where y = (x i /x) i , using standard numerical tools. An estimate of γ using the first-order moment can then be computed aŝ This estimator of α derived from (11) and the one computed for γ through (12) will be called mixed estimator for (α, γ), and denoted (α MIX ,γ MIX ).
In the following, ML-and M-estimators will be assessed in two situations: the pure model, when no contamination is present, and cases where outliers simulating a corner reflector enter the data.
The contamination model here considered is defined as a sequence of independent, identically distributed random variables X 1 , . . . , X N with common distribution function F(·, (α, γ), , z) given by (13) where δ z (x) = 1 [z,+∞) (x), with z a "very big" value with respect to most of the values typically assumed by a random variable with distribution G 0 A (α, γ). Equation (13) describes a contamination that occurs at random with probability , that is, in average N out of N samples will be "very big" values (corner reflectors), while N − N samples will obey the G 0 A (α, γ) distribution (the background). The flexibility of the G 0 A (α, γ) distribution will allow the modeling of any kind of background, namely crops, forests, and urban areas. The contamination value z will be chosen as a factor of the mean value of the underlying distribution G 0 Other contamination hypothesis may include different distributions for the departure of the model, and/or spatial dependence among observations.
In order to quantitatively assess the sensitivity of MLestimators in our case, using the strong law of large numbers on (8) and assuming γ = 1, it is immediate that It can be seen that a sample that suffers from a small contamination of ε = 0.02 leads to the wrong conclusion that extremely heterogeneous targets are under observation, since the estimated value will be around −3 whilst the true value is −5 (corresponding to an heterogeneous target). The bigger the contamination the worse this behavior. This influence on the ML-estimator is noticed regardless the value of z in the chosen range. This result justifies a careful analysis of the performance of estimators for a single representative value of the contamination z, and for several values of ε, as presented in Section 5.

M-ESTIMATORS
These estimators offer a useful alternative when small proportions of values may be far from the bulk of data. A good survey on these estimators and their use in practical situations can be seen in [7]. The difficulty in our case arises due to the fact that the underlying distribution is asymmetric. Inspired in the work by Marazzi and Ruffieux [8], that robustified maximum likelihood estimators for the parameters of the gamma distribution with success, we will compute M-estimators for the parameters of the G 0 A (α, γ) distribution. One has to devise ways to prevent a large influence of outliers in the likelihood equations. M-estimators propose the use of ψ functions that truncate the score of influential observations in the likelihood equations.
Let b 1 and b 2 be two real positive numbers. We will call M-estimators of parameters α and γ based on the sample where s 1 and s 2 are given in (4), and ψ b1 and ψ b2 are with x ∈ R and i = 1, 2. Note that making these functions the identity and c 1 = c 2 ≡ 0, equations (15) reduce to the familiar system of likelihood equations. Figure 5 illustrates the ψ 6 (x) function, where its effect on the data becomes clear: it truncates the score of those values above and below its threshold. In this manner, this function prevents abnormal (too small and too big) data having excessive influence on the estimates.
The functions c i : Θ × (0, +∞) → R in (15) are defined in such way that (α M,N ,γ M,N ) N is a sequence of asymptotically unbiased estimators of (α, γ), that is, for any (α, γ) ∈ Θ, b 1 > 0 and b 2 > 0. The computation of these functions is presented in Appendices A and B. The values b 1 and b 2 , called "tuning parameters" are chosen in such a way that the efficiency of the M-estimators is not unacceptably poor with respect to the maximum likelihood ones. Thus, we will choose b 1 and b 2 such that where V denotes variance, with 0.9 ≤ e i ≤ 0.975 for example. This restriction imposes that the variance of the robust estimators will not surpass those of the maximum likelihood ones in more than a certain factor. The computation of these parameters is done in Appendix C.
In order to assess the finite sample behavior of the proposed estimators consider the situation where N observations from independent identically distributed G 0 A (α, γ) random variables are contaminated by L observations taking the value z. The contaminated sample z = (z 1 , . . . , z N , z, . . . , z ), where the brackets denote the N + L outliers, will be used to estimate α and γ. Depending on the true parameters, on the contamination (the value z and the number of outliers L) and on the sample size N, it is expected that any non-robust estimator will be mislead by the departure from the hypothesized model. Figure 6 shows the maximum likelihood and M-estimators for the situation where N = 77 pure observations and L = 4 equal contaminated values are used. It can be seen that the former suffers from both the departure from the true model and from numerical instabilities (the peak around the value 10). It is noticeable that the robust estimator remains closer to the true value (the dash-dot line) than the other in the presence of contamination. The bigger the value of the contamination the further maximum likelihood will go from the true value, while the M-estimator will render the same estimate. The same behavior was observed for the estimators of γ. The values employed for N, L, α and z are representative of the kind of problem at hand: filtering images over a heterogeneous area (α = −3) with 9 × 9 windows (N + L = 81) where some observations (L = 4) are due to a corner reflector (z varying and taking large values).

RESULTS
A Monte-Carlo experience was performed to compare maximum likelihood (ML) and M-estimators (M), since analytical comparisons are unfeasible. G 0 A deviates can be obtained in two ways, namely (a) by the inversion of the cumulative distribution function, given in (5), or (b) by the constructive definition of this distribution, that is, returning z = y/x where y is a sample of the unitary mean exponential distribution and x is a sample of the Γ (−α, γ) distribution, X independent of Y . Alternative (b) was chosen in this experience. Since the parameter γ is a scale factor, all the studied situations will use the parameter that makes the expected value 1, that is, γ = γ α = 4(Γ (−α)/Γ (−α − 1/2)) 2 /π .
We studied four models that will be identified as Models 1, 2, 3, and 4: Model 1 (the " pure" model) consisting of samples of the F(·, (α, γ α )) distribution; Model 2 consisting of samples taken from the F (·, (α, γ α ), , z) distribution with = 0.01 and z = 15 , Model 3 is a collection of samples obtained from the F (·, (α, γ α ), , z) distribution with = 0.05 and z = 15, while Model 4 is a collection of samples obtained from the F (·, (α, γ α ), , z) distribution with = 0.10 and z = 15. In this manner, the pure situation and three levels of contamination are examined. The contamination value z is held fixed, as discussed in Section 3, as fifteen times the expected value of the underlying model which is big enough to describe corner reflectors.
In order to determine the number of replications an empirical precision criterion was used. One hundred replications (1 ≤ r ≤ R = 100) with samples of size N = 9 × 9 = 81 were performed. One estimator for α is computed for each replication r , sayα(r ). The final number of replications R = R + , ≥ 1 integer, is the smallest number of samples that allows forming an asymptotic confidence interval at the 95% level on the mean of the R estimators, that is, (1), . . . ,α(R)}. This is computed for every estimator for α and every estimator for γ in every model and every situation. The biggest number of replications R (the worst case) is then used. This procedure led to R = 101 in every model and every situation.
In all the tables it can be seen that M-estimators are almost as good as ML-estimators when there is no contamination, while in the presence of outliers M-estimators exhibit smaller absolute relative bias and smaller mean squared error. As α is smaller (the observed region is more homogeneous), estima-    Table 3: Model 3 ( = 0.05) and three situations.  Table 4: Model 4 ( = 0.10) and three situations. tion becomes less reliable. This is probably due to the shape of the likelihood function, that becomes flat and, therefore, is hard to find its maximum location. The bigger the proportion of contamination the worse the behavior of both estimators, but M-estimators remain consistently closer to the true value than ML-estimators, as expected.
As an application, the image shown in Figure 7 is analyzed. Windows of size 9 × 9 are used to estimate the background parameters over a trajectory that spans from the upper left corner to the lower right corner. This trajectory passes through the three corner reflectors of the image, which are clearly seen as white spots.  tions (the gaps in the solid line). Similar results were observed estimating the scale parameter γ.

CONCLUSIONS AND FUTURE WORK
There are numerical problems with the computation of these estimators for certain parameters. For small samples there is often no solution, being this situation more critical for small values of α, that is, for homogeneous areas. This issue will be further investigated in future works. As it can be seen from the tables, in the pure model (Model 1), the behavior of the robust estimators is as good as the maximum likelihood ones, as it is expected. In the contaminated situations (Models 2 and 3), we can see that estimation by moments or maximum likelihood methods is poor.
It is relevant to notice that under a very small contamination or a very small deviation from the model the behavior of the classical estimators is not reliable.
This work will continue computing the M-estimators for the multilook case, that is, for the n > 1 situation and for polarimetric (multivariate) data. An alternative approach using alpha-stable distributions [14] is also possible, but at the expense of loosing the interpretability of the parameters that stem from the multiplicative model.

A. COMPUTATION OF c 1
Let α < 0. The problem consists of finding c 1 (α, 1) that du is the equation that has to be solved in order to define the M-estimators for the G 0 A distribution (equations (15)).
then A < 0 < B and, therefore, (A.3) This occurs when exp(2αb) > bα + 1 and c 0 must satisfy If this is not the case, that is if exp(2αb) < bα + 1, then c must be greater than b 1 + α −1 . Thus 0 < A < B and then, In this situation, c 0 satisfying I 1 (α, c 0 , b) = 0 is given by

B. COMPUTATION OF c 2
Let α < 0 and γ > 0. As for the computation of c 1 , the problem of finding the function c 2 consists of finding c 2 = c 2 (α, γ, b 2 ) that satisfies In this situation there are more cases than in the computation of c 1 due to the presence of the parameter γ.
Lemma B.1. Let α < 0 and γ > 0. In order to find the function c 2 that satisfies (B.1) one has to consider the following cases: , and it is given by the solution of and γb 2 ≤ 1.

C. COMPUTATION OF TUNING PARAMETERS
The definition of efficient M-estimators requires finding tuning parameters b 1 and b 2 ensuring that the relative asymp- (C.8) The matrices M and Q are computed taking into account all the cases that define the functions c i , i = 1, 2, and solving explicitly the integrals.