A Locally Adaptable Iterative RX Detector

We present an unsupervised anomaly detection method for hyperspectral imagery (HSI) based on data characteristics inherit in HSI. A locally adaptive technique of iteratively reﬁning the well-known RX detector (LAIRX) is developed. The technique is motivated by the need for better ﬁrst-and second-order statistic estimation via avoidance of anomaly presence. Overall, experiments show favorable Receiver Operating Characteristic (ROC) curves when compared to a global anomaly detector based upon the Support Vector Data Description (SVDD) algorithm, the conventional RX detector, and decomposed versions of the LAIRX detector. Furthermore, the utilization of parallel and distributed processing allows fast processing time making LAIRX applicable in an operational setting.

The literature on anomaly detection in HSI is quite extensive [1-5, 7-12, 14-18] with major contributions appearing rapidly after Reed and Yu [19,20].In anomaly detection, the goal has always been to distinguish background from potential targets in an automatic fashion while jointly minimizing false alarms and maximizing true positives.
The RX detector is prone to high false alarms because the local Gaussian assumption is largely inaccurate [11].The purpose of this paper is to propose a refinement of the RX detector by taking into account the anomaly dominance upon first and second order statistic estimation.That is, we wish to force stability upon the subsequences, locally defined with respect to a window size, in order to reduce the bias and error when estimating the mean and covariance, respectively.The subsequences are refined by removing anomalies, in an iterative fashion, from consideration in local statistics estimation.Even so, the refined subsequence that is used to estimate a mean vector, μ, and the covariance matrix, , is likely to still be nonGaussian; but, as is demonstrated subsequently, it often provides a better false alarm rate than the conventional RX because its estimates are not as contaminated by anomalies.
To illustrate the potential of this idea, consider the following "abbreviated" image created from a desert image (see Figure 1).The image is "abbreviated" in that the targets have been moved closer together than in the original image by simply eliminating columns of image pixels.This creates a situation with a very nonhomogeneous background.This small image is 63 × 49 pixels.A 25 × 25 window will be used in all subsequent processing.In Figure 1(a) the truth mask shows the known objects of interest.Figure 1(b) shows the RX scores for the HSI image, Figure 1(c) shows the RX scores for the 1st 10 principal components (instead of using the entire HSI data-cube), Figure 1(d) shows the output of LAIRX for 2 iterations (called LAIRX (2) in subsequent discussion), using PCA for the input.In Figures 1(b), 1(c), and 1(d) RX scores are displayed such that anomalies are expected to "fire" as bright and the background should "fire" as dark.Figure 1(b) shows that many background pixels may be declared as anomalies (depending on the threshold) for RX. Figure 1(c) shows the benefit of using principal components as input to RX. Figure 1(d) shows a large reduction in lighter pixels in the background using one iterative refinement of the covariance calculation used in RX.

Methods
The data used in our experiment are from the ARES desert and forest radiance collections.In our analysis we only consider two classes, background and certain man-made targets.The goal of our analysis is to distinguish the latter from many sources of background variation, such as brush, roads, forest, large rock formations and other natural anomalies.

Locally Adaptive Iterative RX Detector (LAIRX).
Reed and Yu derived an anomaly detector using a Generalized Likelihood Ratio Test (GLRT), which was later dubbed the RX detector [19,20].The detector GLRT is simplified by assuming that background pixel vectors are iid normal with estimated mean, μ, and covariance matrix, Σ, that is, In short, an incoming pixel vector, x, is the center of a neighborhood of size n which is checked for irregularity via the distance formulated above.That is, the pixel vector is checked to see if it lies outside the hyper ellipse whose location and shape are determined via μ and Σ, respectively.An anomalous vector is declared given that RX(x) > χ 2 α,p , where χ 2 α,p is the αth quantile of a χ 2 distribution with p degrees of freedom.For more information see [18].
The idea, following the philosophy of the RX method, is to place a window about each pixel in an image and use local image statistics to determine whether or not the point is anomalous.It is evident that such a method can suffer from at least 3 potential complications.First, the window pixel vectors are almost never statistically independent.Second, such vectors are not typically identically distributed.Further, outliers (the things we are looking for) can seriously compromise the integrity of the local statistics, particularly the estimated covariance matrix.The first and second complications are the subject of current research.In this paper, we examine the third complication.A look at outlier effects and some remedies is given in [18].The basic approach given in this paper is laid out in [21].Here, we propose to deal with the outlier effects in an iterative fashion.As we process the basic RX algorithm across the image, we maintain a catalog of anomalous pixels, this is the 1st iteration.Indeed, if we simply quit after processing the image once, we would have simply run the RX method.A second iteration is applied, only this time we withhold the anomalous pixels from consideration in calculating the local statistics.So long as we find new anomalies the iterations continue; otherwise, the algorithm terminates.Hence, in LAIRX we allow the RX detector to be iteratively refined with respect to the estimation of μ and while keeping track of detected anomalies.LAIRX has the following steps: Step 1. Reduce the dimensionality to a set of p principal components via a global estimate of .
Step 2. Apply the RX detector to the data matrix using a pixel process window.If this is not the 1st iteration, withhold anomalous pixels identified in the previous iteration from the local estimation of μ and .
Step 3. Identify those RX scores that exceed χ 2 α,p .These are referred to as anomalies.This step ends an iteration.
Step 4. If the set of pixels identified as anomalies in Step 3 are identical to the set of pixels identified as anomalies in the previous iteration then go to Step 5, otherwise; return to Step 2.
Step 5. Map detected anomalies to the image space.
Once LAIRX has terminated we are assume that the respective window sequences' anomaly indicator has converged almost surely to some target given the χ 2 α,p cut-off.The subsequences associated with the target may or may not be Gaussian but the refined μ and estimates should result in a higher true positive rate coupled with a relatively low false alarm rate when compared to the conventional RX detector because the sequence is forced to be more iid and, hence, estimation bias is reduced.The global SVDD algorithm was chosen as a competing algorithm because of its recent promise as an efficient and powerful anomaly detector.
In what follows, the LAIRX detector is compared to the SVDD algorithm, the conventional RX detector and a decomposed version of the LAIRX detector.This later comparison was made in an effort to more fully understand the performance of the LAIRX detector.Each anomaly detector was applied to four images; see Figure 2, containing multiple vehicles, varying land formations, sage brush and a road.These images vary in the difficulty they present to the algorithms as reflected in the SNR plots of Figure 3.

Parallel Implementation of LAIRX.
The RX detector is naturally a computationally inefficient task since M matrix inversions are required, M ≈ 52000 in our analysis.This computational burden was an inspiration for the application of SVDD in HSI anomaly detection [22].However, most implementations of the RX detector are not optimized.We decided to optimize the RX algorithm via parallel and distributed processing on a dual quad-core machine.The implementation is very simple given the Matlab Parallel Computing Toolbox but is described nonetheless.

Basic Setup.
The hyperspectral image is formulated as a data matrix where columns are wavelengths and rows are pixels, respectively.A window is moved row-wise across the data matrix at a single row increment where each center pixel serves as input to the RX detector.The data held locally in each window is used to estimate μ and .The components of μ and are based on a set of p principal components from the wavelength bands, where p = 10 in our analysis.The number of principal components is dependent on the data collection environment and should be determined via exploratory data analysis.From our analysis, we found that the desert radiance images required only two principal components while the forest radiance images required about ten.

Parallelization Scheme.
(1) Generate all possible window indices where each column indexes a window of data.
The row midpoint of these data matrices are the center pixels for that particular window.This step makes RX and LAIRX an "embarrassingly parallel" problem.
(2) The data partitions are batched in batches of size G where G is the number of available processor cores, G = 8 in our setup.
(3) Each batched data partitions are processed using the RX detector simultaneously.
(4) Results are pooled and used for subsequent analysis.distributions [23,24].The SVDD anomaly detector applied to HSI was proposed by Banerjee et al. [22].In general, the goal of the SVDD algorithm is to find the hypersphere with minimum volume about a set of random vectors.In our case these random vectors are the background pixels with each dimension corresponding to a different spectral band, that is, for a set of pixels, T = {x i , i = 1, . . ., M}, we seek the minimum enclosing hypersphere, S = {x : x − a 2 < R 2 }, that contains T. This is a constrained optimization problem stated as.

Global SVDD Anomaly
min(R) subject to x i ∈ S, i = 1, . . ., M. ( The center a and radius R are found by optimizing the following Lagrangian: The next step is to apply the kernel trick with a kernel function Φ(x), which is usually taken to be Gaussian [11,[22][23][24].Once L is optimized with respect to α i and after incorporating the kernel function, K(y, x), we get the following SVDD statistic [11] SVDD where The radial basis function (RBF) parameter σ 2 controls how well the SVDD generalizes to unseen data.The choice of this parameter is driven by the data and must be chosen empirically.In our analysis, y is a pixel vector and x is a support vector obtained via SVDD given background spectra.
For HSI anomaly detection, Banerjee et al. propose a fast global SVDD detector [11].The algorithm that they propose is fast because it uses a small subset of the data to build the SVDD model.The algorithm is as follows: (1) Randomly select a set of N background pixels from the training set.
(2) Estimate an optimal value for σ 2 using a crossvalidation or minimax method given the set of background spectra.
(3) Estimate the SVDD parameters (a, α i , R) to model the region of support for the background given a random subset of the background spectra and σ 2 .
(4) For each pixel in the data matrix perform the decision test: (i) if SVDD(y) is less than the detection threshold t, the pixel is part of the background.(ii) else, declare the pixel as an anomaly.
As you can see in Figure 4, there is a performance effect as σ 2 is varied.The only free parameter in the RBF is σ 2 .We wish to find the optimal σ 2 that is able to fully describe the nontarget/background spectra.
A common method is to estimate σ 2 while minimizing the false positive rate (Pfa) via an argument based on leave-one-out cross validation [11,[22][23][24][25], which gives the following upper bound on Pfa: N is the number of samples selected to train the SVDD model and #SV are the number of support vectors required to describe the data.Based upon the above inequality, Banerjee et al. [22] propose an approximate minimax estimate for σ 2 as where M is the number of replicates.Based upon the above result, the algorithm to obtain the global estimate for σ 2 is the following.
(1) Generate M equal sets of training data by randomly selecting pixels from the background.
(2) For each set of training data, the SVDD decision boundary is determined using different values for σ 2 .
(3) For each value of σ 2 , the average fraction of support vectors, (1/M) M i=1 [#SV i /N], is computed over all of the training sets.(4) The σ 2 that produces the smallest average fraction of support vectors is the minimax estimate.
We discovered that the average fraction of support vectors is at a minimum when σ 2 ≥ 905.Therefore, our minimax estimate is 905 because this is the smallest σ 2 that allows us to effectively describe our data.If σ 2 > 905 then our detection results may be poor because the resulting SVDD model is overly general.Note that when using the minimax approach, 60% of the training data is 300 pixel vectors while there are 149 features.The balance between sample size and number of variables is causing the minimax estimation to converge at a minimum that allows a fairly high Pfa while maintaining an effective characterization of the background spectra.This demonstrates that the bandwidth parameter selection is robust to small sample sizes relative to the number of dimensions or spectral bands.
EURASIP Journal on Advances in Signal Processing  Figure 5 is SVDD pixel map with an accompanying 2D principal component scatter plots.These will contrasted in subsequent analysis.

Experiment Description.
In our analysis we included all wavelength bands that were not contaminated by atmospheric absorption, which resulted in 149 bands.The global SVDD approach is supervised in the sense that it requires as input a known set of background spectra while LAIRX and RX are unsupervised.For SVDD, the sample size used when sampling from the background spectra was N = 500.This size was chosen based upon computing limitations while accommodating the high dimensional feature space.For LAIRX, the maximum number of iterations and principal components was 50 and 10, respectively.Based on our exploratory data analysis 10 principal components was sufficient.A window size of 25 × 25 was employed.
Each anomaly detector was applied to four images; see Figure 2, containing multiple vehicles, varying land formations, sage brush and a road.ARES1D is desert radiance while the other three are forest radiance.Three of the images area contains less than 1% target pixels while one image has about 3.4% target pixels.
The goal of our analysis was to compare a promising anomaly detector (global SVDD), a benchmark detector (RX) and our proposal (LAIRX).The RX algorithm is run in two different modes.RX-FULL is run on the full 149 bands of the image cube; whereas, RX-PCA is run on the 10 retained principal components.Finally, in an effort to examine the iteration sensitivity of LAIRX we created a version denoted LAIRX(2) that only performs one iterative refinement (in other words, 2 RX-type iterations).In order to make inferences about these competing algorithms we investigate the runtime, percentage of target pixels, area under the ROC curve restrained to false positive cases falling below 0.20, and the true positive rate at a fixed false positive rate of 0.05.In what follows (Table 1), AUC denotes the "area under the (ROC) curve (false positive rates from 0-.2)" and TPR is the "true positive rate" at a fixed a false positive rate of 0.05.
At the onset, we believed that performance gains would be substantial given that we force our multidimensional data sequences within each window to be more iid even though these data sequences are likely to not be locally Gaussian.Even if the multivariate populations are not Gaussian, by iteratively refining the anomaly detection we are introducing robustness to our final RX(x) score map, upon which a threshold is applied to determine the final classification.A drawback to LAIRX is that you need to know a good rejection rate for the iterative refinement which is largely data dependent as you can see in Figure 6.This problem is a topic of future research.

Results
In what follows, ROC curves are presented for each image in Figure 7 and summary statistics are offered in Table 1.
For ARES1D, the background and target pixels are linearly separable given the first and second principal components (see Figure 5(a)).Additionally, the SNR is showing a clear separation with lower SNR values present for the target pixels, which is encouraging (see Figure 5(b)).We would expect with these observations that the RX-type classifiers should do very well and that the nonlinear classifier, SVDD, should also perform reasonably well.When viewing the ROC curve for ARES1D, you can see that PCA is beneficial for the RX derivative methods.The SVDD algorithm is performing well but as you can see in the SVDD statistic map there are many locally clustered areas which display the same in value as the target pixels.
The image ARES1F poses difficulties for all the algorithms tested.ARES1F which has a very noisy SNR statistic map, as you can see in Figure 3(b).This image in particular, highlights the benefits of LAIRX's iterative approach.
The image scene is similar for both ARES2F and ARES3F (Figure 2).As you can see in Figure 3, the SNR map is showing a reasonable segmentation for both of these images and the distributions are somewhat separable.Table 1 and Figure 7 show that performance is good across the board with LAIRX being superior.It is interesting to point out that the nonlinear technique is performing poorly in contrast to hinders real-time viability while the global SVDD builds a model based upon background spectra and then classifies raw pixel vectors as anomalous as they are received.However, the global SVDD is a supervised algorithm given that a set of background spectra and RBF spread parameter are specified which may limit its real-time viability in a dynamic operational setting.
For the types of images analyzed here, our results have shown that LAIRX is a reasonable competitor to the SVDD algorithm and that a linear technique can perform well in a nonlinear environment after statistic estimation modification.We have also demonstrated, see Table 1 and Figure 7, that the algorithmic steps taken to create LAIRX interact in a way that lead to higher true positives coupled with low false positives.By introducing iterative refinement, we are getting better performance because the first and second order statistic estimation have less bias and error.Our method has demonstrated potential in an image scene with sparse vehicle activity.Whether or not similar results follow in a densely populated target environment remains to be seen.
Both LAIRX and LAIRX(2) are implemented in a parallel and distributed fashion which makes these algorithms computationally efficient.The processing time for LAIRX and LAIRX(2) are constrained by the number of available processor cores.In general, LAIRX is somewhat slow as you can see in Table 1 but the ready availability of cluster machines and even affordable 8 processor core machines make LAIRX a viable algorithm in an operational setting.In contrast, the runtime of the global SVDD algorithm is fixed from an algorithmic perspective of parallel and distributed computing.
Detector.The Support Vector Data Description (SVDD) algorithm was originally proposed by Tax and Duin as a means for exploring one class

Table 1 :
Tabular results by image.