In the experiments that follow, the entries of *A* are drawn from \mathcal{N}(0,1/K).

### Dictionary signal detection

To test the effectiveness of our approach, we formed a dictionary \mathcal{D} of nine spectra (corresponding to different kinds of trees, grass, water bodies and roads) obtained from a labeled HyMap (Hyperspectral Mapper) remote sensing data set [57], and simulated a realistic dataset using the spectra from this dictionary. Each HyMap spectrum is of length N=106. We generated projection measurements of these data according to (1), {\mathit{z}}_{i}={\alpha}_{i}\mathit{\Phi}({\mathit{f}}_{i}^{\ast}+{\mathit{b}}_{i})+{\mathit{w}}_{i}, where {\mathit{w}}_{i}\sim \mathcal{N}(0,{\sigma}^{2}\mathit{I}), {\mathit{f}}_{i}^{\ast}\in \mathcal{D} for i=1,…,8100, {\mathit{b}}_{i}\sim \mathcal{N}({\mathit{\mu}}_{b},{\mathit{\Sigma}}_{b}) with {\mathit{\Sigma}}_{b} satisfying the condition in (4), and {\alpha}_{i}={\alpha}_{i}^{\ast}\sqrt{K}, where {\alpha}_{i}^{\ast}\sim \mathcal{U}[21,25] and \mathcal{U} denotes the uniform distribution. We let σ^{2}=5 and model {α_{i}} as proportional to \sqrt{K} to account for the fact that the total observed signal energy increases with the number of detectors. We transform the *z*_{i} by a series of operations to arrive at a model of the form discussed in (2), namely {\mathit{y}}_{i}={\alpha}_{i}\mathit{A}{\mathit{f}}_{i}^{\ast}+{\mathit{n}}_{i}. For this dataset, p_{min}=0.04938, p_{max}=0.1481, and d_{min}=0.04341.
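As a concrete illustration, the simulated measurement setup above can be sketched as follows. The dictionary here is random stand-in data (the paper uses labeled HyMap spectra), and the noise in the transformed model (2) is assumed to be unit variance after whitening:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, M = 106, 40, 8100          # spectrum length, measurements, pixels
m = 9                            # dictionary size

# Stand-in dictionary: in the paper these are labeled HyMap spectra.
D = rng.random((m, N))

labels = rng.integers(0, m, size=M)           # ground-truth classes
alpha = np.sqrt(K) * rng.uniform(21, 25, M)   # gains alpha_i = sqrt(K) * alpha_i^*

A = rng.normal(0.0, np.sqrt(1.0 / K), (K, N))  # entries ~ N(0, 1/K)

# Transformed observations y_i = alpha_i * A f_i^* + n_i, as in model (2),
# with n_i ~ N(0, I) assumed post-whitening.
F = D[labels]                                  # (M, N) true spectra
Y = alpha[:, None] * (F @ A.T) + rng.normal(0.0, 1.0, (M, K))
```

This produces one row of `Y` per simulated pixel, ready to feed to the detector.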

We evaluate the performance of our detector (7) on the transformed observations, as a function of the number of measurements K, by comparing the detection results to the ground truth. Our MAP detector returns a label {L}_{i}^{\text{MAP}} for every observed spectrum, determined according to

{L}_{i}^{\text{MAP}}=\underset{\ell \in \{1,\dots ,m\}:\,{\mathit{f}}^{(\ell)}\in \mathcal{D}}{\operatorname{arg\,min}}\left(\frac{1}{2}{\left\Vert {\mathit{y}}_{i}-{\alpha}_{i}\mathit{A}{\mathit{f}}^{(\ell)}\right\Vert}^{2}-\log {p}^{(\ell)}\right)

where m is the number of signals in \mathcal{D}, and p^{(ℓ)} is the a priori probability of target class ℓ. In our experiments we evaluate the performance of our classifier when (a) {α_{i}} are known (AK) and (b) {α_{i}} are unknown (AU) and must be estimated from *y*. The empirical pFDR^{(j)} for each target spectrum j is calculated as follows:

{\mathrm{\text{pFDR}}}^{\left(j\right)}=\frac{\sum _{i=1}^{M}{\mathbb{I}}_{\left\{{L}_{i}^{\mathrm{GT}}=j\right\}}{\mathbb{I}}_{\left\{{L}_{i}^{\text{MAP}}\ne j\right\}}}{\sum _{i=1}^{M}{\mathbb{I}}_{\left\{{L}_{i}^{\text{MAP}}\ne j\right\}}}

where \left\{{L}_{i}^{\mathrm{GT}}\right\} denote the ground truth labels. The empirical pFDR^{(j)} is thus the ratio of the number of missed targets to the total number of signals declared to be nontargets. The plots in Figure 1a show the results obtained using our target detection approach under the AK case (dark gray dashed line) and the AU case (light gray dashed line), compared to the theoretical upper bound (solid line). These results are obtained by averaging the pFDR values over 1000 different noise, sensing matrix, and background realizations. Note that the theoretical results apply only to the AK case, since they were derived under the assumption that {α_{i}} are known; the experimental results are shown for both the AK and AU cases to allow a comparison between the two scenarios. In both cases, the worst-case empirical pFDR curves decay as K increases. In the AK case, in particular, the worst-case empirical pFDR curve decays at the same rate as the upper bound. In this experiment, for fixed α_{min} and d_{min}, we chose K to satisfy (13c). The theory is somewhat conservative, and in practice the method works well even when K is below the bound in (13c).
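A minimal sketch of the MAP rule in (7) and of the empirical pFDR computation, assuming known gains (AK), unit-variance noise, and precomputed class priors (all names below are illustrative, not from the paper's code):

```python
import numpy as np

def map_labels(Y, A, D, alpha, priors):
    """MAP label per row of Y under y_i = alpha_i * A f^(l) + n_i, n_i ~ N(0, I)."""
    proj = D @ A.T                                    # (m, K) projected dictionary
    resid = Y[:, None, :] - alpha[:, None, None] * proj[None, :, :]
    cost = 0.5 * np.sum(resid**2, axis=2) - np.log(priors)[None, :]
    return np.argmin(cost, axis=1)                    # (M,) class indices

def empirical_pfdr(gt, pred, j):
    """Missed class-j targets divided by all signals declared non-j."""
    declared_non_j = pred != j
    missed = (gt == j) & declared_non_j
    return missed.sum() / max(declared_non_j.sum(), 1)
```

In a noiseless sanity check the MAP rule recovers the true labels exactly, and the empirical pFDR is zero for every class.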

In the experiment that follows, we let {\alpha}_{i}^{\ast}\sim \mathcal{U}[10,20] and {\alpha}_{i}=\sqrt{K}{\alpha}_{i}^{\ast}, and evaluate the performance of our detector for values of *K* that are not necessarily chosen to satisfy (13c). In addition, we compare the performance of our detection method to that of a MAP-based target detector operating on downsampled versions of our simulated spectral input image. The purpose of this comparison is to show which kind of measurement yields better results for a fixed number of detectors.

For an input spectrum \mathit{g}\in {\mathbb{R}}^{N}, we let \stackrel{~}{\mathit{g}}\in {\mathbb{R}}^{K} denote its downsampled approximation; specifically, the *j*th element of \stackrel{~}{\mathit{g}} is \sum _{\ell =1}^{r}{g}_{(j-1)r+\ell}, where *r*=⌈*N*/*K*⌉. Let us consider making observations of the form

{\mathit{y}}_{i}=\frac{{\stackrel{~}{\mathit{g}}}_{i}}{c}+{\mathit{n}}_{i}\in {\mathbb{R}}^{K}

(23)

where {\stackrel{~}{\mathit{g}}}_{i}={\alpha}_{i}{\stackrel{~}{\mathit{f}}}_{i}^{\ast}+{\stackrel{~}{\mathit{b}}}_{i} is the *K*-dimensional downsampled version of {\mathit{f}}_{i}^{\ast}+{\mathit{b}}_{i} for *K*≤*N*, {\mathit{n}}_{i}\sim \mathcal{N}(0,{\sigma}^{2}\mathit{I}) with *σ*^{2}=5, and *c* is a constant chosen to preserve the mean signal-to-noise ratio between the downsampled and projection measurements. The MAP-based detector operating on the downsampled data returns a label {D}_{i}^{\text{MAP}} for every observed spectrum, determined according to

{D}_{i}^{\text{MAP}}=\underset{\ell \in \{1,\dots ,m\}:\,{\mathit{f}}^{(\ell)}\in \mathcal{D}}{\operatorname{arg\,min}}\left({\left({\mathit{y}}_{i}-{\alpha}_{i}{\stackrel{~}{\mathit{f}}}^{(\ell)}\right)}^{T}{G}^{-1}\left({\mathit{y}}_{i}-{\alpha}_{i}{\stackrel{~}{\mathit{f}}}^{(\ell)}\right)-\log {p}^{(\ell)}\right)

where G={\stackrel{~}{\mathit{\Sigma}}}_{b}+{\sigma}^{2}\mathit{I}, {\stackrel{~}{\mathit{\Sigma}}}_{b} is the covariance matrix obtained from the downsampled versions of the background training data, and {\stackrel{~}{\mathit{f}}}^{(\ell)} is the downsampled version of {\mathit{f}}^{(\ell)}\in \mathcal{D}. The algorithm declares that target spectrum {\mathit{f}}^{(j)}\in \mathcal{D} is present at the *i*th location if {D}_{i}^{\text{MAP}}=j. To illustrate the advantages of using a *Φ* designed according to (24), we compare the performance of the proposed detector when *Φ* is a random Gaussian matrix with entries drawn from \mathcal{N}(0,1/K) and when *Φ* is chosen according to (24). Figure 1b compares the results obtained using projection measurements with *Φ* designed according to (24), projection measurements with *Φ* chosen at random, and downsampled measurements, under the AK case. These results show that the detection algorithm operating on projection measurements with *Φ* designed using background and sensor noise statistics yields significantly better results than the one operating on the downsampled data, and that the empirical pFDR values of our method decay with *K*. The improvement from projection measurements comes from the distance-preservation property of the projection operator *A*: while a Gaussian sensing matrix *A* preserves distances between any pair of vectors from a finite collection with high probability [51, 52], downsampling loses some of the fine differences between similar-looking spectra in the dictionary. Furthermore, when *Φ* is chosen at random, the resulting whitened transformation matrix is not necessarily distance-preserving, which can worsen performance, as illustrated in Figure 1b.
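The downsampling operator and the downsampled-domain MAP rule above can be sketched as follows. Zero-padding the spectrum to a multiple of *r* and the Cholesky-based solve are implementation choices, not specified in the paper:

```python
import numpy as np

def downsample(g, K):
    """j-th output = sum of the j-th block of r = ceil(N/K) samples; zero-pads the tail."""
    N = g.shape[-1]
    r = -(-N // K)                                     # ceil(N / K)
    pad = np.zeros(g.shape[:-1] + (r * K - N,))
    padded = np.concatenate([g, pad], axis=-1)
    return padded.reshape(g.shape[:-1] + (K, r)).sum(axis=-1)

def map_labels_downsampled(Y, D_tilde, alpha, G, priors):
    """argmin_l (y - alpha*f_l)^T G^{-1} (y - alpha*f_l) - log p_l, per row of Y."""
    L = np.linalg.cholesky(G)
    costs = []
    for l, f in enumerate(D_tilde):
        resid = Y - alpha[:, None] * f[None, :]        # (M, K)
        w = np.linalg.solve(L, resid.T)                # since G^{-1} = L^{-T} L^{-1}
        costs.append(np.sum(w**2, axis=0) - np.log(priors[l]))
    return np.argmin(np.stack(costs, axis=1), axis=1)
```

The Mahalanobis cost is evaluated through a triangular solve rather than an explicit inverse of G, which is the standard numerically stable choice.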

### Anomaly detection

In this section, we evaluate the performance of our anomaly detection method on (a) a simulated dataset, for which we compare the results obtained using the proposed projection measurements with those obtained using downsampled measurements, and (b) a real AVIRIS (Airborne Visible InfraRed Imaging Spectrometer) dataset.

#### Experiments on simulated data

We simulate a spectral image *f*^{∗} composed of 8100 spectra, each of which is either drawn from a dictionary \mathcal{D}=\{{\mathit{f}}^{(1)},\cdots ,{\mathit{f}}^{(5)}\} consisting of five labeled spectra from the HyMap data that correspond to a natural landscape (trees, grass and lakes), or is anomalous. The anomalous spectrum is extracted from unlabeled AVIRIS data, and the minimum distance between the anomalous spectrum *f*^{(a)} and any of the spectra in \mathcal{D} is {d}_{\mathrm{\text{min}}}={\mathrm{\text{min}}}_{\mathit{f}\in \mathcal{D}}\parallel \mathit{f}-{\mathit{f}}^{(\mathrm{a})}\parallel =0.5308. The simulated data contain 625 locations with the anomalous spectrum. Our goal is to find the spatial locations that contain the anomalous AVIRIS spectrum given noisy measurements of the form {\mathit{z}}_{i}=\mathit{\Phi}({\alpha}_{i}{\mathit{f}}_{i}^{\ast}+{\mathit{b}}_{i})+{\mathit{w}}_{i}, where {\mathit{b}}_{i}\sim \mathcal{N}({\mathit{\mu}}_{b},{\mathit{\Sigma}}_{b}), *Φ* is designed according to (24), {\mathit{w}}_{i}\sim \mathcal{N}(0,{\sigma}^{2}\mathit{I}), and {\mathit{f}}_{i}^{\ast}\in \mathcal{D} under {\mathcal{\mathscr{H}}}_{0i}. As discussed in Section “Anomalous signal detection”, {\mathit{f}}_{i}^{\ast} is anomalous under {\mathcal{\mathscr{H}}}_{1i}, and our goal is to control the FDR below a user-specified false discovery level *δ*. We simulate {\alpha}_{i}=\sqrt{K}{\alpha}_{i}^{\ast} with {\alpha}_{i}^{\ast}\sim \mathcal{U}[2,3]. In this experiment we assume the availability of background training data to estimate the background statistics and the sensor noise variance *σ*^{2}. Given the knowledge of the background statistics, we perform the whitening transformation discussed in Section “Whitening compressive observations” and evaluate the detection performance on the preprocessed observations given by (2).
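The whitening step can be sketched generically as follows, assuming the measurement-domain background-plus-noise covariance is C = ΦΣ_bΦ^T + σ²I and whitening is done by its inverse symmetric square root. The paper's exact transformation is the one in Section “Whitening compressive observations”; this is only an illustrative stand-in:

```python
import numpy as np

def whiten(Z, Phi, Sigma_b, mu_b, sigma2):
    """Map z_i = Phi(alpha_i f_i + b_i) + w_i to observations with ~white noise.

    Generic sketch: subtract the projected background mean Phi mu_b and multiply
    by the inverse symmetric square root of C = Phi Sigma_b Phi^T + sigma2 * I.
    """
    K = Phi.shape[0]
    C = Phi @ Sigma_b @ Phi.T + sigma2 * np.eye(K)
    vals, vecs = np.linalg.eigh(C)                     # C is symmetric PSD
    C_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return (Z - (Phi @ mu_b)[None, :]) @ C_inv_sqrt.T
```

By construction, a measurement equal to the projected background mean whitens to exactly zero.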

For fixed *τ*=0.1 and *ε*=0.1, we evaluate the performance of the detector as the number of measurements *K* increases, under the AK and AU cases respectively, by comparing the *pseudo-ROC* (receiver operating characteristic) curves obtained by plotting the empirical FDR against 1−FNR, where FNR is the false nondiscovery rate. Note that 1−FNR is the expected ratio of the number of null hypotheses that are correctly accepted to the number of declared null hypotheses. The empirical FDR and FNR are computed according to

\begin{array}{l}\mathrm{\text{FDR}}=\frac{\sum _{i=1}^{M}{\mathbb{I}}_{\left\{{L}_{i}^{\mathrm{\text{GT}}}=0\right\}}{\mathbb{I}}_{\{{p}_{i}\le {p}_{t}\}}}{\sum _{i=1}^{M}{\mathbb{I}}_{\{{p}_{i}\le {p}_{t}\}}}\phantom{\rule{2.77695pt}{0ex}}\text{and}\\ \mathrm{\text{FNR}}=\frac{\sum _{i=1}^{M}{\mathbb{I}}_{\left\{{L}_{i}^{\mathrm{GT}}=1\right\}}{\mathbb{I}}_{\{{p}_{i}>{p}_{t}\}}}{\sum _{i=1}^{M}{\mathbb{I}}_{\{{p}_{i}>{p}_{t}\}}}\end{array}

where *p*_{t} is the *p*-value threshold such that the BH procedure rejects all null hypotheses for which *p*_{i}≤*p*_{t}, and the ground truth label {L}_{i}^{\mathrm{\text{GT}}}=0 if the *i*th spectrum is not anomalous, and 1 otherwise. In this experiment, we consider three values of *K* approximately given by *K*∈{*N*/6,*N*/3,*N*/2}, where *N*=106, and evaluate the performance of our detector for each *K*. Furthermore, in our experiments with simulated data, we declare a spectrum to be anomalous if *d*_{i}≥*η*, where *η* is a user-specified threshold and *d*_{i} is defined in (16). We use the *p*-value upper bound in (20) in our experiments with real data, where the ground truth is unknown.
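The BH thresholding step and the empirical FDR/FNR formulas above can be sketched as follows; the per-spectrum p-values are assumed to be precomputed (e.g., via the bound in (20)):

```python
import numpy as np

def bh_threshold(p, delta):
    """Benjamini-Hochberg: largest sorted p_(k) with p_(k) <= k*delta/M; 0 if none."""
    M = len(p)
    p_sorted = np.sort(p)
    ok = p_sorted <= delta * np.arange(1, M + 1) / M
    return p_sorted[np.nonzero(ok)[0][-1]] if ok.any() else 0.0

def empirical_fdr_fnr(p, p_t, gt):
    """gt[i] = 1 if spectrum i is truly anomalous (H1), 0 otherwise."""
    reject = p <= p_t
    fdr = ((gt == 0) & reject).sum() / max(reject.sum(), 1)
    fnr = ((gt == 1) & ~reject).sum() / max((~reject).sum(), 1)
    return fdr, fnr
```

Sweeping `delta` (or `p_t` directly) and recording (FDR, 1−FNR) pairs traces out the pseudo-ROC curve described above.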

We compare the performance of our method to a generalized likelihood ratio test (GLRT)-based procedure operating on downsampled data, where we collect measurements of the form in (23) and {\mathit{f}}_{i}^{\ast}\in \mathcal{D} under {\mathcal{\mathscr{H}}}_{0i}. Observe that {\mathit{y}}_{i}|{\mathcal{\mathscr{H}}}_{0i}\sim \sum _{\mathit{f}\in \mathcal{D}}\mathbb{P}({\mathit{f}}_{i}^{\ast}=\mathit{f})\,\mathcal{N}({\alpha}_{i}\stackrel{~}{\mathit{f}},{\stackrel{~}{\mathit{\Sigma}}}_{b}+\mathit{I}), where \stackrel{~}{\mathit{f}} is the downsampled version of \mathit{f}\in \mathcal{D}. In this experiment we assume that each spectrum in \mathcal{D} is equally likely under {\mathcal{\mathscr{H}}}_{0i} for *i*=1,…,*M*. The GLRT-based approach declares the *i*th spectrum to be anomalous if

-\log \mathbb{P}\left({\mathit{y}}_{i}\,|\,{\mathcal{\mathscr{H}}}_{0i}\right)\underset{{\mathcal{\mathscr{H}}}_{0i}}{\overset{{\mathcal{\mathscr{H}}}_{1i}}{\gtrless}}\eta

for *i*=1,…,*M*, where *η* is a user-specified threshold [26]. While our anomaly detection method is designed to control the FDR below a user-specified threshold, the GLRT-based method is designed to increase the probability of detection while keeping the probability of false alarm as low as possible. To facilitate a fair evaluation, we compare the pseudo-ROC curves (FDR versus 1−FNR) and the actual ROC curves (probability of false alarm *p*_{f} versus probability of detection *p*_{d}) of these methods, obtained by averaging the empirical FDR, FNR, *p*_{d} and *p*_{f} over 1,000 different noise and sensing matrix realizations for different values of *K*. We also compare the performance of the proposed method when *Φ* is chosen according to (24) and when it is chosen at random, as discussed in the previous section. Figure 2a,e show the pseudo-ROC plots and the conventional ROC plots obtained using the GLRT-based method operating on downsampled data when {*α*_{i}} are known. Figure 2b,f show the results obtained by using a random Gaussian *Φ* instead of the *Φ* in (24). Figure 2c,g show the pseudo-ROC plots and the conventional ROC plots obtained using our method when {*α*_{i}} are known. These plots show that performing anomaly detection from our *designed* projection measurements yields better results than performing anomaly detection on downsampled measurements or on measurements obtained using a random Gaussian *Φ*. This is largely because carefully chosen projection measurements preserve distances (up to a constant factor) among pairs of vectors in a finite collection, whereas downsampled measurements fail to preserve distances among vectors that are very similar to each other. Similarly, a random projection matrix *Φ* is not necessarily distance-preserving after the whitening transformation, which leads to the poor performance illustrated in Figure 2b,f. Figure 2d,h show the pseudo-ROC plots and the conventional ROC plots obtained using our method when {*α*_{i}} are unknown and are estimated from the measurements. Note that the value of *ζ* decreases as *K* increases, since the estimation accuracy of {*α*_{i}} improves with *K*. These plots show that the performance improves as we collect more observations, and that, as expected, performance under the AK case is better than under the AU case.
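The GLRT decision statistic defined earlier in this section, the negative log-likelihood of the equally weighted Gaussian mixture under {\mathcal{\mathscr{H}}}_{0i}, can be sketched as follows, using a log-sum-exp for numerical stability (an illustrative implementation, not the paper's code):

```python
import numpy as np

def neg_log_lik_h0(y, D_tilde, alpha_i, G):
    """-log sum_l (1/m) N(y; alpha_i * f_l, G), computed via log-sum-exp."""
    m, K = D_tilde.shape
    L = np.linalg.cholesky(G)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    logs = []
    for f in D_tilde:
        r = np.linalg.solve(L, y - alpha_i * f)        # r @ r = Mahalanobis distance
        logs.append(-0.5 * (r @ r) - 0.5 * logdet - 0.5 * K * np.log(2 * np.pi))
    logs = np.array(logs) - np.log(m)                  # equal mixture weights 1/m
    mx = logs.max()
    return -(mx + np.log(np.exp(logs - mx).sum()))

# Declare the i-th spectrum anomalous if neg_log_lik_h0(y_i, ...) > eta.
```

A spectrum close to one of the mixture means yields a small statistic, while one far from every dictionary atom yields a large statistic and is declared anomalous once the threshold η is crossed.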

#### Experiments on real AVIRIS data

To test the performance of our anomaly detector on a real dataset, we consider the unlabeled AVIRIS Jasper Ridge dataset \mathit{g}\in {\mathbb{R}}^{614\times 512\times 197}, which is publicly available from the NASA AVIRIS website, http://aviris.jpl.nasa.gov/html/aviris.freedata.html. We split this data spatially to form equisized training and validation datasets, *g*^{t} and *g*^{v} respectively, each of size 128×128×197. Figure 3a,b show images of the AVIRIS training and validation data summed through the spectral coordinate. The training data comprise a rocky terrain with a small patch of trees. The validation data appear to contain a similar rocky terrain, but also an anomalous lake-like structure. The goal is to evaluate the performance of the detector in detecting the anomalous region in the validation data for different values of *K*. We cluster the spectral targets in the normalized training data into eight clusters using the *K*-means clustering algorithm and form a dictionary \mathcal{D} comprising the cluster centroids. Given the dictionary and the validation data, we find the ground truth by labeling the *i*th validation spectrum as anomalous if {\mathrm{\text{min}}}_{\mathit{f}\in \mathcal{D}}\parallel \mathit{f}-\frac{{\mathit{g}}_{i}^{v}}{\parallel {\mathit{g}}_{i}^{v}\parallel}\parallel >\tau. Since the statistics of the possible background contamination in the data could not be learned in this experiment, owing to the lack of labeled training data, the dictionary might itself be background contaminated. The parameter *τ* encapsulates this uncertainty in our knowledge of the dictionary. In this experiment, we set *τ*=0.2.
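The dictionary construction and ground-truth labeling above can be sketched as follows. The plain Lloyd iteration here is a stand-in for whatever *K*-means implementation is used; the paper does not specify one:

```python
import numpy as np

def kmeans_centroids(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; a stand-in for the K-means step in the text."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                C[j] = X[assign == j].mean(axis=0)
    return C

def label_anomalies(G_v, D, tau):
    """Anomalous if min_f ||f - g/||g|| || > tau, per validation spectrum g."""
    Gn = G_v / np.linalg.norm(G_v, axis=1, keepdims=True)
    dmin = np.linalg.norm(Gn[:, None, :] - D[None, :, :], axis=2).min(axis=1)
    return dmin > tau
```

With the paper's settings, `X` would hold the normalized 128×128 training spectra, `k = 8`, and `tau = 0.2`.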

We generate measurements of the form {\mathit{y}}_{i}=\sqrt{K}{\mathit{g}}_{i}^{v}+{\mathit{n}}_{i} for *i*=1,…,128×128, where {\mathit{n}}_{i}\sim \mathcal{N}(0,\mathit{I}). The \sqrt{K} factor reflects the fact that the observed signal strength increases with *K*. For a fixed FDR control level of 0.01, Figure 3c,d show the results obtained for *K*≈*N*/5 and *K*≈*N*/2, respectively. Figure 3e shows how the probability of error decays as a function of the number of measurements *K*. The results presented here are obtained by averaging over 1,000 different noise and sensing matrix realizations. From these results, we see that the number of detected anomalies increases with *K* and the number of misclassifications decreases with *K*.