Single-snapshot DOA estimation by using Compressed Sensing

This paper deals with the problem of estimating the directions of arrival (DOA) of multiple source signals from a single observation vector of an array data. In particular, four estimation algorithms based on the theory of compressed sensing (CS), i.e., the classical ℓ1 minimization (or Least Absolute Shrinkage and Selection Operator, LASSO), the fast smooth ℓ0 minimization, and the Sparse Iterative Covariance-Based Estimator, SPICE and the Iterative Adaptive Approach for Amplitude and Phase Estimation, IAA-APES algorithms, are analyzed, and their statistical properties are investigated and compared with the classical Fourier beamformer (FB) in different simulated scenarios. We show that unlike the classical FB, a CS-based beamformer (CSB) has some desirable properties typical of the adaptive algorithms (e.g., Capon and MUSIC) even in the single snapshot case. Particular attention is devoted to the super-resolution property. Theoretical arguments and simulation analysis provide evidence that a CS-based beamformer can achieve resolution beyond the classical Rayleigh limit. Finally, the theoretical findings are validated by processing a real sonar dataset.


Introduction
The problem of estimating the directions of arrival (DOA) of a certain number of sources has been an active research area for decades [1,2], with applications to monostatic and multi-static radar systems [3][4][5][6][7] and remote sensing [8,9]. The first approach to carrying out space processing, i.e., DOA estimation, from data sampled by an array of sensors was the well-known Fourier beamformer (FB). However, the main drawbacks of the FB are the high level of secondary lobes and poor angular resolution [9]. In fact, the FB suffers from the Rayleigh resolution limit, which is independent of the signalto-noise ratio (SNR). In order to overcome these limitations, adaptive beamformers, such as Capon [10] and MUSIC [11], have been proposed, and their performance is widely investigated, also in the presence of multiplicative noise [8,9] and array errors [12]. However, most of these adaptive algorithms rely on asymptotic assumptions, e.g., high SNR level and large number of snapshots. In many practical applications, for example, in sonar processing, due to physical constraints, e.g., sound speed, only a very small number of snapshots or, in the worst case, a single snapshot is available for DOA estimation [13,14]. Another application in which the number of available snapshots is a critical parameter is the DOA estimation in automotive radar systems (see, e.g., [15]). In the single-snapshot scenario, the adaptive algorithms that require calculating the inverse of the estimated noise covariance matrix, e.g., the Sample Covariance Matrix (SCM), cannot be used since the estimate is rank deficient. Recently, new algorithms, based on the emerging field of the Compressed Sensing (CS) theory have been proposed in the array processing literature (see e.g., [16][17][18]).
The aim of this paper is to investigate the statistical properties of CS-based beamformers. The analysis is carried out in the single-snapshot scenario, which is of practical relevance in sonar and in automotive radar applications. The multi-snapshot scenario is left to future works. The focus here is on three statistical properties: (i) the estimation performance, i.e., the efficiency in the DOA estimation; (ii) the detection performance, i.e. evaluation of the receiver operating characteristic (ROC) curves; and (iii) the resolution capability. In particular, we show that a CS-based DOA estimator is able to guarantee the super-resolution property, typical of the adaptive estimation algorithms.
The remainder of this paper is organized as follows. Section 2 describes the single-snapshot DOA estimation problem. In particular, a brief description of the classical FB and of the four considered CSBs is provided. In Section 3, the estimation and detection performance of the four CSBs are evaluated and compared with that of the FB for two different noise models. The super-resolution property of the CSBs is investigated in Section 4, whereas some results on real sonar data are presented in Section 5. Finally, our conclusions are summarized in Section 6.

The measurement model
Assume a uniformly linear array (ULA) of N omnidirectional sensors spaced by d and a single narrowband source impinging on the array from conic angle θ . Moreover, suppose that only one snapshot is collected at the output of the matched filter for each range cell. In the narrowband case, the vector snapshot can be modeled as [1,19]: where ν ¼ d=λ 0 sin θ is the source spatial frequency, λ 0 is the wavelength of the transmitted signal, v ν ð Þ ¼ 1; exp j2π ν ð Þ; …; exp j2π N−1 ð Þ ν ð Þ ½ T is the N × 1 source steering vector and n is the complex N × 1 measurement noise vector (either Gaussian or non-Gaussian), with zero mean and covariance matrix C. Finally, ρ is a complex scalar that accounts for the transmitted complex amplitude, the radiation pattern of the array sensors, the two-way path loss, the sonar or radar cross section (RCS) of the slowly fluctuating source, and the straddling losses. a The parameter ρ can be modeled as a complex unknown scalar factor of the form ρ = |ρ|e jϕ where the phase ϕ is a uniformly distributed random variable in [0, 2π) and (i) the magnitude |ρ| is a deterministic parameter or (ii) the magnitude |ρ| is a Rayleigh random variable (rv) with statistical power E{|ρ| 2 } = σ ρ 2 , which is equivalent to assume that ρ is a complex zero-mean Gaussian rv with variance σ ρ 2 , i.e., ρ∈CN 0; σ 2 ρ . The model in Equation 1 is relative to a single source; in the multi-source scenario [1,19], the data model is where K is the number of sources and ν k f g K k ¼ 1 are their K spatial frequencies, relative to the K DOAs θ k È É K k ¼ 1 , which are the parameters to be estimated. In this paper, when the random signal model is adopted to characterize the multi-source scenario, the sources are assumed to be independent.

Classical beamforming
Due to the fundamental importance of the DOA estimation problem in a multitude of practical applications, many estimation algorithms have been proposed in the literature. Without claiming to be complete, the estimation methods associated with Equations 1 and 2 can be categorized in two large classes: the non-parametric (spectral-based) algorithms and the parametric algorithms [2]. The non-parametric algorithms (e.g., Fourier, Capon, and MUSIC beamformers) exploit some spectrum-based function of the parameters to be estimated, e.g., the DOAs. More precisely, the DOA's estimation problem is solved by finding the locations of the highest peaks of a spectrumbased function. The parametric techniques, e.g., Deterministic [20] and Stochastic [21] Maximum Likelihood (DML and SML) algorithms, on the other hand, fully exploit the statistical characterization of the measurements and, in general, require a simultaneous search over all the unknown parameters to be estimated. The latter approach often guarantees higher estimation performance than the spectral-based algorithm, albeit at the expense of an increased computational complexity.
However, almost all these algorithms (both spectralbased and parametric methods) have to work in the socalled asymptotic region, i.e., they need high SNR values and a large enough number of snapshots in order to provide reliable estimates. However, in some applications, e.g., sonar applications, due to physical constraints, only a very small number of snapshots or, in the worst case, a single snapshot is available for DOA estimation. In the single-snapshot scenario, adaptive algorithms (such as e.g., Capon, MUSIC, DML, and SML) that rely on an estimate of the noise covariance matrix C cannot be applied. In fact, if the standard Sample Covariance Matrix (SCM) estimator is used, the resulting estimate of C would be rank deficient (see, e.g., [22]). In the single snapshot case then, the only feasible algorithm is the FB.
Under the following three assumptions, 1. The noise vector n is a complex zero-mean Gaussian-distributed random vector with covariance matrix C = σ n 2 I. 2. The number of sources K in the scenario is equal to 1. 3. ρ is a deterministic unknown complex factor.
The maximum likelihood (ML) estimator for ν is given by the location of the maximum of the data periodogram p F (ν) (see e.g., [1,2]): In Equation 3, ν is a continuous variable. However, in practical applications, since the periodogram is evaluated using the fast Fourier Transform (FFT), then p F (ν) is calculated only on a discrete set of spatial frequencies where |ϒ| is the cardinality of the set Υ. Generally, the number of spatial frequencies used to evaluate p F (ϒ) is chosen to be equal to N (the number of array elements). This is the reason for the low resolution of the Fourier beamformer. The resolution property will be extensively discussed in the next section. Then, an estimate of the source spatial frequency is obtained as follows: Again, it must be stressed that the FB suffers of two main drawbacks: the high level of secondary lobes and the Rayleigh resolution limit, which is a problem when K > 1.

A CS approach to single-snapshot DOA estimation
The application of the CS theory to the DOA estimation problem has been investigated in many recent works and it is based on the observation that the number of possible sources in the scenario is much lower than the 'number' of all possible spatial frequencies, that is, in general, a continuous parameter. As shown e.g., in [16], the measurement model in Equations 1 and 2 can be recast as a sparse linear problem by defining an overcomplete dictionary of steering vectors evaluated over a set of possible spatial frequencies Ω = {ν 1 , …, ν G }. In general, the true source spatial frequencies could not belong to this set, i.e., ν k f g K k ¼ 1 ⊄Ω , since Ω is arbitrary chosen without any a priori knowledge on ν k f g K k ¼ 1 . However, in order to guarantee a coherence between the signal in Equations 1 and 2 and the CS-like signal model, we assume The effects of the violation of this assumption (called off-grid effects) are discussed in Section 3.1. An overcomplete representation matrix can be built by collecting all the possible G steering vectors in a matrix: It must be noted that the representation matrix A does not depend on the actual source spatial frequencies but is only function of Ω. In this framework, the signal component is represented by a G × 1 column vector x whose gth entry is equal to ρ g if a source has a spatial frequency ν g and zero otherwise. Since the cardinality G of Ω, i.e., the number of grid points used to cover the spatial frequency domain, is much larger than the number of possible sources, then the vector x is sparse. Finally, the measurement model of Equation 2 can be recast in the well-known linear CS measurement model: Estimating the spectrum-like function p CS Ω ð Þ ¼x Ω ð Þ j j 2 , which is a sort of sparse periodogram, from Equation 6 is equivalent to estimate the spatial energy as a function of the set of assumed spatial frequencies Ω. By assuming to have a single source in the scenario (K = 1), a CS-based DOA estimator is given by the following: where a sparse estimate of x is obtained from the measurement vector y by solving the following constrained optimization problem: x Some consideration on the linear model in Equation 6 should now be done. It is well known from basic CS theory that in order to reconstruct the sparse signal x using the ℓ 1 minimization problem given in Equation 8, the matrix A in Equation 5 must satisfy the restricted isometry property (RIP). It is easy to shown that the matrix A does not satisfy the RIP, since a submatrix composed of a very small number of contiguous columns is already very close to singular [23]. However, in a recent paper [24], the problem of reconstructing a sparse signal from incomplete frequency samples is discussed and analyzed. In particular, consider a discrete time signal x ∈ ℂ G and a randomly chosen set of frequencies Ω. It has been shown in [24] that it is still possible to exactly reconstruct x from the partial knowledge of its Fourier coefficients on the set Ω. We return to this result later on, when the super-resolution property is discussed. As it is obvious from the previous discussion, also in the CSbased approach, the spatial frequency ν is assumed to be a discrete variable. It must be noted that recent works deal with the more challenging case of continuous parameter space (see e.g., [23,25]). However, these recent results fall beyond the scope of this paper.

CS-based beamformers
In this work, four different algorithms are used to find a feasible solution for the constrained optimization problem in Equation 8, i.e., the classical ℓ 1 minimization (L1) algorithm (or least absolute shrinkage and selection operator, LASSO), the fast smoothed ℓ 0 minimization (SL0) algorithm, the sparse iterative covariance-based estimator (SPICE) algorithm and the iterative adaptive approach for amplitude and phase estimation (IAA-APES) algorithm. Even if these four algorithms have been derived starting under different hypotheses, we will show that they are strictly related. In the following, a brief description of the main advantages and drawback of each algorithm is provided.

The ℓ 1 minimization (L1)algorithm
In its most general form, the problem in Equation 8 belongs to the well-known class of constrained optimization problem that can be solved using a LASSO solver (see e.g., [26]). One big advantage of the LASSO algorithm is that it promotes sparse solutions irrespective of the particular noise distribution. On the other hand, the LASSO solver requires the setting of some additional parameters, which have to be chosen heuristically by the user. A wrong choice of these parameters could compromise the convergence of the minimization algorithm. A LASSO-based algorithm is used in [16] to solve the DOA estimation problem. An example of a critical parameter is the threshold value δ in the constraint of Equation 8. Clearly, δ is a function of the noise covariance matrix C that is, in general, unknown, but there are few theoretical studies on this point and the analytical relation between δ and C has not been explicitly derived so far. Moreover, an estimator of δ from the data snapshot y is not yet available in the literature. For the numerical simulation, the NESTA [27] algorithm is used to evaluate the LASSO solution of the minimization problem in Equation 8.

The fast smoothed ℓ 0 minimization(SL0) algorithm
The SL0 algorithm is a suboptimal algorithm based on a continuous approximation of the ℓ 0 norm [28]. It is well known that in order to force the solution of a minimization problem similar to the one in Equation 8 to be the 'sparsest' solution, the function to be minimized by definition of sparsity is the ℓ 0 norm and not the ℓ 1 norm. Since the problem is that the ℓ 0 norm is a discrete and non-convex function, then its minimization is a very difficult problem, at least from a numerical point of view. In order to make the problem more tractable, the ℓ 1 norm is used and the large majority of the theoretical results in CS have been derived for this norm. However, it is also possible to exploit some continuous (but, in general, not convex) approximation of the ℓ 0 norm, as proposed in [28]. Instead of a problem similar to the one in Equation 8, in [28], the authors propose to solve the following problem: where F is some continuous function that approximates the ℓ 0 norm. Of course, the SL0 is a suboptimal algorithm for the DOA estimate. In fact, as it can be seen from Equation 9, the SL0 algorithm does not take into account the measurement noise. In [28], the authors claim that the SL0 is robust with respect to the noise, but there is no theoretical guarantee for this. However, the SL0 algorithm has two advantages with respect to the classical LASSO algorithm: (i) the numerical minimization algorithm (a gradient-based algorithm) is very fast, and (ii) the SL0 algorithm requires the choice of a very small number of critical parameters.

The SPICE algorithm
The SPICE algorithm is an iterative algorithm that, as the previous two algorithms, provides an estimate of a spectrum-like function p SPICE (Ω) of the data snapshot on an assigned set Ω of possible spatial frequencies. The SPICE algorithm was derived for the single snapshot case in [29] and then generalized to the multi-snapshot case in [30]. The SPICE algorithm has a different and stronger statistical foundation with respect to the LASSO algorithm. Moreover, it does not require any difficult and heuristic selection of parameters, since they are jointly estimated within the iterations. In the following, a brief description of the fundamental concepts behind the SPICE algorithm is provided. Suppose that the noise vector n in the measurement model in Equation 2 is a zero-mean Gaussian distributed complex random vector, with covariance matrix C ¼ diag σ 2 n;1 ; …; σ 2 n;N . The covariance matrix of the snapshot y is then: where where σ ρ,k 2 = E{|ρ k | 2 }. Using the notation introduced in the previous section and by assuming a deterministic signal model (the extension to the random signal model is trivial), Equation 10 can be rewritten as follows: where, as before, x g , the gth entry of x, is equal to ρ g if a source has a spatial frequency equal to ν g and zero otherwise. It must be noted that R ¼R if and only if ν k f g K k ¼ 1 ⊂Ω, i.e., the true source spatial frequencies belong to the set of the assumed possible frequencies. The parameters to be estimated are (1) the spectrum-like and (2) the noise covariance matrix C. The joint estimate of these parameters is obtained by minimizing the following covariance-based objective function [31]: where ‖ ⋅ ‖ F is the Frobenius norm. The minimization problem in Equation 13 has an iterative closed form solution [29,30]. Interestingly, even if they have been derived from two completely different perspectives, the SPICE and the LASSO algorithms are strictly related. This connection is based on the Elfving theorem [32] and it has been extensively discussed in [33] and [34].

The IAA-APES algorithm
The IAA-APES algorithm [35] is an iterative and nonparametric algorithm that provides an estimate of a spectrum-like function p IAA -APES (Ω) of the data snapshot on an assigned set Ω of possible spatial frequencies.
As for the SPICE algorithm, it does not require any selection of parameters and can deal with the single snapshot case. In the following, a brief description of the basic principles of the IAA-SPICE algorithm is provided. Let P = diag(p 1 , … p |Ω| ) be a diagonal matrix whose diagonal entries, defined as in Equation 11, represent the power at each spatial frequency on the grid Ω. Furthermore, define a matrix Q(ν g ) to be: where T = A(Ω)PA(Ω) H and A(Ω) is the overcomplete matrix of steering vectors defined in Equation 5. Following [36] and [37], the spectrum-like function is obtained by minimizing the objective function: where ‖z‖ W 2 ≜ z H Wz. A closed form solution of the problem in Equation 15 can be obtained as follows: Since to estimate the spectrum-like function , the IAA-APES algorithm requires the matrix T, which itself depends on the unknown signal power, it has to be implemented as an iterative algorithm [35]. Remarkably, as shown in [35], the IAA-APES algorithm is a close approximation of the ML estimator in the multi-source scenario.

Estimation and detection performance
In this section, we investigate the estimation and the detection performance of the CSB DOA estimators. Regarding the DOA estimation, the root mean square error in the estimation of the true spatial frequency is evaluated for the three described CSBs in the single snapshot-single target scenario and compared with the RMSE of the FB and with the Cramér-Rao Lower Bound (CRLB). The signal model assumed in the simulations is the deterministic model. The RMSE and the ROC curves are evaluated for two different kinds of disturbance: Gaussian white noise and Gaussian white noise plus spatially correlated Gaussian clutter. In this last case, the clutter is modeled as an autoregressive (AR) process of order 1, so its Power Spectral Density (PSD) is given by: where ξ is a complex scalar factor. The covariance matrix of the noise vector n in Equations 1 and 2 is where Q(ξ) has the Toeplitz structure typical of an AR (1) process, i.e., [Q(ξ)] i,j = (ξ |i−j| ) * where the asterisk defines the conjugate operator.

RMSE and CRLB on DOA estimation
The Gaussian white noise case is considered first. The CRLB on the accuracy of DOA estimation in white noise has already been derived in the literature [19]: In order to evaluate the RMSE, the measurement model in Equation 1 has been adopted with the following parameters: n is a white, zero-mean, complex Gaussian vector with covariance matrix C = σ n 2 I with σ n 2 = 1.
|ρ| is a complex unknown scalar factor with ρ j j 2 ¼ SNR Á σ 2 n , where SNR is the signal-to-noise ratio.
The number of independent Monte Carlo trials is 10 4 . The number N of array sensors is 32. The nominal value of the target spatial frequency ν is chosen uniformly at random between −0.5 and 0.5, i.e., ν s ∈U −0:5; 0:5 ½ Þ ð Þ . Since the number of grid points is chosen to be equal to 2 9 for all the beamformers (FB and CSBs), then |ϒ| = |Ω| = G = 2 9 .
In Figure 1, the comparison between the RMSE of the five beamformers and the CRLB is shown. In this scenario, since the FB is the ML estimator, then it is, at least asymptotically, the most efficient estimator. However, from Figure 1, we also get that the FB and the CSBs have very close performance. In particular, the FB and the SL0-CSB algorithms have the same performance. This means that even if the CSBs are suboptimal algorithms in terms of RMSE (at least in the white noise case), the loss in estimation accuracy is negligible. Moreover, we also observe that for SNR below 0 dB, all the estimators are in the 'low SNR' region where the CRLB is not tight. From 0 to 10 dB, the estimation accuracy of all the beamformers is close to the CRLB. However, for SNR greater than 10 dB, the so-called off-grid effects become evident. The off-grid effects are bias errors in the DOA estimation that arise when the nominal target spatial frequency ν does not belong to the set Υ for the FB and to Ω for the CSBs. Of course, the residual bias depends on the 'thinness' of the grid: as the number of the grid points G tends to infinity, the bias tends to zero. Moreover, the residual bias is upper-bounded by 1/2G: in fact, this value is achieved when ν falls exactly between two grid points. In our simulations, the number of grid points is chosen to be equal to 2 9 in order to make the grid-off effects significant only for high SNR.
In Figure 2, the comparison between the RMSE of the five beamformers and the CRLB is shown for the spatially correlated Gaussian clutter scenario. The CRLB for this case has already been derived in the literature [38]: For notation simplicity, we omitted the dependence of the steering vector v on the actual spatial frequency ν . As before, the measurement model in Equation 1 has been adopted with the following parameters: n is a white, zero-mean, complex Gaussian vector with covariance matrix σ n 2 I + σ c 2 Q(ξ), where σ n 2 = 1, [Q(ξ)] ij = (ξ |i−j| ) * , ξ = 0.98e jϑ , ϑ is uniformly distributed in [0, 2π), and σ c 2 is chosen accordingly to the given clutter-to-noise ratio (CNR) value, σ 2 c ¼ CNR Á σ 2 n . In this simulation, CNR = 15 dB. |ρ| is a complex unknown scalar factor with ρ where SINR is the signal-tointerference-plus-noise ratio. Since the number of grid points is chosen to be equal to 2 10 for all the beamformers (FB and CSBs), then |ϒ| = |Ω| = G = 2 10 .
All the other parameters are equal to the ones used in the white noise case. As we can see from Figure 2, also in this case, the RMSEs of the five beamformers are very close to each other. However, for high SINR values, i.e., greater than 25 dB, all the CSBs slightly outperform the FB, that is, no more ML estimator in this scenario [2].

The ROC curves
The ROC curves show the Probability of Detection (P D ) as a function of the Probability of False Alarm (P FA ). More precisely, given a cell under test (CUT) in the spatial frequency domain and by defining H 1 as the event of presence of the source in the CUT, P D can be defined as: where p(·) is one of the 'periodograms' described in Section 2.4. For both the FB and the CSBs, the size of the CUT is chosen to be equal to the Rayleigh resolution limit (see Section 4).
On the other hand, by defining with H 0 the event of absence of source, P FA is defined as The signal model used to evaluate the ROC curves is the random model. In Figures 3 and 4, the ROC curves relative to the white-noise-only case and to the white noise plus spatially correlated Gaussian clutter case are shown. The measurement model in Equation 1 and the random signal model are adopted. The simulation parameters (noise and clutter powers, grid points, and so on) for both the scenarios are chosen to be equal to those used in Section 3.1, except for the signal component ρ, which is assumed to be a zero-mean complex Gaussian rv with . In this simulation, SINR = −10, 0, and 10 dB.
In Figure 3, the ROC curves relative to the white-noiseonly case are shown. In this case, the FB slightly outperforms the CSBs. However, this behavior is somehow expected since, as shown in Section 3.1, the FB is the ML estimator, so it is, at least asymptotically, the most efficient. Nevertheless, the loss in terms of P D for a given P FA of the CSBs with respect to the FB is small. In particular, we observe that the FB and the SL0-CSB algorithms have almost the same performance.
In Figure 4, the ROC curves for the scenario characterized by a spatially correlated clutter model are reported. We note that in this case, all the three CSBs outperform the classical FB. In particular, the SPICE and the L1 algorithms have the best detection performance.

The super-resolution property
It is well known that the FB suffers from the Rayleigh resolution limit, which is independent of the SNR. Some adaptive methods, e.g., MUSIC and Capon, are able to resolve two sources within a Rayleigh cell. However, as discussed before, to achieve super-resolution, they need a sufficiently high SNR level and a suitable number of temporal snapshots (to estimate the disturbance covariance matrix). In this section, we investigate the super-resolution property of the four proposed CSBs. The results show that unlike Capon and MUSIC estimators, a CSB can achieve the superresolution with only one temporal snapshot, without the need to estimate the disturbance covariance matrix.
For a ULA of N array elements, the Rayleigh resolution limit, i.e., the beamwidth in the spatial frequency space, defined as the full width of the main lobe at the half-power level, is [1] Then, if two sources are spaced by less than the Rayleigh resolution limit Δv, they cannot be resolved by a classical non-adaptive FB. Instead, providing a sufficient level of SNR and a suitable number of grid points, a CSB is able to resolve sources that are in the same Rayleigh resolution cell. The ability of a CSB to achieve super-resolution has been also discussed in recent works. For example, in [16], the authors investigate the super-resolution property for a CSB for the single and the multi-snapshot scenarios. However, only a qualitative proof of this property is provided, neither a strong theoretical justification nor a statistical characterization of the CS super-resolution capability is reported in [16]. In this paper, a fundamental result of the CS theory is exploited to provide theoretical justification and a rigorous definition of the CS super-resolution capability. Moreover, this property of the CSB is also statistically characterized.
The ability of a CSB to resolve two sources below the Rayleigh limit, even using a single snapshot, is strictly related to the fundamental Theorem 1.3 in [24]. Roughly speaking, this theorem claims that under the sparsity assumption, it is possible to exactly reconstruct (with overwhelming probability) a complex signal x from a very low number of its Fourier coefficients (or anti-coefficient). This theorem is clearly related to the CS beamforming by the Equation 6. In fact, as discussed previously, by assuming that the true source spatial frequencies belong to the set Ω Equation 6 is equivalent to the measurement model of Equation 2, then it is clear from the particular structure of the matrix A(Ω) that the entries of the measurement vector y represent N Fourier (anti-) coefficients of the complex vector x ∈ ℂ G (with N ≪ G = |Ω|) corrupted by noise. Theorem 1.3 can be recast in the following form, more suitable in the array processing framework. Theorem 1.3 [24] (Array processing formulation). Let K and N be the number of sources in a given range cell and the number of array elements, respectively. Let Ω = {ν 1 , …, ν G } be the set of cardinality G of spatial frequencies in the grid and let A(Ω) = [a(ν 1 )| ⋯ |a(ν G )] be the overcomplete matrix of steering vectors on Ω. If the sparse complex signal x (and consequently the source DOA) is recovered from the single noise-free spatial snapshot y by solving the following optimization problem then, with probability at least η = 1 − O(G −d ), the solution of the problem in Equation 24 is unique and is equal to x. The value η represents the probability of exact reconstruction, P ex ≜ Pr x ¼x CS f g¼η. The value of C d is explicitly derived in [24] under asymptotic conditions, i.e., valid for N ≤ G/4, d ≥ 2, and G ≥ 20, as C d = 1 / (23(d + 1)). In particular, C d depends on the desired value of P ex . Using the result of Theorem 1.3, a CS super-resolution limit can be defined. In fact, if it is possible to reconstruct x on a set Ω of cardinality G with probability at least η then, with probability at least η, it is also possible to resolve two sources spaced by It can be seen that while the Rayleigh resolution limit in Equation 23 decreases as N −1 , the CS super-resolution limit in Equation 26 decreases as exp ( − C d N/K). In particular, given a fixed number of array elements N, the minimum possible spatial frequency separation between two sources is at least Þif CSB is used. Roughly speaking, the Rayleigh resolution limit decreases linearly with the number N of sensors, while the CS superresolution limit decreases exponentially with N.
Finally, some comment needs to be made regarding the constant C d . As stated in [24], the asymptotic value of C d provided by Theorem 1.3 is a very conservative value. In other words, the same value of P ex can be guaranteed by a smaller value of C d than the one provided in Theorem 1.3 and therefore, a finer resolution can be achieved in practice. Additional theoretical investigations are necessary to refine the value of C d , and this is an active research area within the CS community. A first attempt to refine this constant can be found in [39]. Theorem 1.3 refers to the noise-free case. Moreover, the true spatial frequency is assumed to belong to the set Ω. In array processing applications, however, since a certain amount of noise is always present, then the optimization problem in Equation 24 should be replaced with the problem in Equation 8. Of course, in the noisy case and in the presence of the off-grid events, one cannot expect exact recovery [24] and Theorem 1.3 is no longer valid. In the    following, the robustness of the previous results on CS super-resolution is verified against the measurement noise and the off-grid effects as a function of the SNR.
To perform this robustness analysis, a resolution criterion is necessary [40]. To this purpose, we exploit the procedure proposed in [41] to investigate the superresolution property of the MUSIC algorithm. Following [41], a super-resolution event can be characterized by means of the following random inequality: where ν m = (ν 1 + ν 2 )/2. Two sources located at spatial frequencies ν 1 and ν 2 are said to be resolvable if the inequality in (27) holds true and to be irresolvable otherwise. Hence, this problem can be seen as a binary decision problem, where γ is the decision statistic. Finally, the probability of resolution can be defined as: In the following, P res is evaluated as a function of the SNR and of the frequency separation between the two sources. The measurement noise generated in the simulation is white Gaussian. Figure 5 shows the probability of resolution (P res ) of the various beamformers as a function of SNR. The simulation parameters are the following: 9 . ν m = 0.3, while ν 1 and ν 2 are sampled, in the same Rayleigh resolution cell (Δν ≃ 0.0277), from two uniform and independent probability density functions, such that ν i ∼U μ i −1=2G; μ i þ 1=2G ð Þ where μ 1 = 0.2922 and μ 2 = 0.3078. This allows us to model the grid-off effects. The number of independent Monte Carlo trials is 10 3 .
In accordance with the previous result on the estimation accuracy (see Figure 1), for SNR lower than 0 dB, all the estimators are in the non-asymptotic region: they do not provide reliable DOA estimates. Beyond 0 dB, the P res of the four CSBs is much better than that of the FB. In particular, the P res of the FB is much lower than that of the CSBs and independent of the SNR. Regarding the CSBs, the best estimator, at least in terms of P res , is the SPICE algorithm: its P res , as well as the one of the IAA-APES, tends to 1 as the SNR increases. Figure 11 Bistatic geometry during the COLLAB13 experiment. A fixed source insonifies the surveillance region, while a 32-element acoustic array towed by an AUV acts as a receiver to detect echoes from the test target. In Figure 6, P res is evaluated as a function of the source separation Δ in the spatial frequency domain, for SNR = 10 dB and |ϒ| = |Ω| = G = 2 9 . The frequency separation, denoted as Δ = 2l/G for l = 1, 2, …, L < G, is defined with reference frequency ν m ∼U μ m −1=2G; μ m þ 1=2G ð Þ with μ m =0.3 so that the spatial frequencies of the two sources are given by ν 1 = ν m − Δ/2, ν 2 = ν m + Δ/2. The number of Monte Carlo trials is 10 3 . Even in this case, the P res of the CSBs is always higher than that of the FB. Clearly, by decreasing Δ, P res decreases. The SPICE algorithm is the one that provides the best P res .
In order to get a deeper insight in the performance of the beamforming algorithms, in the presence of two sources in the same Rayleigh resolution cell in Figures 7  and 8, the RMSE of the investigated beamformers have been compared with that of the RELAX algorithm [42,9], a well-known parametric DOA estimator, and with the CRLB as function of the SNR. The deterministic signal model is exploited; the CRLB is given by Eq. (4.1) in [19]. The simulation parameters are the following: |ϒ| = |Ω| = G = 2 10 , ν m = 0.3, while, in order to take into account the grid-off effects, ν 1 and ν 2 are sampled, in the same Rayleigh resolution cell (Δν ≃ 0.0277), from two uniform and independent probability density function, such that ν i ∼U μ i −1=2G; μ i þ 1=2G ð Þ where μ 1 = 0.2930 and μ 2 = 0.3066. The number of independent Monte Carlo trials is 10 5 .
As expected, the FB presents a high RMSE and a bias in the estimate of ν 1 and ν 2 , since it is not able to resolve the two sources. The CSBs and the RELAX beamformers that have the super-resolution property provide better estimation performance.
Finally, the RMSE in the DOA estimation of two sources in different Rayleigh resolution cells is shown in Figures 9  and 10. The simulation parameters are the same used in the previous case (i.e., sources in the same resolution cell) except for the values of μ 1 and μ 2 . In this simulation, we set μ 1 = 0.2803 and μ 2 = 0.3193. The results highlight the ability of the CSBs (except for the SL0 algorithm) and of the RELAX algorithm to achieve better performances in  terms of RMSE with respect to the FB. However, such estimation performance tends to be equal to the one obtained in the single-source scenario as the distance in spatial frequency between the two sources increases.

Testing on real sonar data
The Cooperative Littoral ASW Behaviour (COLLAB) 2013 experiment has been conducted by CMRE in La Spezia waters, Italy, from 29 June to 7 July 2013. The purpose of the experiment was to test environmentally adaptive, collaborative area search algorithms and behaviors for Autonomous Underwater Vehicles (AUVs) which act as the receiving nodes in the Generic Littoral Intelligent Network Technology (GLINT) Autonomous Sensor Networks (AuSN) antisubmarine warfare (ASW) demonstration system [43,44]. Figure 11 shows an example of the bistatic geometry used during the experiment. The transmitter, at fixed known position, insonifies the surveillance region with a pre-defined pulse repetition interval (PRI) by transmitting a frequency-modulated chirp signal. An underwater towed target (an echo repeater) was used to test detection and tracking performance by a 32-element hydrophone array towed by an AUV. Target and AUV navigation data has been registered to allow the generation of ground truth data to be used in the post analysis and validation of the performance. The parameters of the bistatic geometry (see Figure 11) include the following: the transmitter/target distance R tx , the receiver/target distance R rx , and the transmitter/receiver baseline L; θ rx , θ tx , and θ tg are the heading angles of receiver, transmitter, and target, respectively; ϕ is the bearing angle of the target with respect to an (x, y) local coordinate reference system on the array, as depicted in Figure 11. Figure 12 highlights the position of the AUV and the target for the scan at time T 1 = 08:55:12Z, 1 July 2013, as well as the heading of the receiving array and the bistatic geometry. The received scan data are processed to compare CSBs against the classical Fourier beamformer. Preprocessing of the received data includes baseband conversion, complex matched filtering and normalization for the bistatic range sum (BRS = R tx + R rx , estimated from the echo time of arrival at the receiver) profile attenuation. The attenuation profile has been estimated from the data by using a set of mathematical morphology filters as in [45] and [46]. Figure 13 shows the average power of the array elements after the normalization for the attenuation profile at scan time T 1 . The target echo is present at an  approximated BRS of 4,600 m. The average target excess with respect to the mean clutter level is about 5 to 6 dB. The direct echo from the transmitter is also present at BRS of 2,800 m. The figure plots the power normalized with respect to the direct blast maximum power.
The complex normalized data have been spatially processed using a spatial frequency grid of 180 points for each range cell of a scan. Figure 14 shows the results of the FB at scan time T 1 . The map scale represents normalized power in decibel with respect to the direct blast maximum power. The target (red circle) is visible at a BRS of 4,600 m, as expected from Figure 9, and spatial frequency 0.3. The direct blast (green circle) is visible at a BRS of 2,800 m and spatial frequency −0.27. It is worthwhile to mention that the results of the spatial beamforming are affected by the so called left-right ambiguity [44] that shows up in linear arrays, so the real target bearing (or spatial frequency) can be actually the opposite of the one observed in the map. Here, the disambiguation is achieved using the target and the AUV navigation data, the transmitter position, and the array parameters to locate the target within the map. Figures 15,16,17,18 show the SPICE, the L1, the SL0, and the IAA-APES beamformer outputs, respectively, at scan time T 1 . As showed in Figure 19, the resolution of the CSBs is higher than the classical FB.
The analysis of the processing time per range cell per iteration of the L1, SPICE, SL0, and IAA-APES solvers is reported in Figure 20a for 289 different sonar scans (a fixed spatial frequency grid of 180 points was considered). b The statistics were evaluated by processing 180 range cells per scan centered on the test target (the echo repeater). The processing time averaged over the scans is T L1 = 1.68 × 10 − 4 s, T SPICE = 1.42 × 10 − 3 s, T SL0 = 2.5 × 10 − 4 s, and T IAA -APES = 4.27 × 10 − 3 s for L1, SPICE, SL0, and IAA-APES, respectively. The processing time per range cell of the Fourier beamformer (T FFT = 5.79 × 10 − 6 s) is also reported as a reference (conventionally, the number of iterations of the Fourier beamformer is set to 1). The L1 solver has lower processing time than SPICE, SL0, and IAA-APES. In particular, SPICE and IAA-APES solvers have a processing time that is an order of magnitude higher than the two other CSBs. For a fair comparison among the CSBs, the number of iterations to reach the convergence is also evaluated and reported in Figure 20b for the same scans. The average over the scans is N I,L1 = 280, N I,SPICE = 95, N I,SL0 = 18, and N I,IAA-APES = 93  for L1, SPICE, SL0, and IAA-APES, respectively. Considering a typical application setup, with a number of range cells per scan N R = 4,916, a number of processors N P = 8 (eight range cells are processed in parallel) and making the hypothesis that the time per scan scales linearly, the average processing time per scan, T S,(·) = T (·) N R N I,(·) /N P , of the four CSBs is T S,L1 = 29 s, T S , SPICE = 83 s, T S , SL0 = 2.8 s, and T S , IAA-APES = 244 s. Even if a comprehensive and more detailed analysis of the processing time is out of the scope of this work, the preliminary tests here reported show that the SL0 solver outperforms L1, SPICE, and IAA-APES by one and two orders of magnitude, respectively, providing the indication that the SL0 may be a candidate for a future real-time implementation of the CSB.

Conclusions
In this paper, some CS-based beamformers, i.e., the classical ℓ 1 minimization (or LASSO), the fast smooth ℓ 0 minimization, the SPICE, and the IAA-APES algorithms, have been analyzed and compared with the classical FB for target DOA estimation in a single-snapshot scenario. We analyze the estimation accuracy, the detection performance, and the resolution capability. The performance of the CSBs has been investigated, both in the presence of white Gaussian noise and in the presence of spatially correlated Gaussian noise. Regarding the estimation performance, the RMSE of the FB and of the four CSBs has been compared with the CRLB in the white noise scenario (that is the case when the FB is the ML estimator) and in the spatially correlated noise scenario. As concerning the estimation accuracy and the detection performance, we found that the FB slightly outperforms the CSBs in the white noise scenario, whereas the four CSBs outperform the classical FB in the spatially correlated noise scenario. In particular, the SPICE and the L1 algorithms have the best detection performance, especially at low SNR values. Concerning the resolution capability, we verified that the CSBs can achieve super-resolution beyond the Rayleigh limit even with a single pulse, while classical super-resolution algorithms like MUSIC need multiple snapshots. Theoretical arguments have been proposed here to link the super-resolution property of the CSBs to the CS theory, and a new rigorous definition of CS super-resolution limit has been provided. Moreover, a robustness analysis of the CS super-resolution property has been carried out exploiting a classical method already used in the array processing literature to statistically characterize the MUSIC super-resolution capability in terms of probability of resolution. The simulations have shown that the SPICE algorithm has the best superresolution capability. Finally, the performance of the four CS-based beamformers has been tested on real sonar data. In particular, the range-spatial frequency maps at the output of the four CS-based beamformers have been evaluated  and compared with the map at the output of the Fourier beamformer. From this comparison, the ability of the CSbased algorithms to reduce the secondary lobes and then, to reduce the probability of false alarm, becomes clear. Using the same range-spatial frequency maps, the superresolution capability of the CS-based beamformers has been verified as well. As concerning the processing time of the four CSBs, both the simulated and the real data analyses have shown that the SL0 algorithm is at least one order of magnitude faster than SPICE, IAA-APES, and L1 algorithms. Since, in many practical applications, a low processing time is a stringent requirement, the SL0 algorithm could represent a good tradeoff between the statistical optimality and the practical implementation.
Future research efforts will be devoted to the multisnapshot case. Moreover, a deeper comprehension of the statistical properties of the CSBs in different noise and clutter distributions, e.g., the widely known compound-Gaussian distributions, has to be developed.
Endnotes a The straddling losses arise because a source is not always precisely centered in a range-Doppler gate, so the acquired sample is not located at the maximum of the matched filter output. b The tests were performed on an Intel® Xeon® E5620 2.40 GHz multicore processor (Intel Corporation, Sta Clara, CA, USA) using a Matlab® (MathWorks Inc., Natick, MA, USA) implementation of the three CSBs.