Sparse covariance fitting for direction of arrival estimation
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 111 (2012)
Abstract
This article proposes a new algorithm for finding the angles of arrival of multiple uncorrelated sources impinging on a uniform linear array of sensors. The method is based on sparse signal representation and requires neither knowledge of the number of sources nor a previous initialization. The proposed technique considers a covariance matrix model based on an overcomplete basis representation and tries to fit the unknown signal powers to the sample covariance matrix. Sparsity is enforced by means of an l_{1}-norm penalty. The final problem is reduced to an objective function with a nonnegative constraint that can be solved efficiently using the LARS/homotopy algorithm. The method described herein is able to provide high resolution with a low computational burden. It proceeds in an iterative fashion, solving at each iteration a small linear system of equations until a stopping condition is fulfilled. The proposed stopping criterion is based on the residual spectrum and arises in a natural way when the LARS/homotopy algorithm is applied to the considered objective function.
1. Introduction
Brief summary of classical direction of arrival estimators
The estimation of the directions of arrival (DoA) of multiple sources using sensor arrays is an old problem and plays a key role in array signal processing. During the last five decades, a plethora of methods have been proposed for finding the DoA of different narrowband signals impinging on a passive array of sensors. These methods can be divided into two categories: parametric and nonparametric estimators.
Nonparametric methods include beamforming and subspace methods. The former rely on scanning the power from different locations. Examples of this category are the conventional beamformer [1] and Capon's method [2]. The conventional beamformer, a.k.a. the Bartlett beamformer, suffers from poor spatial resolution and cannot resolve sources within the Rayleigh resolution limit [1]. As is well known, this lack of resolution can be mitigated only by increasing the number of sensors of the array, because improving the SNR or increasing the number of time observations does not increase the resolution. In contrast, Capon's minimum variance method can resolve sources within the Rayleigh cell if the SNR is high enough, the number of observations is sufficient, and the sources are not correlated. Unfortunately, in practice, Capon's power profile is strongly dependent on the beamwidth, which, in turn, depends on the explored direction, and in some scenarios this could lead to a resolution loss. To counteract this, an estimator of the spectral density obtained from Capon's power estimate was derived in [3], achieving better resolution properties. Herein this method will be referred to as Normalized Capon. Another well-known category of nonparametric DoA estimators is the one composed of subspace methods. These algorithms are able to provide high resolution and outperform beamforming methods. The most prominent member of this family is MUltiple SIgnal Classification (MUSIC) [4], which relies on an appropriate separation between signal and noise subspaces. This characterization is costly and needs a previous estimation of the number of incoming signals.
Parametric methods based on the maximum likelihood criterion [5] exhibit good performance at the expense of a high computational cost. These techniques estimate the parameters of a given model instead of searching for the maxima of the spatial spectrum. Unfortunately, they often lead to difficult multidimensional optimization problems with a heavy computational burden.
An interesting algorithm that lies in between the classes of parametric and nonparametric techniques is the CLEAN algorithm. This method was first introduced by Högbom [6] and has applications in several areas: array signal processing, image processing, radar, and astronomy. Recently, Stoica and Moses shed light on the semiparametric nature of the algorithm [7]. In broad outline, it operates in a recursive manner, subtracting at each iteration a fraction of the strongest signal from the observed spatial spectrum.
Readers interested in a more detailed and comprehensive summary of angle of arrival estimators are referred to [1, 8].
Sparse signal representation
Sparse representation of signals over redundant dictionaries is a hot topic that has attracted the interest of researchers in many fields during the last decade, such as image reconstruction [9], variable selection [10], and compressed sensing [11]. The most basic problem aims to find the sparsest vector x such that y = Ax, where y is the measured vector and A is known. The matrix A is called the dictionary and is overcomplete, i.e., it has more columns than rows. As a consequence, without imposing a sparsity prior on x, the set of equations y = Ax is underdetermined and admits many solutions. Formally, the objective is to minimize ∥x∥_{0} subject to y = Ax, where ∥·∥_{0} denotes the l_{0}-norm [12], that is, the number of nonzero entries. This is an intractable NP-hard combinatorial problem in general [13]. Fortunately, if the vector is sufficiently sparse, the problem can be relaxed by replacing the l_{0}-norm with the l_{1}-norm, defined as ∥x∥_{1} = ∑_{i}|x_{i}|, leading to a convex optimization problem with a lower computational burden. The conditions that ensure the uniqueness of the solution were studied in [14].
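In the case of an observation vector contaminated by noise, a natural variation is to relax the equality constraint to allow some error tolerance ε ≥ 0:
$$\underset{\mathbf{x}}{\text{min}}\;{\|\mathbf{x}\|}_{1}\quad \text{subject to}\quad {\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}^{2}\le \epsilon \qquad (1)$$
or alternatively,
$$\underset{\mathbf{x}}{\text{min}}\;{\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}^{2}\quad \text{subject to}\quad {\|\mathbf{x}\|}_{1}\le \beta \qquad (2)$$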
where the constraint ∥x∥_{1} ≤ β with β ≥ 0 promotes sparsity. This formulation is known as the Least Absolute Shrinkage and Selection Operator (LASSO) and was originally proposed by Tibshirani [15]. The augmented formulation of (2) is well known in signal processing and is commonly called Basis Pursuit Denoising (BPDN) [16]:
$$\underset{\mathbf{x}}{\text{min}}\;{\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}^{2}+\tau {\|\mathbf{x}\|}_{1}\qquad (3)$$
The three formulations (1)–(3) are equivalent in the sense that the sets of solutions are the same for all possible choices of the parameters τ, ε, β. To go from one formulation to another, we only need a proper correspondence of the parameters. Nevertheless, even if the mapping between the regularization parameters exists, this correspondence is not trivial and is possibly nonlinear and discontinuous [17].
When the vector x is real, the LASSO problem (2), or its equivalent formulation (3), can be solved with standard quadratic programming techniques [15]. However, these techniques are time demanding and faster methods are preferred. Osborne et al. [18] and later Efron et al. [19] proposed an efficient algorithm for solving the LASSO. This algorithm is known as the "homotopy method" [18] or LARS (Least Angle Regression) [19]. In this article this technique will be referred to as LARS/homotopy. A variant of the traditional LASSO problem, which will be especially useful in the covariance fitting addressed later on, is the so-called positive LASSO. In this case, an additional constraint on the entries of the vector x is included in the LASSO problem to force the components of the vector to be nonnegative:
$$\underset{\mathbf{x}}{\text{min}}\;{\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}^{2}+\tau {\|\mathbf{x}\|}_{1}\quad \text{subject to}\quad {x}_{i}\ge 0\;\;\forall i\qquad (4)$$
The positive LASSO problem (4) can be solved efficiently by introducing some slight modifications in the traditional LARS/homotopy. This approach was proposed by Efron et al. [19], but is not as widely known as the traditional one. Briefly, the algorithm starts with a very large value of τ and gradually decreases the regularization parameter until the desired value is attained. As τ evolves, the optimal solution for a given τ, x(τ), moves on a piecewise affine path. Since the minimizer x(τ) is a piecewise-linear function of τ, we only need to find the critical regularization parameters τ_{0}, τ_{1}, τ_{2}, ..., τ_{stop} where the slope changes [17]; these values are the so-called breakpoints. The algorithm starts with x = 0 and operates in an iterative fashion, calculating the critical regularization parameters τ_{0} > τ_{1} > ⋯ > τ_{stop} ≥ 0, where an inactive component of x becomes positive or an active element becomes equal to zero, and the associated minimizers x(τ_{0}), x(τ_{1}), ..., x(τ_{stop}). Normally, the number of active components increases as τ decreases. Nevertheless, this cannot be guaranteed: at some breakpoints, some entries may need to be removed from the active set.
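The piecewise-linear behaviour of x(τ) can be seen in closed form in a toy case. The sketch below assumes the dictionary is the identity, A = I, which is not the dictionary used in this article, and assumes the penalized objective ∥y − x∥_{2}^{2} + τ∑x_{i} with x_{i} ≥ 0. Under those assumptions each coordinate decouples and the minimizer is x_{i}(τ) = max(y_{i} − τ/2, 0). The function name and the data are illustrative only.

```python
def positive_lasso_identity(y, tau):
    """Minimizer of sum((y_i - x_i)^2) + tau * sum(x_i) subject to x_i >= 0,
    for the special (assumed) case A = I, where coordinates decouple."""
    return [max(v - tau / 2.0, 0.0) for v in y]

y = [3.0, 1.0, -2.0]

# Above tau_0 = 2 * max(y) the solution is identically zero.
tau_0 = 2.0 * max(y)

# Breakpoints: values of tau at which a coordinate enters the active set.
breakpoints = sorted({2.0 * v for v in y if v > 0}, reverse=True)

# Between breakpoints every active coordinate is affine in tau.
path = {tau: positive_lasso_identity(y, tau) for tau in (6.0, 4.0, 2.0, 0.0)}
```

In this toy case no coordinate ever leaves the active set; with a general dictionary the columns are correlated, which is what makes removals possible.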
Sparse representation in source location
Although there are some pioneering studies carried out in the late nineties, e.g., [20, 21], the application of sparse representation to direction finding has gained noticeable interest during the last decade. Recent techniques based on sparse representation show promising results that outperform conventional high-resolution methods such as MUSIC. In [20] a recursive weighted minimum-norm algorithm called FOCUSS was presented. This algorithm considers a single snapshot and requires a proper initialization. The extension to the multiple-snapshot case was carried out in [22] and is known as M-FOCUSS. Unfortunately, as described in [23], this technique is computationally expensive and requires the tuning of two hyperparameters that can affect the performance of the method significantly.
If multiple snapshots can be collected in an array of sensors, they can be used to improve the estimation of the angles of arrival. Several approaches for summarizing multiple observations have been proposed in the literature. The first of these approaches is the so-called l_{1}-SVD presented by Malioutov et al. [24]. This method is based on the application of a singular value decomposition (SVD) to the received data matrix and leads to a second-order cone optimization problem. This algorithm requires an initial estimate of the number of sources. Although the estimate does not have to be exact, its error must be small for good performance: an underestimation or an overestimation of the number of sources degrades the performance of the method. Even if an incorrect determination of the number of sources has no catastrophic consequences, such as the disappearance of sources in MUSIC, the performance of the algorithm can be considerably degraded. Another important drawback is that l_{1}-SVD depends on a user-defined parameter which is not trivial to select. An alternative approach to summarize multiple snapshots is the use of mixed norms over multiple measurement vectors (MMV) that share the same sparsity pattern [22, 25]. This formulation is useful in array signal processing, especially when the number of snapshots is smaller than the number of sensors. If we assume that the snapshots are collected during the coherence time of the angles, the positions of the sources remain identical among the snapshots; the only difference between them resides in the amplitudes of the impinging rays. Basically, this approach, which is out of the scope of this article, tries to combine multiple snapshots using the l_{2}-norm and to promote sparsity in the spatial dimension by means of the l_{1}-norm. Unfortunately, this joint optimization problem is complex and entails a high computational burden.
When the number of snapshots increases, the computational load becomes too high for practical real-time source location. Recently, new techniques based on a covariance matrix fitting approach have been considered to summarize multiple snapshots, e.g., [26–28]. Basically, these methods try to fit the covariance matrix to a certain model. The main advantage of covariance fitting approaches is that they lead to convex optimization problems with an affordable computational burden. Moreover, they do not require a previous estimation of the number of incoming sources or heavy computations such as an SVD of the data. It should also be pointed out that, since these methods work directly with the covariance matrix, less storage space is needed because they do not have to store huge amounts of time data. The technique proposed by Yardibi et al. [26] leads to an optimization problem that can be solved efficiently using Quadratic Programming (QP). In the case of the approach presented by Picard and Weiss [27], the solution is obtained by means of linear programming (LP). The main drawback of this last method is that it depends on a user-defined parameter that is difficult to adjust. In the same way, Liu et al. [29] propose a method based on a hyperparameter that has been heuristically determined. In contrast, Stoica et al. [28, 30] propose an iterative algorithm named SParse Iterative Covariance-based Estimation (SPICE), which can be used in noisy data scenarios without the need for choosing any hyperparameter. The major drawback of this method is that it needs to be initialized.
Article contribution
This article proposes a simple, fast, and accurate algorithm for finding the angles of arrival of multiple sources impinging on a uniform linear array (ULA). In contrast to other methods in the literature, the proposed technique does not depend on user-defined parameters and requires neither knowledge of the number of sources nor initialization. It assumes white noise and that the point sources are uncorrelated.
The method considers a structured covariance matrix model based on an overcomplete basis representation and tries to fit the unknown signal powers of the model to the sample covariance. Sparsity is promoted by means of an l_{1}-norm penalty imposed on the powers. The final problem is reduced to an objective function with a nonnegative constraint that can be solved efficiently using the LARS/homotopy algorithm, which is, in general, faster than QP [19] and LP [17]. The method described herein proceeds in an iterative manner, solving at each iteration a small linear system of equations until a stopping condition is fulfilled. The proposed stopping criterion is based on the residual spectrum and arises in a natural way when the LARS/homotopy algorithm is applied to the considered objective function. To the best of our knowledge, this stopping condition has never been considered before in sparse signal representation.
2. The proposed method: sparse covariance fitting for source location
Consider L narrowband signals $\{x_{i}[k]\}_{i=1}^{L}$ impinging on an array of M sensors. The k th observation can be expressed as:
$$\mathbf{y}[k]=\mathbf{S}(\boldsymbol{\theta})\,\mathbf{x}[k]+\mathbf{w}[k]\qquad (5)$$
where x[k] = [x_{1}[k] ⋯ x_{L}[k]]^{T} is the vector of unknown source signals, the matrix S(θ) ∈ ℂ^{M×L} is the collection of the steering vectors corresponding to the angles of arrival of the sources θ = [θ_{1}, ..., θ_{L}]^{T}, that is, S(θ) = [s(θ_{1}) ⋯ s(θ_{L})], and w[k] ∈ ℂ^{M×1} denotes a zero-mean additive noise, spatially and temporally white, independent of the sources, with covariance matrix $\sigma_{w}^{2}\mathbf{I}_{M}$, where I_{M} is the identity matrix of size M.
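Taking into account (5), the spatial covariance matrix can be expressed as:
$$\mathbf{R}=E\{\mathbf{y}[k]\,\mathbf{y}^{H}[k]\}=\mathbf{S}(\boldsymbol{\theta})\,\mathbf{P}\,\mathbf{S}^{H}(\boldsymbol{\theta})+\sigma_{w}^{2}\mathbf{I}_{M}\qquad (6)$$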
with P = E{x[k] x^{H}[k]}. The classical direction finding problem can be reformulated as a sparse representation problem. With this aim, let us consider an exploration grid of G equally spaced angles Φ = {ϕ_{1}, ..., ϕ_{G}} with G >> M and G >> L. If the set of angles of arrival of the impinging signals θ is a subset of Φ, the received signal model (5) can be rewritten in terms of an overcomplete matrix S_{G} constructed by the horizontal concatenation of the steering vectors corresponding to all the potential source locations:
$$\mathbf{y}[k]=\mathbf{S}_{G}\,\mathbf{x}_{G}[k]+\mathbf{w}[k]\qquad (7)$$
where S_{G} ∈ ℂ^{M×G} contains the steering vectors corresponding to the angles of the grid, S_{G} = [s_{1} ⋯ s_{G}], with s_{i} = s(ϕ_{i}), and x_{G}[k] ∈ ℂ^{G×1} is a sparse vector. The nonzero entries of x_{G}[k] are at the positions that correspond to the source locations. In other words, the n th element of x_{G}[k] is different from zero and equal to the q th component of the vector x[k] defined in (5), denoted by x_{q}[k], if and only if ϕ_{n} = θ_{q}. It is important to point out that the matrix S_{G} is known and does not depend on the source locations.
The assumption that the set of angles of arrival is a subset of Φ is only required for the derivation of the algorithm. Obviously, it does not always hold. Actually, this is a common assumption in many exploration methods in the direction finding literature (e.g., Capon, Normalized Capon, MUSIC, etc). In the case that θ⊈ Φ, the contribution of the sources leaks into the neighboring elements of the grid.
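As an illustration of the on-grid model, the following NumPy sketch builds the overcomplete steering matrix for a 1° grid and checks that the compact model and the grid model produce the same noiseless snapshot. The steering-vector phase convention (half-wavelength ULA, phase π m sin θ), the grid limits, the source angles, and all names are assumptions of this sketch, not values taken from the article.

```python
import numpy as np

def steering_vector(theta_deg, M):
    """ULA steering vector for half-wavelength spacing (assumed convention)."""
    m = np.arange(M)
    return np.exp(1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))

M = 10
grid = np.arange(-90, 91)                       # 1-degree grid, G = 181
S_G = np.column_stack([steering_vector(phi, M) for phi in grid])

theta = np.array([-36, 30])                     # illustrative on-grid sources
x = np.array([1.0 + 0.5j, -0.3 + 2.0j])         # one noiseless snapshot

# Sparse grid vector: nonzero only where a grid angle equals a source angle.
x_G = np.zeros(grid.size, dtype=complex)
x_G[np.searchsorted(grid, theta)] = x

S_theta = np.column_stack([steering_vector(t, M) for t in theta])
y_compact = S_theta @ x                         # S(theta) x[k]
y_grid = S_G @ x_G                              # S_G x_G[k]
```

Note that S_G is built from the grid alone, so it can be precomputed once and reused for every scenario.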
Bearing in mind (7) and assuming white noise with covariance matrix $\sigma_{w}^{2}\mathbf{I}_{M}$, the spatial covariance matrix of (5) can be rewritten in terms of S_{G} and takes the form:
$$\mathbf{R}=\mathbf{S}_{G}\,\mathbf{D}\,\mathbf{S}_{G}^{H}+\sigma_{w}^{2}\mathbf{I}_{M}\qquad (8)$$
with $\mathbf{D}=E\{\mathbf{x}_{G}[k]\,\mathbf{x}_{G}^{H}[k]\}$. An important remark is that D ∈ ℂ^{G×G} is different from the source covariance matrix P ∈ ℂ^{L×L} introduced in (6). Since at most L^{2} entries out of G^{2} can differ from zero, D is a sparse matrix.
A common assumption in many direction finding problems is that the sources are uncorrelated. Under this assumption, D is a diagonal matrix with only L nonzero entries, given by diag(D) = [p_{1} ⋯ p_{G}]^{T} = p, with $\mathbf{p}\in \mathbb{R}_{+}^{G\times 1}$.
Note that p is a G × 1 sparse vector with nonzero entries at the positions corresponding to the source locations. Furthermore, the elements of p are real-valued and nonnegative.
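To cast the problem into a positive LASSO with real variables, let us make some manipulations on (8). Applying vectorization to (8) yields:
$$\text{vec}\{\mathbf{R}\}=\left(\mathbf{S}_{G}^{*}\otimes \mathbf{S}_{G}\right)\text{vec}\{\mathbf{D}\}+\sigma_{w}^{2}\,\text{vec}\{\mathbf{I}_{M}\}\qquad (9)$$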
where ⊗ and vec{·} denote the Kronecker product and the vectorization operator, respectively. Note that $\mathbf{S}_{G}^{*}\otimes \mathbf{S}_{G}\in \mathbb{C}^{M^{2}\times G^{2}}$.
Since D is a diagonal matrix because the sources are uncorrelated, only G columns of $\mathbf{S}_{G}^{*}\otimes \mathbf{S}_{G}$ have to be taken into account. Using this fact, the dimensionality of the problem can be reduced. In this way, it is straightforward to rewrite the expression (9) in terms of the vector p by removing the columns of $\mathbf{S}_{G}^{*}\otimes \mathbf{S}_{G}$ that multiply the off-diagonal (zero) entries of D:
$$\text{vec}\{\mathbf{R}\}=\tilde{\mathbf{A}}\,\mathbf{p}+\sigma_{w}^{2}\,\text{vec}\{\mathbf{I}_{M}\}\qquad (10)$$
with $\tilde{\mathbf{A}}=\left[\mathbf{s}_{1}^{*}\otimes \mathbf{s}_{1}\;\;\mathbf{s}_{2}^{*}\otimes \mathbf{s}_{2}\;\;\cdots \;\;\mathbf{s}_{G}^{*}\otimes \mathbf{s}_{G}\right]$. Note that $\tilde{\mathbf{A}}\in \mathbb{C}^{M^{2}\times G}$.
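The reduction from the full Kronecker product to the G columns s_{i}^{*} ⊗ s_{i} can be checked numerically. The sketch below uses a small random unit-modulus dictionary as a placeholder for S_{G} (an assumption made only to keep the example small) and column-major vectorization; variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M, G = 4, 7
# Placeholder dictionary with unit-modulus entries (not an actual ULA grid).
S_G = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(M, G)))

p = np.zeros(G)
p[[1, 5]] = [2.0, 0.5]                          # sparse, nonnegative powers
D = np.diag(p)

# Column i of A_tilde is conj(s_i) kron s_i.
A_tilde = np.column_stack(
    [np.kron(S_G[:, i].conj(), S_G[:, i]) for i in range(G)]
)

# vec{.} stacks columns, hence order="F".
vec_signal = (S_G @ D @ S_G.conj().T).flatten(order="F")
vec_full = np.kron(S_G.conj(), S_G) @ D.flatten(order="F")   # M^2 x G^2 form
vec_reduced = A_tilde @ p                                     # M^2 x G form
```

Both products reproduce the vectorized signal covariance, but the reduced form needs G instead of G^{2} columns.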
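Separating real and imaginary parts, the above equation takes the form:
$$\left[\begin{array}{c}\text{Re}\{\text{vec}\{\mathbf{R}\}\}\\ \text{Im}\{\text{vec}\{\mathbf{R}\}\}\end{array}\right]=\left[\begin{array}{c}\text{Re}\{\tilde{\mathbf{A}}\}\\ \text{Im}\{\tilde{\mathbf{A}}\}\end{array}\right]\mathbf{p}+\sigma_{w}^{2}\left[\begin{array}{c}\text{vec}\{\mathbf{I}_{M}\}\\ \mathbf{0}_{M^{2}\times 1}\end{array}\right]\qquad (11)$$
where Re{·} and Im{·} denote the real and imaginary parts. In the expression (11), vec{I_{M}} denotes the vectorization of the identity matrix of dimensions M × M and $\mathbf{0}_{M^{2}\times 1}$ is a vector of zeros of size M^{2} × 1. More compactly, the expression (11) can be rewritten as:
$$\mathbf{r}=\mathbf{A}\mathbf{p}+\mathbf{n}\qquad (12)$$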
with obvious definitions for r, A, p, and n. Note that $\mathbf{r},\mathbf{n}\in \mathbb{R}^{2M^{2}\times 1}$ and $\mathbf{A}\in \mathbb{R}^{2M^{2}\times G}$.
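A minimal NumPy sketch of the real-valued model follows. It assumes that r, A, and n are obtained by stacking the real part on top of the imaginary part, as the dimensions above suggest, and it uses an illustrative half-wavelength ULA grid; the helper name `stack_real` and all numerical values are hypothetical.

```python
import numpy as np

def stack_real(v):
    """Stack real and imaginary parts along the first axis."""
    return np.concatenate([v.real, v.imag], axis=0)

M, G, sigma2 = 5, 41, 0.7
grid = np.linspace(-80, 80, G)
S_G = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(np.deg2rad(grid))))

p = np.zeros(G)
p[[10, 25]] = [1.0, 2.0]                        # two on-grid sources

# Model covariance with uncorrelated sources and white noise.
R = (S_G * p) @ S_G.conj().T + sigma2 * np.eye(M)

A_tilde = np.column_stack([np.kron(s.conj(), s) for s in S_G.T])
A = stack_real(A_tilde)                         # real, 2M^2 x G
r = stack_real(R.flatten(order="F"))            # real, length 2M^2
n = sigma2 * np.concatenate([np.eye(M).flatten(order="F"), np.zeros(M * M)])
```

With the exact covariance the identity r = Ap + n holds to machine precision; with a sample covariance it holds only approximately, which motivates the fitting problem that follows.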
Unfortunately, the spatial covariance matrix is unknown in practice and is normally replaced by the sample covariance matrix obtained from a set of N observations, $\widehat{\mathbf{R}}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{y}[k]\,\mathbf{y}^{H}[k]$. A possible method for finding p is the following constrained least squares problem:
$$\underset{\mathbf{p}}{\text{min}}\;{\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\|}_{2}^{2}\quad \text{subject to}\quad {p}_{i}\ge 0\;\;\forall i,\quad \sum_{i=1}^{G}{p}_{i}\le \beta \qquad (13)$$
where $\widehat{\mathbf{r}}=\left[\begin{array}{c}\text{Re}\{\text{vec}\{\widehat{\mathbf{R}}\}\}\\ \text{Im}\{\text{vec}\{\widehat{\mathbf{R}}\}\}\end{array}\right]$.
Note that (13) is a positive LASSO problem. The main idea behind (13) is to fit the unknown powers to the model such that the solution is sparse. The method tries to minimize the residual or, in other words, to maintain the fidelity of the sparse representation to the received data, subject to a nonnegativity constraint on the powers and $\sum_{i=1}^{G}{p}_{i}\le \beta$. This last constraint promotes sparsity, as discussed for (2), but also imposes a bound on the received signal power. Unfortunately, the parameter β is unknown and has to be estimated. Even worse, the solution of (13) is very sensitive to β: a small error in its estimate can lead to a wrong solution vector.
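Instead of solving (13), let us consider the following equivalent formulation:
$$\underset{\mathbf{p}}{\text{min}}\;{\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\|}_{2}^{2}+\tau \sum_{i=1}^{G}{p}_{i}\quad \text{subject to}\quad {p}_{i}\ge 0\;\;\forall i\qquad (14)$$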
The problems (13) and (14) are equivalent in the sense that the path of solutions of (13) parametrized by a positive β matches the solution path of (14) as τ varies. To go from one formulation to the other, we need a proper correspondence between the parameters.
The problem (14) can be solved in an efficient way with the LARS/homotopy algorithm for positive LASSO. The method operates in an iterative fashion computing the critical regularization parameters τ_{0} > τ_{1} > ⋯ > τ_{stop} ≥ 0 and the associated minimizers p (τ_{0}), p (τ_{1}), ..., p (τ_{stop}), where an inactive component of p becomes positive or an active element becomes equal to zero. Let us remark that there is only one new candidate to enter or leave the active set at each iteration (this is the "one at a time condition" described by Efron et al. [19]).
The algorithm is based on the computation of the so-called vector of residual correlations, or just residual correlation, $\mathbf{b}(\tau)=\mathbf{A}^{T}\left(\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}(\tau)\right)$, at each iteration. The method starts with p = 0, which is the solution of (14) for all $\tau \ge \tau_{0}=2\,\underset{i}{\text{max}}{\left(\mathbf{A}^{T}\widehat{\mathbf{r}}\right)}_{i}$, where ${\left(\mathbf{A}^{T}\widehat{\mathbf{r}}\right)}_{i}$ is the i th component of the vector $\mathbf{A}^{T}\widehat{\mathbf{r}}$, and proceeds in an iterative manner solving reduced-order linear systems. The whole algorithm is summarized in Algorithm 1 (see [19, 31] for further details). This iterative procedure must be halted when a stopping condition is satisfied. This stopping criterion, which is the main contribution of this article, will be described later in Section 3.
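The value of τ_{0} can be recovered from the optimality conditions of the penalized problem. The short derivation below is a sketch that assumes the objective has the form ∥r̂ − Ap∥_{2}^{2} + τ∑p_{i} with p_{i} ≥ 0 (no factor 1/2 in front of the quadratic term), which is the scaling consistent with the stated τ_{0}. Because the objective is convex, p = 0 is a minimizer if and only if no coordinate can lower the objective by becoming positive, that is, if every partial derivative at p = 0 is nonnegative:

$$\left.\frac{\partial }{\partial {p}_{i}}\left({\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\|}_{2}^{2}+\tau \sum_{j=1}^{G}{p}_{j}\right)\right|_{\mathbf{p}=\mathbf{0}}=-2{\left(\mathbf{A}^{T}\widehat{\mathbf{r}}\right)}_{i}+\tau \ge 0\;\;\forall i\quad \Longleftrightarrow \quad \tau \ge 2\,\underset{i}{\text{max}}{\left(\mathbf{A}^{T}\widehat{\mathbf{r}}\right)}_{i}$$

The same argument at a general point of the path gives b_{j}(τ) = τ/2 for the active components and b_{i}(τ) ≤ τ/2 for the inactive ones, so lowering the common active correlation by γ lowers τ by 2γ.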
It should be pointed out that the least squares error of the covariance fitting problem (14) decreases at each iteration of the LARS/homotopy algorithm. This result is justified by the following two theorems.
Theorem 1: The sum of the powers increases monotonically at each iteration of the algorithm. Given two vectors with nonnegative elements p(τ_{n+1}) and p(τ_{n}) that are minimizers of (14) for two breakpoints τ_{n+1} and τ_{n}, respectively, with τ_{n} > τ_{n+1}, it can be stated that ∥p(τ_{n+1})∥_{1} ≥ ∥p(τ_{n})∥_{1}.
Proof: See Appendix 1.
Theorem 2: The least squares error ${\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}(\tau)\|}_{2}^{2}$ decreases at each iteration of the LARS/homotopy algorithm. Given two vectors with nonnegative elements p(τ_{n}) and p(τ_{n+1}) that are minimizers of (14) for two consecutive breakpoints τ_{n} and τ_{n+1} of the LARS/homotopy, with τ_{n} > τ_{n+1}, it can be stated that ${\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}(\tau_{n+1})\|}_{2}^{2}\le {\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}(\tau_{n})\|}_{2}^{2}$.
Proof: See Appendix 2.
Algorithm 1 Proposed method
INITIALIZATION: $\mathbf{p}=\mathbf{0},\;\tau_{0}=2\,\underset{i}{\text{max}}{\left(\mathbf{A}^{T}\widehat{\mathbf{r}}\right)}_{i},\;n=0$
J = active set = ∅, I = inactive set = J^{c}
while the stopping criterion is not met and ∃ i ∈ I such that b_{i} > 0 do
1) Compute the residual correlation $\mathbf{b}=\mathbf{A}^{T}\left(\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\right)$
2) Determine the maximal components of b. These will be the nonzero elements of p(τ_{n}) (active components).
$$J=\text{arg}\,\text{max}\left\{{b}_{j}\right\},\quad I={J}^{c}$$
3) Calculate the update direction u such that all the active components lead to a uniform decrease of the residual correlation (equiangular direction).
$${\mathbf{u}}_{J}={\left({\mathbf{A}}_{J}^{T}{\mathbf{A}}_{J}\right)}^{-1}{\mathbf{1}}_{J}$$
4) Compute the step size γ such that a new element of b becomes equal to the maximal ones (∃ i ∈ I such that b_{i}(τ_{n+1}) = b_{j∈J}(τ_{n+1})) or one nonzero component of p becomes zero (∃ j ∈ J such that p_{j}(τ_{n+1}) = 0).
5) Update p ← p + γu, τ_{n+1} = τ_{n} − 2γ, n = n + 1
end while
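The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `positive_lars`, the optional `stop` callback, the tolerances, and the random test problem are all hypothetical. The sketch also tracks the active set explicitly (adding the index that reaches the maximal correlation, removing the index whose power hits zero) instead of recomputing an arg max, which is numerically safer when ties occur.

```python
import numpy as np

def positive_lars(A, r, stop=None, max_iter=200, tol=1e-12):
    """Homotopy path of min ||r - A p||^2 + tau * sum(p) subject to p >= 0.

    Returns the breakpoints as a list of (tau, p), starting at tau_0, p = 0.
    """
    G = A.shape[1]
    p = np.zeros(G)
    b = A.T @ r
    c = float(b.max())                    # c = tau / 2 along the path
    path = [(2.0 * c, p.copy())]
    if c <= tol:
        return path
    J = [int(b.argmax())]                 # active set
    for _ in range(max_iter):
        b = A.T @ (r - A @ p)             # step 1: residual correlation
        if stop is not None and stop(b):
            break
        A_J = A[:, J]
        u_J = np.linalg.solve(A_J.T @ A_J, np.ones(len(J)))   # step 3
        a = A.T @ (A_J @ u_J)             # b shrinks at rate a (a = 1 on J)
        # Step 4a: smallest step at which an inactive b_i reaches c - gamma.
        inactive = np.setdiff1d(np.arange(G), J)
        rising = inactive[1.0 - a[inactive] > tol]
        g_in = np.full(G, np.inf)
        g_in[rising] = np.maximum(c - b[rising], 0.0) / (1.0 - a[rising])
        # Step 4b: smallest step at which an active p_j reaches zero.
        shrinking = (u_J < -tol) & (p[J] > tol)
        g_out = np.where(shrinking, -p[J] / np.where(shrinking, u_J, 1.0), np.inf)
        gamma = min(g_in.min(), g_out.min(), c)   # gamma = c means tau = 0
        p[J] += gamma * u_J               # step 5
        c -= gamma
        path.append((2.0 * c, p.copy()))
        if c <= tol:
            break
        if g_out.min() < g_in.min():      # an active component leaves
            j = J[int(g_out.argmin())]
            p[j] = 0.0
            J.remove(j)
        else:                             # an inactive component enters
            J.append(int(g_in.argmin()))
    return path

# Small synthetic check with a generic (random) dictionary.
rng = np.random.default_rng(1)
A_demo = rng.standard_normal((30, 8))
p_true = np.array([0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
r_demo = A_demo @ p_true + 0.01 * rng.standard_normal(30)
path = positive_lars(A_demo, r_demo)
```

Each iteration solves only a |J| × |J| linear system, which is the source of the low computational cost. Without a `stop` callback the path is followed down to τ = 0, i.e., to the nonnegative least squares solution.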
3. Stopping criterion: the cumulative spectrum
The definition of an appropriate stopping criterion is of paramount importance because it determines the final regularization parameter τ_{stop} and consequently the number of active positions in the solution vector. In general, larger values of τ produce sparser solutions. Nevertheless, this fact cannot be guaranteed: at some breakpoints, some entries may need to be removed from the active set.
Most of the traditional approaches presented in the literature for choosing the regularization parameter in discrete ill-posed problems are based on the norm of the residual error in one way or another, e.g., the discrepancy principle, cross-validation, and the L-curve. Nevertheless, recent publications [32, 33] suggest the use of a new parameter-choice method based on the residual spectrum. This technique is based on the evaluation of the shape of the Fourier transform of the residual error. To the best of the authors' knowledge, this approach has never been used as a stopping criterion in sparse representation problems. The method presented herein is inspired by the same idea with some slight modifications. The main difference resides in the fact that no Fourier transform needs to be computed over the residual: as will be shown later on, the spatial spectrum of the residual arises in a natural way when the LARS/homotopy is applied to (14). The following result is the key point of the stopping criterion proposed in this article.
Theorem 3: When the LARS/homotopy is applied to the problem (14), the residual correlation obtained at the k th iteration of the algorithm, expressed as $\mathbf{b}(\tau_{k})=\mathbf{A}^{T}\left(\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}(\tau_{k})\right)$, is equivalent to the Bartlett estimator applied to the residual covariance matrix ${\widehat{\mathbf{C}}}_{k}=\widehat{\mathbf{R}}-\sum_{i=1}^{G}{p}_{i}(\tau_{k})\,\mathbf{s}_{i}\mathbf{s}_{i}^{H}$. Then, the i th component of the vector of residual correlations satisfies ${b}_{i}(\tau_{k})=\mathbf{s}_{i}^{H}{\widehat{\mathbf{C}}}_{k}\mathbf{s}_{i}$.
Proof: See Appendix 3.
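The identity in Theorem 3 is easy to verify numerically. The sketch below builds the real-valued quantities from a random sample covariance and an arbitrary nonnegative power vector, then compares the residual correlation with the Bartlett spectrum of the residual covariance. The grid, the array size, the steering-vector convention, and the variable names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
M, G, N = 6, 31, 200
grid = np.linspace(-75, 75, G)
S_G = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(np.deg2rad(grid))))

# Any Hermitian sample covariance and any nonnegative p will do for the check.
Y = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
R_hat = Y @ Y.conj().T / N
p = rng.uniform(0, 1, G) * (rng.uniform(size=G) < 0.2)

# Real-valued model: A = [Re(A_tilde); Im(A_tilde)], r_hat likewise.
A_tilde = np.column_stack([np.kron(s.conj(), s) for s in S_G.T])
A = np.vstack([A_tilde.real, A_tilde.imag])
v = R_hat.flatten(order="F")
r_hat = np.concatenate([v.real, v.imag])

b = A.T @ (r_hat - A @ p)                       # residual correlation

# Bartlett spectrum s_i^H C s_i of the residual covariance C.
C = R_hat - (S_G * p) @ S_G.conj().T
bartlett = np.real(np.einsum("mi,mn,ni->i", S_G.conj(), C, S_G))
```

The two vectors agree because (s_{i}^{*} ⊗ s_{i})^{H} vec{C} = s_{i}^{H} C s_{i}, which is real whenever C is Hermitian.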
This theorem provides an alternative interpretation of the residual correlation at the k th iteration, b(τ_{k}), which can be seen as a residual spatial spectrum. Bearing in mind this idea, and under the assumption that the noise is zero-mean and spatially white, the following parameter-choice method is proposed: to stop as soon as the residual correlation resembles white noise.
Under the assumption that the noise is spatially white, the power is distributed uniformly over all the angles of arrival and the spatial spectrum has to be flat. To determine whether the residual correlation corresponds to a white noise spectrum a statistical tool has to be considered. Several tests are available in the literature to test the hypothesis of white noise. Herein the metric that will be considered to see if the residual looks like noise is:
where the subindex k, with k = 0, ..., k_{stop}, denotes the k th iteration of the LARS/homotopy algorithm. The metric c_{k} is a slight modification of the conventional normalized cumulative periodogram proposed by Bartlett [34] and later by Durbin [35]. Traditionally, the cumulative periodogram has been defined for real-valued time series. In the real case, the spectrum is symmetric and only half of the spectrum needs to be computed. However, it can be easily extended to embrace complex-valued vectors, as shown in (15). Throughout this document c_{k} will be referred to as the normalized cumulative spectrum (NCS).
For an ideal white noise, the plot of the NCS is a straight line and resembles the cumulative distribution function of a uniform distribution. Thus, any distributional test, such as the Kolmogorov-Smirnov (KS) test, can be considered to determine the "goodness of fit" between the cumulative spectrum and the theoretical straight line. In [34], Bartlett proposed the use of the KS test, which is based on the largest deviation in absolute value between the cumulative spectrum and the theoretical straight line. The KS test rejects the hypothesis of white noise whenever the maximum deviation between the cumulative spectrum and the straight line is too large. Conversely, the residual is considered white noise if the cumulative spectrum lies within the KS limits. The upper and the lower KS limits, as a function of the index l, are given by
where δ = 1.36 for the 95% confidence band and δ = 1.63 for the 99% band.
Notice that the NCS does not require an accurate estimation of the noise power at the receiver. Since the cumulative spectrum (15) is normalized with respect to the average power at each k th iteration, the decision metric only depends on the shape of the spatial spectrum.
The proposed stopping condition is: to stop as soon as the residual correlation resembles white noise, that is, when the NCS lies within the KS limits.
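A possible implementation of this test is sketched below. Two forms are assumed here and are not taken from the article: the NCS is computed as the running sum of the residual correlation divided by its total, and the KS band is taken as l/G ± δ/√G. Only the values δ = 1.36 and δ = 1.63 come from the text; the function names and the two test spectra are illustrative.

```python
import numpy as np

def ncs(b):
    """Normalized cumulative spectrum (assumed form: cumsum(b) / sum(b))."""
    b = np.asarray(b, dtype=float)
    return np.cumsum(b) / b.sum()

def within_ks_band(b, delta=1.63):
    """True if the NCS stays inside the assumed band l/G +/- delta/sqrt(G)."""
    G = len(b)
    line = np.arange(1, G + 1) / G        # ideal white-noise NCS
    return bool(np.max(np.abs(ncs(b) - line)) <= delta / np.sqrt(G))

flat = np.ones(181)                       # perfectly flat residual spectrum
peaky = np.ones(181)
peaky[60] = 100.0                         # one strong unexplained source
```

Because only the shape of the spectrum enters the decision, scaling b by any positive constant leaves the outcome unchanged. A callback such as `lambda b: within_ks_band(b)` could serve as the stopping rule of a homotopy loop.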
4. Numerical results
The aim of this section is to analyze the performance of the covariance fitting method proposed in this article. To this end, some simulations have been carried out in Matlab. Throughout the simulations, a uniform grid with 1° of resolution has been considered for all the analyzed techniques. Furthermore, a zero-mean white Gaussian noise with power $\sigma_{w}^{2}=1$ has been considered. The generated source signals are uncorrelated and distributed as circularly symmetric i.i.d. complex Gaussian variables with zero mean. Since the same power P will be considered for all the sources, throughout this section the signal to noise ratio (SNR) is defined by $\text{SNR}(\text{dB})=10\,{\text{log}}_{10}\left(\frac{P}{\sigma_{w}^{2}}\right)$.
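For readers who want to reproduce a comparable setup outside Matlab, the following NumPy sketch generates snapshots under the stated assumptions (unit noise power, equal-power circular complex Gaussian sources) and forms the sample covariance. The function name, the steering-vector convention, and the example angles are hypothetical and do not reproduce the article's exact scenarios.

```python
import numpy as np

def sample_covariance(angles_deg, snr_db, M, N, rng):
    """Sample covariance of N snapshots for a half-wavelength ULA (assumed model)."""
    L = len(angles_deg)
    P = 10.0 ** (snr_db / 10.0)           # source power, since sigma_w^2 = 1
    phase = np.pi * np.outer(np.arange(M), np.sin(np.deg2rad(angles_deg)))
    S = np.exp(1j * phase)
    # Circularly symmetric complex Gaussian sources and noise.
    X = np.sqrt(P / 2) * (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N)))
    W = np.sqrt(1 / 2) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
    Y = S @ X + W
    return Y @ Y.conj().T / N

rng = np.random.default_rng(0)
R_hat = sample_covariance([-20.0, 35.0], snr_db=0.0, M=10, N=600, rng=rng)
```

At 0 dB with two sources, each diagonal entry of the true covariance equals 2P + σ_{w}^{2} = 3, so the average diagonal of the sample covariance should be close to that value.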
To illustrate the algorithm and the new stopping condition based on the cumulative spectrum, we have considered four uncorrelated sources located at 36°, 30°, 30°, 50° that impinge on a ULA with M = 10 sensors separated by half the wavelength. The SNR is set to 0 dB and the sample covariance matrix is computed with N = 600 snapshots. Figures 1 and 2 show the evolution of the NCS and the vector of residual correlations, respectively. As shown in Figure 1, the algorithm is stopped after 16 iterations, when the NCS lies within the KS limits of the 99% confidence band. The final solution p is shown in Figure 3. Note that the residual spectrum of the final solution in Figure 2 is almost flat and the residual correlation resembles white noise.
Next, the probability of resolution of the covariance fitting method as a function of the SNR is investigated. With this aim we have considered two uncorrelated sources located at 36° and 30° that impinge on a ULA with M = 9 sensors. Both sources transmit with the same power and the sample covariance has been computed with N = 1000 snapshots. Figure 4 shows the results of the covariance fitting method compared to other classical estimators: MUSIC [4], Capon [2], and Normalized Capon [3]. In order to make a fair comparison between the different techniques, the number of sources of the MUSIC algorithm has been estimated with the Akaike information criterion (AIC) [7]. The curves in Figure 4 are averaged over 300 independent simulation runs. From this figure, it is clear that the proposed covariance fitting technique outperforms the other classical estimators: it is about 6 dB better than the MUSIC algorithm and about 12 dB better than the Normalized Capon method.
Next, the performance of the proposed method in terms of root mean square error (RMSE) is analyzed and presented in Figure 5. Two uncorrelated sources separated by Δθ = 6° that impinge on an array of M = 9 sensors were considered in the simulations. In this case, the positions of the sources do not correspond to the angles of the grid. To achieve this, the angle of the first source θ_{1} is generated as a random variable following a uniform distribution between −80° and 80°, and the angle of the second source is generated as θ_{2} = θ_{1} + Δθ. The sample covariance has been computed with 900 snapshots. Figure 5 shows the RMSE of the proposed method and MUSIC as a function of the SNR, over the SNR range in which the two sources are resolved with probability 1. In the case of MUSIC, the number of sources is determined by the AIC. The two curves are based on the average of 300 independent runs. From Figure 5 it can be concluded that at low SNR the proposed method outperforms MUSIC. As the SNR increases, both methods tend to exhibit the same performance.
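The off-grid scenario and the RMSE metric can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the RMSE is averaged over runs and over both sources (the article does not spell out its exact averaging), and the "estimator" is an idealized stand-in that simply returns the nearest point of the 1° grid.

```python
import numpy as np

def draw_source_pairs(num_runs, rng, separation_deg=6.0, low=-80.0, high=80.0):
    """Draw theta_1 uniformly in [low, high] and set theta_2 = theta_1 + separation."""
    theta1 = rng.uniform(low, high, size=num_runs)
    return np.column_stack([theta1, theta1 + separation_deg])

def rmse_deg(true_angles, estimated_angles):
    """RMSE in degrees over runs and sources; inputs have shape (runs, sources)."""
    t = np.sort(np.asarray(true_angles, dtype=float), axis=1)
    e = np.sort(np.asarray(estimated_angles, dtype=float), axis=1)
    return float(np.sqrt(np.mean((e - t) ** 2)))

rng = np.random.default_rng(0)
truth = draw_source_pairs(300, rng)
on_grid = np.round(truth)            # stand-in estimator: snap to the 1-degree grid
grid_floor = rmse_deg(truth, on_grid)
```

For this idealized nearest-grid-point estimator the error is uniform over half a grid cell on either side, so `grid_floor` is close to 1/√12 ≈ 0.29°. This is a useful reference level when reading RMSE curves obtained on a 1° grid.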
Finally, the resolution capability of the method as a function of the number of snapshots is investigated. The scenario considered for this purpose is the following: two sources located at θ_{1} = 36° and θ_{2} = 30° that impinge on a ULA with M = 9 sensors. In this case, the transmitted signals have constant modulus, which is a common situation in communications applications: ${s}_{1}\left(t\right)={e}^{j{\phi}_{1}\left(t\right)}$ and ${s}_{2}\left(t\right)={e}^{j{\phi}_{2}\left(t\right)}$. The signal phases ${\left\{{\phi}_{k}\left(t\right)\right\}}_{k=1}^{2}$ are independent and follow a uniform distribution in [0, 2π]. Figure 6 shows the probability of resolution of the proposed method and MUSIC as a function of the number of snapshots N. In this case the signal-to-noise ratio is fixed to 1 dB. As in the previous cases, in order to make a fair comparison between the two techniques, the number of sources required by the MUSIC algorithm has been determined using AIC. The curves were obtained by averaging the results of 500 independent trials. Note that the covariance fitting method clearly outperforms MUSIC and is able to resolve the two sources with a probability greater than 95% if N ≥ 30.
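The constant-modulus signals used in this experiment are straightforward to generate. The sketch below (function name hypothetical) also prints the magnitude of the sample cross-correlation between the two sources for a few snapshot counts. With few snapshots this quantity is not negligible, which is one reason why short records are demanding for covariance-based methods that assume uncorrelated sources.

```python
import numpy as np

def constant_modulus_signals(num_sources, num_snapshots, rng):
    """Unit-modulus signals s_k(t) = exp(j*phi_k(t)) with phases i.i.d. uniform in [0, 2*pi]."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(num_sources, num_snapshots))
    return np.exp(1j * phases)

rng = np.random.default_rng(0)
for n in (10, 30, 1000):
    s = constant_modulus_signals(2, n, rng)
    # Magnitude of the sample cross-correlation between the two sources.
    cross = abs(np.vdot(s[0], s[1])) / n
    print(n, round(float(cross), 3))
```

The printed cross-correlation is on the order of 1/√N, so it shrinks slowly as the number of snapshots grows.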
5. Conclusions
A new method for finding the DoA of multiple sources that impinge on a ULA has been presented in this article. The proposed technique is based on sparse signal representation and, in the simulations reported here, outperforms classical direction finding algorithms, including subspace methods, in terms of RMSE and probability of resolution. The proposed technique assumes white noise and uncorrelated point sources. Furthermore, it requires neither knowledge of the number of sources nor a prior initialization.
Appendix 1: proof of Theorem 1
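The LARS/homotopy algorithm provides all the breakpoints τ_{0} > τ_{1} > ⋯ > τ_{stop} ≥ 0 and the associated solutions p(τ_{0}), p(τ_{1}), ..., p(τ_{stop}), where a new component enters or leaves the support (the set of active elements) of p(τ). It can be proved that the sum of powers increases monotonically at each iteration of the algorithm. Consider two nonnegative vectors p(τ_{n}) and p(τ_{n+1}) that are minimizers of (14) for the regularization parameters τ_{n} and τ_{n+1}, respectively, with τ_{n} > τ_{n+1} ≥ 0. Since p(τ_{n}) minimizes (14) for τ_{n}, the following inequality holds for the breakpoint τ_{n}:
${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}$ (17)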
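Note that the regularization parameter τ_{n} is the same on both sides of the inequality. The expression on the right-hand side of inequality (17) is equal to ${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}+({\tau}_{n}-{\tau}_{n+1}){\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}$. Therefore, expression (17) can be rewritten as:
${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}+({\tau}_{n}-{\tau}_{n+1}){\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}$ (18)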
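Similarly, since p(τ_{n+1}) is the minimizer of (14) for the regularization parameter τ_{n+1}, the following inequality holds:
${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}$ (19)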
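Note that the regularization parameter τ_{n+1} is the same on both sides of the inequality. Bounding the first two terms on the right-hand side of (18) by means of (19), it is straightforward to obtain
${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}+({\tau}_{n}-{\tau}_{n+1}){\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}$ (20)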
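If the term τ_{n}‖p(τ_{n})‖_{1} is added to and subtracted from the right-hand side of inequality (20), the following expression is obtained:
${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}+({\tau}_{n}-{\tau}_{n+1})\left({\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}-{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\right)$ (21)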
From this expression we can conclude that (τ_{n} − τ_{n+1})(‖p(τ_{n+1})‖_{1} − ‖p(τ_{n})‖_{1}) ≥ 0. As τ_{n} > τ_{n+1} ≥ 0, then ‖p(τ_{n+1})‖_{1} − ‖p(τ_{n})‖_{1} ≥ 0. Finally, we obtain ‖p(τ_{n+1})‖_{1} ≥ ‖p(τ_{n})‖_{1}.
Appendix 2: proof of Theorem 2
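If p(τ_{n+1}) is a vector with nonnegative components that minimizes problem (14) for τ_{n+1} > 0, then the following inequality is fulfilled:
${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}$ (22)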
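which can be rewritten as:
${\tau}_{n+1}\left({\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}-{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\right)\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}-{\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}$ (23)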
Since τ_{n+1} > 0 and ‖p(τ_{n+1})‖_{1} − ‖p(τ_{n})‖_{1} ≥ 0, as was proved in Theorem 1, the following inequality is fulfilled: ${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n}\right)\Vert}_{2}^{2}-{\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n+1}\right)\Vert}_{2}^{2}\ge 0$. Finally, we obtain ${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n}\right)\Vert}_{2}^{2}\ge {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n+1}\right)\Vert}_{2}^{2}$.
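Both monotonicity results (Theorems 1 and 2) can be checked numerically on a toy problem. The sketch below is only an illustration. It solves the nonnegative ℓ1-penalized least-squares problem with a plain coordinate-descent solver instead of LARS/homotopy, evaluates it on a decreasing sequence of regularization values rather than at the true breakpoints, and uses a small random real matrix as a stand-in for A. The function name and all sizes are assumptions.

```python
import numpy as np

def nonneg_lasso(A, r, tau, p0, sweeps=500):
    """Coordinate descent for min_p ||r - A p||_2^2 + tau * ||p||_1 subject to p >= 0."""
    p = p0.copy()
    col_sq = np.sum(A ** 2, axis=0)
    resid = r - A @ p
    for _ in range(sweeps):
        for j in range(A.shape[1]):
            resid += A[:, j] * p[j]   # remove the contribution of column j
            # Closed-form one-dimensional minimizer, clipped at zero.
            p[j] = max(0.0, (A[:, j] @ resid - tau / 2.0) / col_sq[j])
            resid -= A[:, j] * p[j]   # restore the updated contribution
    return p

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 15))     # toy dictionary (not the paper's matrix A)
p_true = np.zeros(15)
p_true[[2, 9]] = [1.0, 0.5]
r = A @ p_true + 0.05 * rng.standard_normal(30)

tau_max = 2.0 * np.max(A.T @ r)       # for tau >= tau_max the minimizer is p = 0
taus = tau_max * np.array([0.8, 0.4, 0.2, 0.1, 0.05])

p = np.zeros(15)
l1_norms, residuals = [], []
for tau in taus:
    p = nonneg_lasso(A, r, tau, p)    # warm start from the previous solution
    l1_norms.append(p.sum())          # equals ||p||_1 because p >= 0
    residuals.append(np.sum((r - A @ p) ** 2))

l1_monotone = bool(np.all(np.diff(l1_norms) >= -1e-9))   # Theorem 1
res_monotone = bool(np.all(np.diff(residuals) <= 1e-9))  # Theorem 2
```

The proofs only use the optimality of p(τ_{n}) and p(τ_{n+1}) and the ordering τ_{n} > τ_{n+1}, so the same behaviour is expected for any decreasing sequence of regularization values, not only for the breakpoints.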
Appendix 3: an alternative interpretation of the residual
The residual correlation b that appears when the LARS/homotopy algorithm is applied to problem (14) has a clear physical interpretation.
Bearing in mind (11), the residual correlation b when LARS/homotopy is applied to (14) takes the form
which can be rewritten in terms of the complex matrix Ã introduced in (10) and the sample covariance $\widehat{\mathbf{R}}$.
The term Ãp(τ) can be expressed as
$\tilde{\mathbf{A}}\mathbf{p}(\tau )=\sum _{i=1}^{G}{p}_{i}(\tau )\,{\mathbf{s}}_{i}^{*}\otimes {\mathbf{s}}_{i}$ (26)
Since ${\mathbf{s}}_{i}^{*}\otimes {\mathbf{s}}_{i}=\text{vec}({\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H})$, then $\tilde{\mathbf{A}}\mathbf{p}(\tau )=\text{vec}\left\{\sum _{i=1}^{G}{p}_{i}(\tau ){\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H}\right\}$.
Substituting (26) into (25), the residual correlation at breakpoint τ yields:
Bearing in mind the matrix Ã presented in (10), the last expression can be rewritten as:
where ${\widehat{\mathbf{C}}}_{\tau}=\widehat{\mathbf{R}}-\sum _{i=1}^{G}{p}_{i}(\tau ){\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H}$.
The i-th component of b(τ) is real because ${\widehat{\mathbf{C}}}_{\tau}$ is Hermitian, so that ${\mathbf{s}}_{i}^{H}{\widehat{\mathbf{C}}}_{\tau}{\mathbf{s}}_{i}={\mathbf{s}}_{i}^{H}{\widehat{\mathbf{C}}}_{\tau}^{H}{\mathbf{s}}_{i}$. Therefore, the residual correlation yields:
This result provides an alternative interpretation of the residual correlation. At each breakpoint τ, the corresponding residual b(τ) can be seen as the Bartlett estimator applied to the residual covariance matrix ${\widehat{\mathbf{C}}}_{\tau}=\widehat{\mathbf{R}}-\sum _{i=1}^{G}{p}_{i}(\tau ){\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H}$.
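The Bartlett interpretation can be verified numerically. The following sketch checks the identity vec(s s^H) = s* ⊗ s and compares the correlation of the dictionary columns with the vectorized residual covariance against the Bartlett spectrum of that residual covariance, evaluated on the same grid. It works with the complex form Ã^H(vec(R̂) − Ãp), and the exact scaling of b in the article's real-valued formulation may differ. The grid, array size, steering-vector convention, and test data are assumptions.

```python
import numpy as np

def steering_vector(theta_deg, num_sensors):
    """Half-wavelength ULA steering vector (assumed phase convention)."""
    m = np.arange(num_sensors)
    return np.exp(1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))

def vec(X):
    """Column-major vectorization, matching the Kronecker identity used in the text."""
    return X.flatten(order="F")

rng = np.random.default_rng(0)
M = 6
grid = np.arange(-80, 81, 10)                      # coarse illustrative grid
S = np.column_stack([steering_vector(t, M) for t in grid])
G = S.shape[1]

# Arbitrary Hermitian sample covariance and nonnegative power vector for the check.
Y = rng.standard_normal((M, 50)) + 1j * rng.standard_normal((M, 50))
R_hat = Y @ Y.conj().T / 50
p = rng.uniform(0.0, 1.0, size=G)

# Columns of the complex dictionary: s_i^* (Kronecker) s_i = vec(s_i s_i^H).
A_tilde = np.column_stack([np.kron(S[:, i].conj(), S[:, i]) for i in range(G)])
identity_ok = np.allclose(A_tilde[:, 0], vec(np.outer(S[:, 0], S[:, 0].conj())))

# Correlation of each column with the vectorized residual covariance.
b_vectorized = A_tilde.conj().T @ (vec(R_hat) - A_tilde @ p)

# Bartlett spectrum s_i^H C_tau s_i of the residual covariance matrix.
C_tau = R_hat - (S * p) @ S.conj().T
b_bartlett = np.einsum("mi,mn,ni->i", S.conj(), C_tau, S)
```

Both vectors agree up to rounding error and have a negligible imaginary part, as expected for a Hermitian residual covariance.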
References
1. Van Trees HL: Detection, Estimation, and Modulation Theory, Part IV: Optimum Array Processing. John Wiley & Sons, New York, USA; 2002.
2. Capon J: High-resolution frequency-wavenumber spectrum analysis. Proc IEEE 1969, 57(8):1408-1418.
3. Lagunas MA, Gasull A: An improved maximum likelihood method for power spectral density estimation. IEEE Trans Acoustics Speech Signal Process 1984, ASSP-32(1):170-173.
4. Schmidt R: Multiple emitter location and signal parameter estimation. IEEE Trans Antennas Propag 1986, 34(3):276-280. 10.1109/TAP.1986.1143830
5. Stoica P, Nehorai A: Performance study of conditional and unconditional direction-of-arrival estimation. IEEE Trans Acoustics Speech Signal Process 1990, 38(10):1783-1795. 10.1109/29.60109
6. Högbom J: Aperture synthesis with a non-regular distribution of interferometer baselines. Astron Astrophys 1974, 15: 417-426.
7. Stoica P, Moses R: Spectral Analysis of Signals. Prentice Hall, NJ, USA; 2005.
8. Tuncer TE, Friedlander B: Classical and Modern Direction-of-Arrival Estimation. Elsevier Academic Press, Burlington, USA; 2009.
9. Charbonnier P, Blanc-Féraud L, Aubert G, Barlaud M: Deterministic edge-preserving regularization in computed imaging. IEEE Trans Image Process 1997, 6: 298-311. 10.1109/83.551699
10. Zou H, Hastie T: Regularization and variable selection via the elastic net. J Royal Stat Soc Ser B 2005, 67(2):301-320. 10.1111/j.1467-9868.2005.00503.x
11. Donoho DL: Compressed sensing. IEEE Trans Inf Theory 2006, 52(4):1289-1306.
12. Boyd S, Vandenberghe L: Convex Optimization. Cambridge University Press, Cambridge; 2004.
13. Donoho DL, Elad M: Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proc Nat Aca Sci 2003, 100(5):2197-2202. 10.1073/pnas.0437847100
14. Donoho DL: For most large underdetermined systems of equations, the minimal l1-norm near-solution approximates the sparsest near-solution. Commun Pure Appl Math 2006, 59(7):907-934. 10.1002/cpa.20131
15. Tibshirani R: Regression shrinkage and selection via the lasso. J Royal Stat Soc Ser B 1996, 58: 267-288.
16. Chen SS, Donoho DL, Saunders MA: Atomic decomposition by basis pursuit. SIAM J Sci Comput 1998, 20: 33-61. 10.1137/S1064827596304010
17. Donoho DL, Tsaig Y: Fast solution of l1-norm minimization problems when the solution may be sparse. IEEE Trans Inf Theory 2008, 54: 4789-4812.
18. Osborne MR, Presnell B, Turlach BA: A new approach to variable selection in least squares problems. IMA J Numer Anal 2000, 20(3):389-403. 10.1093/imanum/20.3.389
19. Efron B, Hastie T, Johnstone I, Tibshirani R: Least angle regression. Annals Stat 2004, 32: 407-499. 10.1214/009053604000000067
20. Gorodnitsky IF, Rao BD: Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. IEEE Trans Signal Process 1997, 45(3):600-616. 10.1109/78.558475
21. Fuchs J: Linear programming in spectral estimation: application to array processing. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). Atlanta, GA; 1996:3161-3164. 6
22. Cotter SF, Rao BD, Engan K, Kreutz-Delgado K: Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Trans Signal Process 2005, 53(7):2477-2488.
23. Yardibi T, Li J, Stoica P, Xue M, Baggeroer AB: Source localization and sensing: A nonparametric iterative adaptive approach based on weighted least squares. IEEE Trans Aerospace Electron Syst 2010, 46(1):425-443.
24. Malioutov DM, Çetin M, Willsky AS: A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans Signal Process 2005, 53: 3010-3022.
25. Eldar YC, Rauhut H: Average case analysis of multichannel sparse recovery using convex relaxation. IEEE Trans Inf Theory 2010, 56(1):505-519.
26. Yardibi T, Li J, Stoica P, Cattafesta LN: Sparsity constrained deconvolution approaches for acoustic source mapping. J Acoust Soc Am 2008, 123(5):2631-2642. 10.1121/1.2896754
27. Picard JS, Weiss AJ: Direction finding of multiple emitters by spatial sparsity and linear programming. In International Conference on Communications and Information Technologies (ISCIT). Incheon, Korea; 2009:1258-1262.
28. Stoica P, Babu P, Li J: SPICE: A sparse covariance-based estimation method for array processing. IEEE Trans Signal Process 2011, 59(2):629-638.
29. Liu Z, Huang Z, Zhou Y: Direction-of-arrival estimation of wideband signals via covariance matrix sparse representation. IEEE Trans Signal Process 2011, 59(9):4256-4270.
30. Stoica P, Babu P, Li J: A sparse covariance-based method for direction of arrival estimation. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague, Czech Republic; 2011:2844-2847.
31. Mørup M, Madsen KH, Hansen LK: Approximate L_{0} constrained non-negative matrix and tensor factorization. In International Conference on Circuits and Systems (ISCAS). Seattle, WA; 2008:1328-1331.
32. Rust BW, O'Leary DP: Residual periodograms for choosing regularization parameters for ill-posed problems. Inverse Probl 2008, 24(3):1-30.
33. Hansen P, Kilmer M, Kjeldsen R: Exploiting residual information in the parameter choice for discrete ill-posed problems. BIT Numer Math 2006, 46(1):41-59. 10.1007/s10543-006-0042-7
34. Bartlett M: An Introduction to Stochastic Processes. Cambridge University Press, Cambridge; 1966.
35. Durbin J: Tests for serial correlation in regression analysis based on the periodogram of least-squares residuals. Biometrika 1969, 56(1):1-15. 10.1093/biomet/56.1.1
Acknowledgements
This study was partially supported by the Spanish Ministry of Economy and Competitiveness under projects TEC201129006C0301 (GRE3NPHY) and TEC201129006C0302 (GRE3NLINKMAC) and by the Catalan Government under grant 2009 SGR 891.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Blanco, L., Nájar, M. Sparse covariance fitting for direction of arrival estimation. EURASIP J. Adv. Signal Process. 2012, 111 (2012). https://doi.org/10.1186/168761802012111
Keywords
 Sparse Representation
 Spatial Spectrum
 Residual Correlation
 Music Algorithm
 Array Signal Processing