# Sparse covariance fitting for direction of arrival estimation

- Luis Blanco^{1}
- Montse Nájar^{1, 2}

*EURASIP Journal on Advances in Signal Processing* **2012**:111

https://doi.org/10.1186/1687-6180-2012-111

© Blanco and Nájar; licensee Springer. 2012

**Received: **3 October 2011

**Accepted: **17 May 2012

**Published: **17 May 2012

## Abstract

This article proposes a new algorithm for finding the angles of arrival of multiple uncorrelated sources impinging on a uniform linear array of sensors. The method is based on sparse signal representation and does not require either the knowledge of the number of sources or a previous initialization. The proposed technique considers a covariance matrix model based on overcomplete basis representation and tries to fit the unknown signal powers to the sample covariance matrix. Sparsity is enforced by means of an *l*_{1}-norm penalty. The final problem is reduced to an objective function with a non-negative constraint that can be solved efficiently using the LARS/homotopy algorithm. The method described herein is able to provide high resolution with a low computational burden. It proceeds in an iterative fashion, solving at each iteration a small linear system of equations until a stopping condition is fulfilled. The proposed stopping criterion is based on the residual spectrum and arises in a natural way when the LARS/homotopy is applied to the considered objective function.

## 1. Introduction

### Brief summary of classical direction of arrival estimators

The estimation of the directions of arrival (DoA) of multiple sources using sensor arrays is an old problem and plays a key role in array signal processing. During the last five decades, a plethora of methods have been proposed for finding the DoA of different narrowband signals impinging on a passive array of sensors. These methods can be divided into two categories: parametric and nonparametric estimators.

Nonparametric methods include beamforming and subspace methods. The former relies on scanning the power from different locations. Representative examples of this category are the conventional beamformer [1] and Capon's method [2]. The conventional beamformer, a.k.a. the Bartlett beamformer, suffers from poor spatial resolution and cannot resolve sources within the Rayleigh resolution limit [1]. As is well known, this lack of resolution can be mitigated only by increasing the number of sensors of the array, because improving the SNR or increasing the number of time observations does not increase the resolution. On the contrary, Capon's minimum variance method can resolve sources within the Rayleigh cell if the SNR is high enough, the number of observations is sufficient, and the sources are not correlated. Unfortunately, in practice, Capon's power profile is strongly dependent on the beamwidth, which, in turn, depends on the explored direction, and in some scenarios this can lead to a resolution loss. To counteract this, an estimator of the spectral density obtained from Capon's power estimate was derived in [3], achieving better resolution properties. Herein, this method will be referred to as Normalized Capon. Another well-known category of nonparametric DoA estimators is the one composed of subspace methods. These algorithms are able to provide high resolution and outperform beamforming methods. The most prominent member of this family is MUltiple SIgnal Classification (MUSIC) [4], which relies on an appropriate separation between signal and noise subspaces. This characterization is costly and needs a previous estimation of the number of incoming signals.
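For concreteness, the two scanning beamformers mentioned above can be sketched in a few lines. This is an illustrative sketch, not code from the paper: the half-wavelength ULA steering vector, the degree grid, and the function names are assumptions.

```python
import numpy as np

def steering(theta_deg, M):
    """Steering vector of a half-wavelength ULA for an angle in degrees."""
    theta = np.deg2rad(theta_deg)
    return np.exp(1j * np.pi * np.arange(M) * np.sin(theta))

def bartlett_spectrum(R, grid_deg):
    """Conventional (Bartlett) beamformer: P(theta) = s^H R s / M^2."""
    M = R.shape[0]
    return np.array([np.real(steering(t, M).conj() @ R @ steering(t, M)) / M**2
                     for t in grid_deg])

def capon_spectrum(R, grid_deg):
    """Capon's minimum variance estimate: P(theta) = 1 / (s^H R^{-1} s)."""
    M = R.shape[0]
    Rinv = np.linalg.inv(R)
    return np.array([1.0 / np.real(steering(t, M).conj() @ Rinv @ steering(t, M))
                     for t in grid_deg])
```

Both methods scan a grid of candidate angles; Capon's spectrum yields sharper peaks than Bartlett's when the SNR and the number of snapshots are sufficient, which mirrors the resolution discussion above.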

Parametric methods based on the maximum likelihood criterion [5] exhibit a good performance at the expense of a high computational cost. These techniques estimate the parameters of a given model instead of searching the maxima of the spatial spectrum. Unfortunately, they often lead to difficult multidimensional optimization problems with a heavy computational burden.

An interesting algorithm that lies in between the classes of parametric and nonparametric techniques is the CLEAN algorithm. This method was first introduced by Högbom [6] and has applications in several areas: array signal processing, image processing, radar, and astronomy. Recently, Stoica and Moses shed light on the semiparametric nature of the algorithm [7]. In broad outline, it operates in a recursive manner, subtracting at each iteration a fraction of the strongest signal from the observed spatial spectrum.

For those readers interested in a more detailed and comprehensive summary of angle of arrival estimators, we refer to [1, 8].

### Sparse signal representation

Sparse representation of signals over redundant dictionaries is a hot topic that has attracted the interest of researchers in many fields during the last decade, such as image reconstruction [9], variable selection [10], and compressed sensing [11]. The most basic problem aims to find the sparsest vector **x** such that **y** = **Ax**, where **y** is the measured vector and **A** is known. This matrix **A** is called the dictionary and is overcomplete, i.e., it has more columns than rows. As a consequence, without imposing a sparsity prior on **x**, the set of equations **y** = **Ax** is underdetermined and admits many solutions. Formally, the objective is to minimize ∥**x**∥_{0} subject to **y** = **Ax**, where ∥·∥_{0} denotes the *l*_{0}-norm [12]. This is, in general, an intractable NP-hard combinatorial problem [13]. Fortunately, if the vector is sufficiently sparse, the problem can be relaxed by replacing the *l*_{0}-norm with the *l*_{1}-norm, defined as ${\|\mathbf{x}\|}_{1}=\sum _{i}\left|{x}_{i}\right|$, leading to a convex optimization problem with a lower computational burden. The conditions that ensure the uniqueness of the solution were studied in [14].
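The *l*_{1} relaxation of the noiseless problem can be illustrated with a small linear-programming sketch. This is an illustration under stated assumptions, not part of the original text: the splitting **x** = **u** − **v** with **u**, **v** ≥ 0 and the use of `scipy.optimize.linprog` are implementation choices.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to y = A x as a linear program.

    Split x = u - v with u, v >= 0, so the objective sum(u) + sum(v)
    equals ||x||_1 at the optimum and the constraint becomes A u - A v = y.
    """
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])          # A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v
```

Because any feasible point upper-bounds the optimum, the recovered vector always satisfies ∥**x̂**∥_{1} ≤ ∥**x**_{true}∥_{1} while reproducing the measurements exactly.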

In the presence of noise, the equality constraint **y** = **Ax** has to be relaxed, allowing a certain reconstruction error *ε* ≥ 0:

$\underset{\mathbf{x}}{\text{min}}\phantom{\rule{0.3em}{0ex}}{\|\mathbf{x}\|}_{1}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{0.3em}{0ex}}{\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}\le \varepsilon \qquad (1)$

Alternatively, the problem can be posed as the minimization of the squared residual, where the constraint ${\|\mathbf{x}\|}_{1}\le \beta $ with *β* ≥ 0 promotes sparsity:

$\underset{\mathbf{x}}{\text{min}}\phantom{\rule{0.3em}{0ex}}{\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}^{2}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{0.3em}{0ex}}{\|\mathbf{x}\|}_{1}\le \beta \qquad (2)$

This formulation is known as the Least Absolute Shrinkage and Selection Operator (LASSO) and was originally proposed by Tibshirani [15]. The augmented formulation of (2) is well known in signal processing and is commonly called Basis Pursuit Denoising (BPDN) [16]:

$\underset{\mathbf{x}}{\text{min}}\phantom{\rule{0.3em}{0ex}}{\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}^{2}+\tau {\|\mathbf{x}\|}_{1} \qquad (3)$

The three formulations (1)-(3) are equivalent in the sense that the sets of solutions are the same for all the possible choices of the parameters *τ*, *ε*, *β*. To go from one formulation to the other we only need a proper correspondence of the parameters. Nevertheless, even if the mapping between the regularization parameters exists, this correspondence is not trivial and it is possibly non-linear and discontinuous [17].

When the vector **x** is real, the LASSO problem (2), or its equivalent formulation (3), can be solved with standard quadratic programming techniques [15]. However, these techniques are time demanding and faster methods are preferred. Osborne et al. [18] and later Efron et al. [19] proposed an efficient algorithm for solving the LASSO, known as the "homotopy method" [18] or LARS (Least Angle Regression) [19]. In this article, this technique will be referred to as LARS/homotopy. A variant of the traditional LASSO problem, which will be especially useful in the covariance fitting addressed later on, is the so-called positive LASSO. In this case, an additional constraint over the entries of the vector **x** is added to the LASSO problem to enforce the components of the vector to be non-negative:

$\underset{\mathbf{x}}{\text{min}}\phantom{\rule{0.3em}{0ex}}{\|\mathbf{y}-\mathbf{A}\mathbf{x}\|}_{2}^{2}+\tau {\|\mathbf{x}\|}_{1}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{0.3em}{0ex}}{x}_{i}\ge 0 \qquad (4)$

The positive LASSO problem (4) can be solved in an efficient way by introducing some slight modifications in the traditional LARS/homotopy. This approach was proposed by Efron et al. [19], but it is not as widely known as the traditional one. Briefly, the algorithm starts with a very large value of *τ* and gradually decreases the regularization parameter until the desired value is attained. As *τ* evolves, the optimal solution for a given *τ*, **x**(*τ*), moves on a piecewise affine path. Since the minimizer **x**(*τ*) is a piecewise-linear function of *τ*, we only need to find the critical regularization parameters *τ*_{0}, *τ*_{1}, *τ*_{2}, ..., *τ*_{stop} where the slope changes [17]; these values are the so-called breakpoints. The algorithm starts with **x** = **0** and operates in an iterative fashion, calculating the critical regularization parameters *τ*_{0} > *τ*_{1} > ⋯ > *τ*_{stop} ≥ 0 and the associated minimizers **x**(*τ*_{0}), **x**(*τ*_{1}), ..., **x**(*τ*_{stop}), where an inactive component of **x** becomes positive or an active element becomes equal to zero. Normally, the number of active components increases as *τ* decreases. Nevertheless, this fact cannot be guaranteed: at some breakpoints, some entries may need to be removed from the active set.

### Sparse representation in source location

Although there are some pioneering studies carried out in the late nineties, e.g., [20, 21], the application of sparse representation to direction finding has gained noticeable interest during the last decade. Recent techniques based on sparse representation show promising results that outperform conventional high-resolution methods such as MUSIC. In [20] a recursive weighted minimum-norm algorithm called FOCUSS was presented. This algorithm considers a single snapshot and requires a proper initialization. The extension to the multiple-snapshot case was carried out in [22] and it is known as M-FOCUSS. Unfortunately, as it is described in [23], this technique is computationally expensive and requires the tuning of two hyperparameters that can affect the performance of the method significantly.

If multiple snapshots can be collected in an array of sensors, they can be used to improve the estimation of the angles of arrival. Several approaches for summarizing multiple observations have been proposed in the literature. The first of these approaches is the so-called *l*_{1}-SVD presented by Malioutov et al. [24]. This method is based on the application of a singular value decomposition (SVD) over the received data matrix and leads to a second-order cone optimization problem. This algorithm requires an initial estimate of the number of sources. Although this estimate does not have to be exact, it must be close for good performance: an underestimation or an overestimation of the number of sources degrades the method. Even if the effect of an incorrect determination of the number of sources has no catastrophic consequences, such as the disappearance of sources in MUSIC, the performance of the algorithm can be considerably degraded. Another important drawback is that *l*_{1}-SVD depends on a user-defined parameter which is not trivial to select. An alternative approach to summarize multiple snapshots is the use of mixed norms over multiple measurement vectors (MMV) that share the same sparsity pattern [22, 25]. This formulation is useful in array signal processing, especially when the number of snapshots is smaller than the number of sensors. If we assume that the snapshots are collected during the coherence time of the angles, the positions of the sources remain identical across the snapshots; the only difference between them resides in the amplitudes of the impinging rays. Basically, this approach, which is out of the scope of this article, tries to combine multiple snapshots using the *l*_{2}-norm and to promote sparsity in the spatial dimension by means of the *l*_{1}-norm. Unfortunately, this joint optimization problem is complex and entails a high computational burden.
When the number of snapshots increases, the computational load becomes too high for practical real-time source location. Recently, new techniques based on a covariance matrix fitting approach have been considered to summarize multiple snapshots, e.g., [26–28]. Basically, these methods try to fit the covariance matrix to a certain model. The main advantage of covariance fitting approaches is that they lead to convex optimization problems with an affordable computational burden. Moreover, they do not require a previous estimation of the number of incoming sources or heavy computations such as an SVD of the data. It should also be pointed out that, as these methods work directly with the covariance matrix, less storage space is needed because they do not need to store huge amounts of time data. The technique proposed by Yardibi et al. [26] leads to an optimization problem that can be solved efficiently using quadratic programming (QP). In the case of the approach exposed by Picard and Weiss [27], the solution is obtained by means of linear programming (LP). The main drawback of this last method is that it depends on a user-defined parameter that is difficult to adjust. In the same way, Liu et al. [29] propose a new method that is based on a hyperparameter that has been heuristically determined. On the contrary, Stoica et al. [28, 30] propose an iterative algorithm named SParse Iterative Covariance-based Estimation approach (SPICE), which can be used in noisy data scenarios without the need for choosing any hyperparameter. The major drawback of this method is that it needs to be initialized.

### Article contribution

This article proposes a simple, fast, and accurate algorithm for finding the angles of arrival of multiple sources impinging on a uniform linear array (ULA). In contrast to other methods in the literature, the proposed technique does not depend on user-defined parameters and does not require either the knowledge of the number of sources or initialization. It assumes white noise and that the point sources are uncorrelated.

The method considers a structured covariance matrix model based on over-complete basis representation and tries to fit the unknown signal powers of the model to the sample covariance. Sparsity is promoted by means of an *l*_{1}-norm penalty imposed on the powers. The final problem is reduced to an objective function with a non-negative constraint that can be solved efficiently using the LARS/homotopy algorithm, which is, in general, faster than QP [19] and LP [17]. The method described herein proceeds in an iterative manner, solving at each iteration a small linear system of equations until a stopping condition is fulfilled. The proposed stopping criterion is based on the residual spectrum and arises in a natural way when the LARS/homotopy is applied to the considered objective function. To the best of our knowledge, this stopping condition has never been considered before in sparse signal representation.

## 2. The proposed method: sparse covariance fitting for source location

Consider *L* narrowband signals ${\left\{{x}_{i}\left[k\right]\right\}}_{i=1}^{L}$ impinging on an array of *M* sensors. The *k*th observation can be expressed as:

$\mathbf{y}\left[k\right]=\mathbf{S}\left(\mathbf{\theta }\right)\mathbf{x}\left[k\right]+\mathbf{w}\left[k\right] \qquad (5)$

where **x**[*k*] = [*x*_{1}[*k*] ⋯ *x*_{L}[*k*]]^{T} is the vector of unknown source signals, the matrix **S**(θ) ∈ ℂ^{M×L} is the collection of the steering vectors corresponding to the angles of arrival of the sources θ = [*θ*_{1}, ..., *θ*_{L}]^{T}, that is, **S**(θ) = [**s**(*θ*_{1}) ⋯ **s**(*θ*_{L})], and **w**[*k*] ∈ ℂ^{M×1} denotes a zero-mean additive noise, spatially and temporally white, independent of the sources, with covariance matrix ${\sigma}_{w}^{2}{\mathbf{I}}_{M}$, where **I**_{M} is the identity matrix of size *M*.
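As a sanity check of the signal model, the snapshot generation can be sketched as follows. This is an illustrative sketch: the function names are assumptions, the array is a half-wavelength ULA, and the unit noise power matches the simulation setup used later in the article.

```python
import numpy as np

def ula_steering(theta_deg, M):
    """Half-wavelength ULA steering vector for a DoA given in degrees."""
    return np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

def snapshots(doas_deg, M, N, snr_db, rng):
    """Draw N observations of the model: y[k] = S(theta) x[k] + w[k].

    Sources are uncorrelated, circularly symmetric complex Gaussians with a
    common power set by snr_db (noise power fixed to 1); the noise is
    spatially and temporally white and independent of the sources.
    """
    S = np.column_stack([ula_steering(t, M) for t in doas_deg])   # M x L
    L = len(doas_deg)
    p = 10.0 ** (snr_db / 10.0)                                   # per-source power
    x = np.sqrt(p / 2) * (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N)))
    w = np.sqrt(0.5) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
    return S @ x + w
```

The sample covariance used throughout the article is then simply `Y @ Y.conj().T / N` for a data matrix `Y` of `N` snapshots.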

The source covariance matrix is defined as

$\mathbf{P}=E\left\{\mathbf{x}\left[k\right]{\mathbf{x}}^{H}\left[k\right]\right\} \qquad (6)$

The classical direction finding problem can be reformulated as a sparse representation problem. With this aim, let us consider an exploration grid of *G* equally spaced angles **Φ** = {*ϕ*_{1}, ..., *ϕ*_{G}} with *G* ≫ *M* and *G* ≫ *L*. If the set of angles of arrival of the impinging signals θ is a subset of **Φ**, the received signal model (5) can be rewritten in terms of an overcomplete matrix **S**_{G} constructed by the horizontal concatenation of the steering vectors corresponding to all the potential source locations:

$\mathbf{y}\left[k\right]={\mathbf{S}}_{G}{\mathbf{x}}_{G}\left[k\right]+\mathbf{w}\left[k\right] \qquad (7)$

where **S**_{G} ∈ ℂ^{M×G} contains the steering vectors corresponding to the angles of the grid, **S**_{G} = [**s**_{1} ⋯ **s**_{G}], with **s**_{i} = **s**(*ϕ*_{i}), and **x**_{G}[*k*] ∈ ℂ^{G×1} is a sparse vector. The non-zero entries of **x**_{G}[*k*] are at the positions that correspond to the source locations. In other words, the *n*th element of **x**_{G}[*k*] is different from zero and equal to the *q*th component of the vector **x**[*k*] defined in (5), denoted by *x*_{q}[*k*], if and only if *ϕ*_{n} = *θ*_{q}. It is important to point out that the matrix **S**_{G} is known and does not depend on the source locations.

The assumption that the set of angles of arrival is a subset of **Φ** is only required for the derivation of the algorithm. Obviously, it does not always hold. Actually, this is a common assumption in many exploration methods in the direction finding literature (e.g., Capon, Normalized Capon, MUSIC, etc). In the case that θ⊈ **Φ**, the contribution of the sources leaks into the neighboring elements of the grid.

Under the above model, the covariance matrix of the received signal can be expressed in terms of **S**_{G} and takes the form:

$\mathbf{R}={\mathbf{S}}_{G}\mathbf{D}{\mathbf{S}}_{G}^{H}+{\sigma}_{w}^{2}{\mathbf{I}}_{M} \qquad (8)$

with $\mathbf{D}=E\left\{{\mathbf{x}}_{G}\left[k\right]{\mathbf{x}}_{G}^{H}\left[k\right]\right\}$. An important remark is that **D** ∈ ℂ^{G×G} is different from the source covariance matrix **P** ∈ ℂ^{L×L} introduced in (6). Actually, since only *L*^{2} entries out of *G*^{2} can differ from zero, **D** is a sparse matrix.

A common assumption in many direction finding problems is that the sources are uncorrelated. Under this assumption, the matrix **D** is a diagonal matrix with only *L* non-zero entries, given by diag(**D**) = [*p*_{1} ⋯ *p*_{G}]^{T} = **p**, with $\mathbf{p}\in {\mathbb{R}}_{+}^{G\times 1}$.

Note that **p** is a *G* × 1 sparse vector with non-zero entries at positions corresponding to source locations. Furthermore, the elements of **p** are real-valued and non-negative.
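The structured covariance model **R** = **S**_{G} diag(**p**) **S**_{G}^{H} + σ_{w}^{2}**I** described above can be sketched directly in code. This is an illustrative sketch, assuming a half-wavelength ULA and hypothetical function names:

```python
import numpy as np

def grid_steering_matrix(grid_deg, M):
    """S_G: steering vectors of a half-wavelength ULA for every grid angle (M x G)."""
    return np.column_stack(
        [np.exp(1j * np.pi * np.arange(M) * np.sin(np.deg2rad(t))) for t in grid_deg])

def model_covariance(p, S_G, sigma2):
    """Structured covariance model: R = S_G diag(p) S_G^H + sigma2 * I.

    (S_G * p) scales column i of S_G by p[i], which is exactly S_G diag(p).
    """
    return (S_G * p) @ S_G.conj().T + sigma2 * np.eye(S_G.shape[0])
```

With a sparse non-negative **p**, only the grid angles carrying power contribute rank-one terms to the model, which is what the fitting procedure exploits.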

Applying the vectorization operator to both sides of (8) yields

$\text{vec}\left\{\mathbf{R}\right\}=\left({\mathbf{S}}_{G}^{*}\otimes {\mathbf{S}}_{G}\right)\text{vec}\left\{\mathbf{D}\right\}+{\sigma}_{w}^{2}\text{vec}\left\{{\mathbf{I}}_{M}\right\} \qquad (9)$

where ⊗ and vec{·} denote the Kronecker product and the vectorization operator, respectively. It should be remarked that ${\mathbf{S}}_{G}^{*}\otimes {\mathbf{S}}_{G}\in {\mathbb{C}}^{{M}^{2}\times {G}^{2}}$.

Since **D** is a diagonal matrix because the sources are uncorrelated, only *G* columns of ${\mathbf{S}}_{G}^{*}\otimes {\mathbf{S}}_{G}$ have to be taken into account, and the dimensionality of the problem can be reduced. In this way, it is straightforward to rewrite the expression (9) in terms of the vector **p** just by removing the unnecessary columns of ${\mathbf{S}}_{G}^{*}\otimes {\mathbf{S}}_{G}$:

$\text{vec}\left\{\mathbf{R}\right\}=\tilde{\mathbf{A}}\mathbf{p}+{\sigma}_{w}^{2}\text{vec}\left\{{\mathbf{I}}_{M}\right\} \qquad (10)$

with $\tilde{\mathbf{A}}=\left[{\mathbf{s}}_{1}^{*}\otimes {\mathbf{s}}_{1}\phantom{\rule{2.77695pt}{0ex}}{\mathbf{s}}_{2}^{*}\otimes {\mathbf{s}}_{2}\phantom{\rule{2.77695pt}{0ex}}\cdots \phantom{\rule{2.77695pt}{0ex}}{\mathbf{s}}_{G}^{*}\otimes {\mathbf{s}}_{G}\right]$. Note that $\tilde{\mathbf{A}}\in {\mathbb{C}}^{{M}^{2}\times G}$.
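The construction of $\tilde{\mathbf{A}}$ (a column-wise Kronecker, or Khatri–Rao, product) is easy to verify numerically. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def khatri_rao_dictionary(S_G):
    """A_tilde: column-wise Kronecker products s_i^* (x) s_i, of size M^2 x G.

    Column i equals vec(s_i s_i^H) in column-major order, so A_tilde @ p is
    the vectorization of S_G diag(p) S_G^H.
    """
    M, G = S_G.shape
    return np.column_stack([np.kron(S_G[:, i].conj(), S_G[:, i]) for i in range(G)])
```

The identity vec(**s**_{i}**s**_{i}^{H}) = **s**_{i}^{*} ⊗ **s**_{i} is what allows the *G*^{2}-dimensional vec{**D**} to collapse to the *G*-dimensional **p**.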

Separating the real and imaginary parts of (10), the model can be expressed with real-valued quantities:

$\left[\begin{array}{c}\text{Re}\left\{\text{vec}\left\{\mathbf{R}\right\}\right\}\\ \text{Im}\left\{\text{vec}\left\{\mathbf{R}\right\}\right\}\end{array}\right]=\left[\begin{array}{c}\text{Re}\left\{\tilde{\mathbf{A}}\right\}\\ \text{Im}\left\{\tilde{\mathbf{A}}\right\}\end{array}\right]\mathbf{p}+{\sigma}_{w}^{2}\left[\begin{array}{c}\text{vec}\left\{{\mathbf{I}}_{M}\right\}\\ {\mathbf{0}}_{{M}^{2}\times 1}\end{array}\right] \qquad (11)$

where vec{**I**_{M}} denotes the vectorization of the identity matrix of dimensions *M* × *M* and ${\mathbf{0}}_{{M}^{2}\times 1}$ is a vector of zeros of size *M*^{2} × 1. More compactly, the expression (11) can be rewritten as:

$\mathbf{r}=\mathbf{A}\mathbf{p}+\mathbf{n} \qquad (12)$

with obvious definitions for **r**, **A**, **p**, and **n**. Note that **r**, $\mathbf{n}\in {\mathbb{R}}^{2{M}^{2}\times 1}$ and $\mathbf{A}\in {\mathbb{R}}^{2{M}^{2}\times G}$.
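The real-valued stacking that produces **r** and **A** can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def stack_real_imag(Rhat, A_tilde):
    """Build the real-valued pair (r_hat, A) from the complex model.

    r_hat stacks the real and imaginary parts of vec(Rhat); A stacks the real
    and imaginary parts of A_tilde, so the complex covariance fit becomes an
    ordinary real least-squares problem in the non-negative powers p.
    """
    v = Rhat.flatten(order='F')        # column-major vectorization
    r = np.concatenate([v.real, v.imag])
    A = np.vstack([A_tilde.real, A_tilde.imag])
    return r, A
```

Since **p** is real, stacking real and imaginary parts loses no information: fitting the real system is equivalent to fitting the complex one.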

A natural way to estimate the vector of powers **p** is the following constrained least squares problem:

$\underset{\mathbf{p}}{\text{min}}\phantom{\rule{0.3em}{0ex}}{\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\|}_{2}^{2}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{0.3em}{0ex}}{p}_{i}\ge 0,\phantom{\rule{0.3em}{0ex}}\sum _{i=1}^{G}{p}_{i}\le \beta \qquad (13)$

where $\widehat{\mathbf{r}}=\left[\begin{array}{c}\hfill \text{Re}\left\{\text{vec}\left[\widehat{\mathbf{R}}\right]\right\}\hfill \\ \hfill \text{Im}\left\{\text{vec}\left[\widehat{\mathbf{R}}\right]\right\}\hfill \end{array}\right]$ and $\widehat{\mathbf{R}}$ denotes the sample covariance matrix.

Note that (13) is a positive LASSO problem. The main idea behind (13) is to fit the unknown powers to the model in such a way that the solution is sparse. The method tries to minimize the residual, or in other words, to maintain the fidelity of the sparse representation to the received data, subject to a non-negative constraint on the powers and $\sum _{i=1}^{G}{p}_{i}\le \beta $. This last constraint promotes sparsity, as was exposed in (2), but also imposes a bound on the received signal power. Unfortunately, the parameter *β* is unknown and has to be estimated. Even worse, the solution of (13) is very sensitive to *β*: a small error in the estimation of this parameter can lead to a wrong solution vector.

To overcome this drawback, the penalized counterpart of (13) is considered instead:

$\underset{\mathbf{p}}{\text{min}}\phantom{\rule{0.3em}{0ex}}{\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\|}_{2}^{2}+\tau \sum _{i=1}^{G}{p}_{i}\phantom{\rule{1em}{0ex}}\text{subject to}\phantom{\rule{0.3em}{0ex}}{p}_{i}\ge 0,\phantom{\rule{0.3em}{0ex}}i=1,\dots ,G \qquad (14)$

The problems (13) and (14) are equivalent in the sense that the path of solutions of (13) parametrized by a positive *β* matches the solution path of (14) as *τ* varies. To go from one formulation to the other we only need a proper correspondence between the parameters.

The problem (14) can be solved in an efficient way with the LARS/homotopy algorithm for positive LASSO. The method operates in an iterative fashion computing the critical regularization parameters *τ*_{0} > *τ*_{1} *>* ⋯ *> τ*_{stop} ≥ 0 and the associated minimizers **p** (*τ*_{0}), **p** (*τ*_{1}), ..., **p** (*τ*_{stop}), where an inactive component of **p** becomes positive or an active element becomes equal to zero. Let us remark that there is only one new candidate to enter or leave the active set at each iteration (this is the "one at a time condition" described by Efron et al. [19]).

The algorithm is based on the computation of the so-called vector of residual correlations, or just residual correlation, $\mathbf{b}\left(\tau \right)={\mathbf{A}}^{T}\left(\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left(\tau \right)\right)$, at each iteration. The method starts with **p** = **0**, which is the solution of (14) for all $\tau \ge {\tau}_{0}=2\underset{i}{\text{max}}{\left({\mathbf{A}}^{T}\widehat{\mathbf{r}}\right)}_{i}$, where ${\left({\mathbf{A}}^{T}\widehat{\mathbf{r}}\right)}_{i}$ denotes the *i*th component of the vector ${\mathbf{A}}^{T}\widehat{\mathbf{r}}$, and proceeds in an iterative manner solving reduced-order linear systems. The whole algorithm is summarized in Algorithm 1 (see [19, 31] for further details). This iterative procedure must be halted when a stopping condition is satisfied. This stopping criterion, which is the main contribution of this article, will be described in Section 3.

It should be pointed out that the least squares error of the covariance fitting method exposed in (14) decreases at each iteration of the LARS/homotopy algorithm. This result is justified by the next two theorems.

*Theorem 1:* The sum of the powers increases monotonically at each iteration of the algorithm. Given two vectors with non-negative elements **p**(*τ*_{n+1}) and **p**(*τ*_{n}) that are minimizers of (14) for two breakpoints *τ*_{n+1} and *τ*_{n}, respectively, with *τ*_{n} > *τ*_{n+1}, it can be stated that ║**p**(*τ*_{n+1})║_{1} ≥ ║**p**(*τ*_{n})║_{1}.

*Proof:* See Appendix 1.

*Theorem 2:* The least squares error ${\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left(\tau \right)\|}_{2}^{2}$ decreases at each iteration of the LARS/homotopy algorithm. Given two vectors with non-negative elements **p**(*τ*_{n}) and **p**(*τ*_{n+1}) that are minimizers of (14) for two consecutive breakpoints *τ*_{n} and *τ*_{n+1} of the LARS/homotopy, with *τ*_{n} > *τ*_{n+1}, it can be stated that ${\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n+1}\right)\|}_{2}^{2}\le {\|\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n}\right)\|}_{2}^{2}$.

*Proof:* See Appendix 2.

**Algorithm 1** Proposed method

INITIALIZATION: $\mathbf{p}=\mathbf{0},\phantom{\rule{0.3em}{0ex}}{\tau}_{0}=2\underset{i}{\text{max}}{\left({\mathbf{A}}^{T}\widehat{\mathbf{r}}\right)}_{i},\phantom{\rule{0.3em}{0ex}}n=0$

*J* = active set = ∅, *I* = inactive set = *J*^{c}

**while** the stopping criterion is not fulfilled and ∃ *i* ∈ *I* such that *b*_{i} > 0 **do**

- 1) Compute the residual correlation $\mathbf{b}={\mathbf{A}}^{T}\left(\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\right)$.
- 2) Determine the maximal components of **b**; these will be the non-zero elements of **p**(*τ*_{n}) (active components): $J=\text{arg}\phantom{\rule{0.3em}{0ex}}\text{max}\left\{{b}_{j}\right\},\phantom{\rule{1em}{0ex}}I={J}^{c}$.
- 3) Calculate the update direction **u** such that all the active components lead to a uniform decrease of the residual correlation (equiangular direction): ${\mathbf{u}}_{J}={\left({\mathbf{A}}_{J}^{T}{\mathbf{A}}_{J}\right)}^{-1}{\mathbf{1}}_{J}$.
- 4) Compute the step size γ such that a new element of **b** becomes equal to the maximal ones (∃ *i* ∈ *I* such that *b*_{i}(*τ*_{n+1}) = *b*_{j∈J}(*τ*_{n+1})) or one non-zero component of **p** becomes zero (∃ *j* ∈ *J* such that *p*_{j}(*τ*_{n+1}) = 0).
- 5) Update **p** → **p** + γ**u**, *τ*_{n+1} = *τ*_{n} - 2γ, *n* = *n* + 1.

**end while**
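The loop of Algorithm 1 can be sketched as follows. This is a hedged, simplified reading of the algorithm, not the authors' implementation: the stopping criterion of Section 3 is replaced by a plain iteration cap, ties in the maximal correlations are resolved with a numerical tolerance, and ${\mathbf{A}}_{J}^{T}{\mathbf{A}}_{J}$ is assumed nonsingular.

```python
import numpy as np

def positive_lasso_homotopy(A, r, n_iter=50, tol=1e-10):
    """Sketch of the positive-LASSO homotopy loop of Algorithm 1.

    Tracks the active set where the residual correlation b = A^T (r - A p)
    is maximal, moves along the equiangular direction, and takes the smallest
    step that changes the active set. Returns the final p and the path.
    """
    G = A.shape[1]
    p = np.zeros(G)
    path = [p.copy()]
    for _ in range(n_iter):
        b = A.T @ (r - A @ p)                      # step 1: residual correlation
        bmax = b.max()
        if bmax <= tol:                            # tau has reached zero
            break
        J = np.where(b >= bmax - tol)[0]           # step 2: active set
        AJ = A[:, J]
        u = np.linalg.solve(AJ.T @ AJ, np.ones(len(J)))  # step 3: equiangular direction
        d = A.T @ (AJ @ u)                         # decrease rate of each b_i (d_J = 1)
        gammas = [bmax]                            # default: drive b_max to zero
        for i in np.setdiff1d(np.arange(G), J):    # step 4a: inactive b_i catches up
            if d[i] < 1 - tol:
                g = (bmax - b[i]) / (1 - d[i])
                if g > tol:
                    gammas.append(g)
        for k, j in enumerate(J):                  # step 4b: active p_j hits zero
            if u[k] < -tol:
                g = -p[j] / u[k]
                if g > tol:
                    gammas.append(g)
        gamma = min(gammas)
        p[J] += gamma * u                          # step 5: update the powers
        path.append(p.copy())
    return p, path
```

Along the path, the least squares error is non-increasing, which is exactly the content of Theorem 2.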

## 3. Stopping criterion: the cumulative spectrum

The definition of an appropriate stopping criterion is of paramount importance because it determines the final regularization parameter *τ*_{stop} and consequently the number of active positions in the solution vector. In general, larger values of *τ* produce sparser solutions. Nevertheless, this fact cannot be guaranteed: at some breakpoints, some entries may need to be removed from the active set.

Most of the traditional approaches exposed in the literature for choosing the regularization parameter in discrete ill-posed problems are based, in one way or another, on the norm of the residual error, e.g., the discrepancy principle, cross-validation, and the L-curve. Nevertheless, recent publications [32, 33] suggest the use of a new parameter-choice method based on the residual spectrum. This technique evaluates the shape of the Fourier transform of the residual error. To the best of the authors' knowledge, this approach has never been used as a stopping criterion in sparse representation problems. The method exposed herein is inspired by the same idea, with some slight modifications. The main difference resides in the fact that no Fourier transform needs to be computed over the residual: as will be exposed later on, the spatial spectrum of the residual arises in a natural way when the LARS/homotopy is applied to (14). The following result is the key point of the stopping criterion proposed in this article.

*Theorem 3:* When the LARS/homotopy is applied to the problem (14), the residual correlation obtained at the *k* th iteration of the algorithm, expressed as $\mathbf{b}\left({\tau}_{k}\right)={\mathbf{A}}^{T}\left(\widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{k}\right)\right)$, is equivalent to the Bartlett estimator applied to the residual covariance matrix ${\widehat{\mathbf{C}}}_{k}=\widehat{\mathbf{R}}-\sum _{i=1}^{G}{p}_{i}\left({\tau}_{k}\right){\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H}$. Then, the *i* th component of the vector of residual correlations satisfies ${\mathbf{b}}_{i}\left({\tau}_{k}\right)={\mathbf{s}}_{i}^{H}{\widehat{\mathbf{C}}}_{k}{\mathbf{s}}_{i}$.

*Proof:* See Appendix 3.

This theorem provides an alternative interpretation of the residual correlation at the *k*th iteration, **b**(*τ*_{k}), which can be seen as a residual spatial spectrum. Bearing in mind this idea, and under the assumption that the noise is zero-mean and spatially white, the following parameter-choice method is proposed: to stop as soon as the residual correlation resembles white noise.

This idea is quantified by the normalized cumulative sum of the residual spectrum:

${c}_{k}\left(l\right)=\frac{\sum _{i=1}^{l}{\mathbf{b}}_{i}\left({\tau}_{k}\right)}{\sum _{i=1}^{G}{\mathbf{b}}_{i}\left({\tau}_{k}\right)},\phantom{\rule{1em}{0ex}}l=1,\dots ,G \qquad (15)$

where the subindex *k*, with *k* = 0, ..., *k*_{stop}, denotes the *k*th iteration of the LARS/homotopy algorithm. The metric *c*_{k} is a slight modification of the conventional normalized cumulative periodogram proposed by Bartlett [34] and later by Durbin [35]. Traditionally, the cumulative periodogram has been defined for real-valued time series; in the real case, the spectrum is symmetric and only half of the spectrum needs to be computed. However, it can be easily extended to embrace complex-valued vectors, as shown in (15). Throughout this entire document, *c*_{k} will be referred to as the normalized cumulative spectrum (NCS).

If the residual spectrum corresponds to white noise, the NCS follows a straight line, and its deviations from this line can be tested with Kolmogorov-Smirnov (K-S) statistics. The K-S confidence limits, as a function of the index *l*, are given by

$\frac{l}{G}\pm \frac{\delta }{\sqrt{G}} \qquad (16)$

where *δ* = 1.36 for the 95% confidence band and *δ* = 1.63 for the 99% band.

Notice that the NCS does not require an accurate estimation of the noise power at the receiver. Since the cumulative spectrum (15) is normalized with respect to the average power at each *k* th iteration, the decision metric only depends on the shape of the spatial spectrum.

The proposed stopping condition is: to stop as soon as the residual correlation resembles white noise, that is, when the NCS lies within the K-S limits.
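The stopping rule can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the NCS is taken as the running sum of the residual spectrum normalized by its total (the classical Bartlett-Durbin cumulative periodogram adapted to the spatial spectrum), and the band width *δ*/√G follows the *δ* values quoted above; the function names are hypothetical.

```python
import numpy as np

def ncs(b):
    """Normalized cumulative spectrum of a non-negative residual spectrum b."""
    c = np.cumsum(b)
    return c / c[-1]

def resembles_white_noise(b, delta=1.63):
    """K-S style band check: the NCS of white noise stays near the line l/G."""
    G = len(b)
    line = np.arange(1, G + 1) / G
    return bool(np.max(np.abs(ncs(b) - line)) <= delta / np.sqrt(G))
```

A perfectly flat residual spectrum lies exactly on the diagonal and passes the test, while a spectrum dominated by a single peak (an unmodeled source) produces a large jump in the NCS and fails it, so the homotopy keeps iterating.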

## 4. Numerical results

The aim of this section is to analyze the performance of the covariance fitting method proposed in this article. To carry out this objective, some simulations have been done in Matlab. Throughout the simulations, a uniform grid with 1° of resolution has been considered for all the analyzed techniques. Furthermore, a zero-mean white Gaussian noise with power ${\sigma}_{w}^{2}=1$ has been considered. The generated source signals are uncorrelated and distributed as circularly symmetric i.i.d complex Gaussian variables with zero mean. Since the same power *P* will be considered for all the sources, throughout this entire section the signal to noise ratio (SNR) is defined by $\text{SNR}\left(\text{dB}\right)=10{\text{log}}_{10}\left(\frac{P}{{\sigma}_{w}^{2}}\right)$.

The first simulation example considers a ULA of *M* = 10 sensors separated by half the wavelength. The SNR is set to 0 dB and the sample covariance matrix is computed with *N* = 600 snapshots. Figures 1 and 2 show the evolution of the NCS and the vector of residual correlations, respectively. As shown in Figure 1, the algorithm stops after 16 iterations, when the NCS lies within the K-S limits of the 99% confidence band. The final solution **p** is shown in Figure 3. Note that the residual spectrum of the final solution in Figure 2 is almost flat and the residual correlation resembles white noise.

The second experiment considers two sources impinging on a ULA of *M* = 9 sensors. Both sources transmit with the same power and the sample covariance has been computed with *N* = 1000 snapshots. Figure 4 shows the results of the covariance fitting method compared to other classical estimators: MUSIC [4], Capon [2], and Normalized Capon [3]. In order to make a fair comparison between the different techniques, the number of sources for the MUSIC algorithm has been estimated with the Akaike information criterion (AIC) [7]. The curves in Figure 4 are averaged over 300 independent simulation runs. From this figure, it is clear that the proposed covariance fitting technique outperforms the other classical estimators: it is about 6 dB better than the MUSIC algorithm and about 12 dB better than the Normalized Capon method.

The third experiment considers two closely spaced sources separated by Δ*θ* = 6° that impinge on an array of *M* = 9 sensors. In this case, the positions of the sources do not correspond to the angles of the grid. With this aim, the angle of the first source *θ*_{1} is generated as a random variable following a uniform distribution between -80° and 80°, and the angle of the second source is generated as *θ*_{2} = *θ*_{1} + Δ*θ*. The sample covariance has been computed with 900 snapshots. Figure 5 shows the RMSE of the proposed method and MUSIC as a function of the SNR, as long as the two sources are resolved with a probability equal to 1. In the case of MUSIC, the determination of the number of signal sources is performed by the AIC. The two curves are based on the average of 300 independent runs. From Figure 5 it can be concluded that at low SNR the proposed method outperforms MUSIC; when the SNR increases, both methods tend to exhibit the same performance.

Two sources located at *θ*_{1} = -36° and *θ*_{2} = -30° impinge on a ULA with *M* = 9 sensors. In this case, the transmitted signals have constant modulus, which is a common situation in communications applications: ${s}_{1}\left(t\right)={e}^{j{\phi}_{1}\left(t\right)}$ and ${s}_{2}\left(t\right)={e}^{j{\phi}_{2}\left(t\right)}$. The signal phases ${\left\{{\phi}_{k}\left(t\right)\right\}}_{k=1}^{2}$ are independent and follow a uniform distribution in [0, 2*π*]. Figure 6 shows the probability of resolution of the proposed method and MUSIC as a function of the number of snapshots *N*. In this case, the signal-to-noise ratio is fixed to 1 dB. As in the previous cases, in order to make a fair comparison between the two techniques, the number of sources for the MUSIC algorithm has been determined using the AIC. The curves were obtained by averaging the results of 500 independent trials. Note that the covariance fitting method clearly outperforms MUSIC and is able to resolve the two sources with a probability greater than 95% for *N* ≥ 30.
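The constant-modulus signal model of this experiment is straightforward to reproduce. The sketch below (snapshot count and seed are arbitrary assumptions) draws independent uniform phases and forms the two unit-modulus waveforms; each has exactly unit instantaneous power, which is what makes the model attractive in communications settings.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200                                            # number of snapshots (arbitrary)
phi = rng.uniform(0.0, 2.0 * np.pi, size=(2, N))   # independent phases in [0, 2*pi)
s = np.exp(1j * phi)                               # s_k(t) = e^{j phi_k(t)}, |s_k(t)| = 1
power = np.mean(np.abs(s) ** 2, axis=1)            # unit power for each waveform
```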

## 5. Conclusions

A new method for finding the DoA of multiple sources impinging on a ULA has been presented in this article. The proposed technique is based on sparse signal representation and outperforms classical direction finding algorithms, including subspace methods, in terms of RMSE and probability of resolution. The technique assumes white noise and uncorrelated point sources. Furthermore, it requires neither knowledge of the number of sources nor a previous initialization.

## Appendix 1: proof of Theorem 1

The LARS/homotopy algorithm computes the sequence of breakpoints *τ*_{0} > *τ*_{1} > ⋯ > *τ*_{stop} ≥ 0 and the associated solutions **p**(*τ*_{0}), **p**(*τ*_{1}), ..., **p**(*τ*_{stop}), at which a new component enters or leaves the support (the set of active elements) of **p**(*τ*). It can be proved that the sum of powers increases monotonically at each iteration of the algorithm. Consider two non-negative vectors **p**(*τ*_{n}) and **p**(*τ*_{n+1}) that are minimizers of (14) for the regularization parameters *τ*_{n} and *τ*_{n+1}, respectively, with *τ*_{n} > *τ*_{n+1} ≥ 0. Since **p**(*τ*_{n}) minimizes (14) for *τ*_{n}, the following inequality holds for the breakpoint *τ*_{n}:

$${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}\qquad (17)$$

Note that *τ*_{n} is the same on both sides of the inequality. The expression on the right-hand side of (17) is equal to ${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}+({\tau}_{n}-{\tau}_{n+1}){\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}$. Therefore, (17) can be rewritten as:

$${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}+({\tau}_{n}-{\tau}_{n+1}){\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}\qquad (18)$$

Moreover, **p**(*τ*_{n+1}) is the minimizer of (14) for the regularization parameter *τ*_{n+1}. Then, the next inequality holds:

$${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\qquad (19)$$

Note that *τ*_{n+1} is the same on both sides of the inequality. Bearing in mind (19) and (18), it is straightforward to obtain

$${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\le {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}+{\tau}_{n+1}{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}+({\tau}_{n}-{\tau}_{n+1}){\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}\qquad (20)$$

If the term *τ*_{n}║**p**(*τ*_{n})║_{1} is added and subtracted from the expression on the right-hand side of the inequality (20), the next expression is obtained

$$0\le ({\tau}_{n}-{\tau}_{n+1})\left({\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}-{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\right)$$

From this expression we can conclude that (*τ*_{n} - *τ*_{n+1})(║**p**(*τ*_{n+1})║_{1} - ║**p**(*τ*_{n})║_{1}) ≥ 0. As *τ*_{n} > *τ*_{n+1} ≥ 0, then ║**p**(*τ*_{n+1})║_{1} - ║**p**(*τ*_{n})║_{1} ≥ 0. Finally, we obtain ║**p**(*τ*_{n+1})║_{1} ≥ ║**p**(*τ*_{n})║_{1}.

## Appendix 2: proof of Theorem 2

**p**(*τ*_{n+1}) is a vector with non-negative components that minimizes the problem (14) for *τ*_{n+1} > 0, so the following inequality is fulfilled:

$${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n})\Vert}_{2}^{2}-{\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}({\tau}_{n+1})\Vert}_{2}^{2}\ge {\tau}_{n+1}\left({\Vert \mathbf{p}({\tau}_{n+1})\Vert}_{1}-{\Vert \mathbf{p}({\tau}_{n})\Vert}_{1}\right)$$

Since *τ*_{n+1} > 0 and ║**p**(*τ*_{n+1})║_{1} - ║**p**(*τ*_{n})║_{1} ≥ 0, as was proved in Theorem 1, the inequality ${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n}\right)\Vert}_{2}^{2}-{\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n+1}\right)\Vert}_{2}^{2}\ge 0$ is fulfilled. Finally, we obtain ${\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n}\right)\Vert}_{2}^{2}\ge {\Vert \widehat{\mathbf{r}}-\mathbf{A}\mathbf{p}\left({\tau}_{n+1}\right)\Vert}_{2}^{2}$.
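Both monotonicity results can be checked numerically. The sketch below solves the non-negative *l*_{1}-penalized least-squares problem for a decreasing sequence of regularization parameters using a simple coordinate-descent solver (an illustrative substitute, since the article uses LARS/homotopy; matrix, vector, and *τ* values are arbitrary), and records the *l*_{1} norm of the solution and the squared residual, which should respectively grow and shrink as *τ* decreases.

```python
import numpy as np

def nn_lasso(A, r, tau, sweeps=3000):
    """Coordinate descent for: min ||r - A p||_2^2 + tau * ||p||_1  s.t. p >= 0."""
    G = A.shape[1]
    p = np.zeros(G)
    col_sq = np.sum(A * A, axis=0)      # squared column norms
    resid = r.copy()                    # resid = r - A p, kept up to date
    for _ in range(sweeps):
        for i in range(G):
            # exact minimizer over coordinate i with the others fixed
            z = A[:, i] @ resid + col_sq[i] * p[i]
            new = max(0.0, (z - tau / 2) / col_sq[i])
            resid += A[:, i] * (p[i] - new)
            p[i] = new
    return p

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
r = rng.standard_normal(20)

taus = [8.0, 4.0, 2.0, 1.0, 0.5]        # decreasing regularization parameters
l1, res = [], []
for tau in taus:
    p = nn_lasso(A, r, tau)
    l1.append(p.sum())                  # p >= 0, so ||p||_1 = sum(p)
    res.append(np.sum((r - A @ p) ** 2))
```

Up to the solver's numerical tolerance, `l1` is non-decreasing and `res` is non-increasing along the path, matching Theorems 1 and 2.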

## Appendix 3: an alternative interpretation of the residual

The residual correlation **b** that appears when the LARS/homotopy algorithm is applied to the problem (14) has a clear physical interpretation. At each breakpoint *τ*, it takes the form

$$\mathbf{b}(\tau)={\tilde{\mathbf{A}}}^{H}\left(\widehat{\mathbf{r}}-\tilde{\mathbf{A}}\mathbf{p}(\tau)\right)$$

being $\tilde{\mathbf{A}}$ the matrix exposed in (10) and $\widehat{\mathbf{r}}=\text{vec}(\widehat{\mathbf{R}})$ the vectorization of the sample covariance $\widehat{\mathbf{R}}$. The product $\tilde{\mathbf{A}}\mathbf{p}(\tau)$ can be expressed as

$$\tilde{\mathbf{A}}\mathbf{p}(\tau)=\sum_{i=1}^{G}{p}_{i}(\tau)\left({\mathbf{s}}_{i}^{*}\otimes {\mathbf{s}}_{i}\right)$$

Since ${\mathbf{s}}_{i}^{*}\otimes {\mathbf{s}}_{i}=\text{vec}({\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H})$, then $\tilde{\mathbf{A}}\mathbf{p}(\tau)=\text{vec}\left\{\sum_{i=1}^{G}{p}_{i}(\tau){\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H}\right\}$. Substituting this expression into the residual correlation at the breakpoint *τ* yields:

$$\mathbf{b}(\tau)={\tilde{\mathbf{A}}}^{H}\text{vec}\left({\widehat{\mathbf{C}}}_{\tau}\right)$$

being ${\widehat{\mathbf{C}}}_{\tau}=\widehat{\mathbf{R}}-\sum_{i=1}^{G}{p}_{i}(\tau){\mathbf{s}}_{i}{\mathbf{s}}_{i}^{H}$ the residual covariance matrix. Using the definition of $\tilde{\mathbf{A}}$ presented in (10), the *i*th component of the last expression can be rewritten as:

$${b}_{i}(\tau)={\left({\mathbf{s}}_{i}^{*}\otimes {\mathbf{s}}_{i}\right)}^{H}\text{vec}\left({\widehat{\mathbf{C}}}_{\tau}\right)={\mathbf{s}}_{i}^{H}{\widehat{\mathbf{C}}}_{\tau}{\mathbf{s}}_{i}$$

Note that the *i*th component of **b**(*τ*) is real because ${\widehat{\mathbf{C}}}_{\tau}$ is Hermitian and therefore ${\mathbf{s}}_{i}^{H}{\widehat{\mathbf{C}}}_{\tau}{\mathbf{s}}_{i}={\mathbf{s}}_{i}^{H}{\widehat{\mathbf{C}}}_{\tau}^{H}{\mathbf{s}}_{i}$. This result provides an alternative interpretation of the residual correlation: at each breakpoint *τ*, the residual **b**(*τ*) can be seen as the Bartlett estimator applied to the residual covariance matrix ${\widehat{\mathbf{C}}}_{\tau}$.
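This identity is easy to verify numerically. The following sketch (dimensions, grid, and data are arbitrary assumptions for illustration) builds a dictionary with columns ${\mathbf{s}}_{i}^{*}\otimes {\mathbf{s}}_{i}$ and checks, for a random non-negative power vector, that the residual correlation coincides with the Bartlett estimator applied to the residual covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
M, G = 6, 12
# steering vectors of a half-wavelength ULA on a coarse angular grid
grid = np.deg2rad(np.linspace(-60, 60, G))
S = np.exp(1j * np.pi * np.arange(M)[:, None] * np.sin(grid)[None, :])  # M x G

# dictionary with columns s_i^* kron s_i  (= vec(s_i s_i^H), column-major vec)
A = np.column_stack([np.kron(S[:, i].conj(), S[:, i]) for i in range(G)])

p = rng.random(G)                         # arbitrary non-negative powers
X = rng.standard_normal((M, 50)) + 1j * rng.standard_normal((M, 50))
Rhat = X @ X.conj().T / 50                # Hermitian sample covariance

# residual correlation b = A^H (vec(Rhat) - A p)
b = A.conj().T @ (Rhat.reshape(-1, order='F') - A @ p)

# Bartlett estimator applied to the residual covariance C = Rhat - sum p_i s_i s_i^H
C = Rhat - sum(p[i] * np.outer(S[:, i], S[:, i].conj()) for i in range(G))
bartlett = np.array([S[:, i].conj() @ C @ S[:, i] for i in range(G)])
```

Up to floating-point precision, `b` equals `bartlett` component by component, and its imaginary part vanishes because the residual covariance is Hermitian.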

## Declarations

### Acknowledgements

This study was partially supported by the Spanish Ministry of Economy and Competitiveness under projects TEC2011-29006-C03-01 (GRE3N-PHY) and TEC2011-29006-C03-02 (GRE3N-LINK-MAC) and by the Catalan Government under grant 2009 SGR 891.

## References

1. Van Trees HL: *Detection, Estimation, and Modulation Theory, Part IV: Optimum Array Processing*. John Wiley & Sons, New York, USA; 2002.
2. Capon J: High-resolution frequency-wavenumber spectrum analysis. *Proc IEEE* 1969, 57(8):1408-1418.
3. Lagunas MA, Gasull A: An improved maximum likelihood method for power spectral density estimation. *IEEE Trans Acoustics Speech Signal Process* 1984, ASSP-32(1):170-173.
4. Schmidt R: Multiple emitter location and signal parameter estimation. *IEEE Trans Antennas Propag* 1986, 34(3):276-280. doi:10.1109/TAP.1986.1143830
5. Stoica P, Nehorai A: Performance study of conditional and unconditional direction-of-arrival estimation. *IEEE Trans Acoustics Speech Signal Process* 1990, 38(10):1783-1795. doi:10.1109/29.60109
6. Högbom J: Aperture synthesis with a non-regular distribution of interferometer baselines. *Astron Astrophys* 1974, 15:417-426.
7. Stoica P, Moses R: *Spectral Analysis of Signals*. Prentice Hall, NJ, USA; 2005.
8. Tuncer TE, Friedlander B: *Classical and Modern Direction-of-Arrival Estimation*. Elsevier Academic Press, Burlington, USA; 2009.
9. Charbonnier P, Blanc-Féraud L, Aubert G, Barlaud M: Deterministic edge-preserving regularization in computed imaging. *IEEE Trans Image Process* 1997, 6:298-311. doi:10.1109/83.551699
10. Zou H, Hastie T: Regularization and variable selection via the elastic net. *J Royal Stat Soc Ser B* 2005, 67(2):301-320. doi:10.1111/j.1467-9868.2005.00503.x
11. Donoho DL: Compressed sensing. *IEEE Trans Inf Theory* 2006, 52(4):1289-1306.
12. Boyd S, Vandenberghe L: *Convex Optimization*. Cambridge University Press, Cambridge; 2004.
13. Donoho DL, Elad M: Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. *Proc Natl Acad Sci* 2003, 100(5):2197-2202. doi:10.1073/pnas.0437847100
14. Donoho DL: For most large underdetermined systems of equations, the minimal l1-norm near-solution approximates the sparsest near-solution. *Commun Pure Appl Math* 2006, 59(7):907-934. doi:10.1002/cpa.20131
15. Tibshirani R: Regression shrinkage and selection via the lasso. *J Royal Stat Soc Ser B* 1996, 58:267-288.
16. Chen SS, Donoho DL, Saunders MA: Atomic decomposition by basis pursuit. *SIAM J Sci Comput* 1998, 20:33-61. doi:10.1137/S1064827596304010
17. Donoho DL, Tsaig Y: Fast solution of l1-norm minimization problems when the solution may be sparse. *IEEE Trans Inf Theory* 2008, 54:4789-4812.
18. Osborne MR, Presnell B, Turlach BA: A new approach to variable selection in least squares problems. *IMA J Numer Anal* 2000, 20(3):389-403. doi:10.1093/imanum/20.3.389
19. Efron B, Hastie T, Johnstone I, Tibshirani R: Least angle regression. *Annals Stat* 2004, 32:407-499. doi:10.1214/009053604000000067
20. Gorodnitsky IF, Rao BD: Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. *IEEE Trans Signal Process* 1997, 45(3):600-616. doi:10.1109/78.558475
21. Fuchs J: Linear programming in spectral estimation: application to array processing. In *International Conference on Acoustics, Speech and Signal Processing (ICASSP)*. Atlanta, GA; 1996:3161-3164.
22. Cotter SF, Rao BD, Engan K, Kreutz-Delgado K: Sparse solutions to linear inverse problems with multiple measurement vectors. *IEEE Trans Signal Process* 2005, 53(7):2477-2488.
23. Yardibi T, Li J, Stoica P, Xue M, Baggeroer AB: Source localization and sensing: a nonparametric iterative adaptive approach based on weighted least squares. *IEEE Trans Aerospace Electron Syst* 2010, 46(1):425-443.
24. Malioutov DM, Çetin M, Willsky AS: A sparse signal reconstruction perspective for source localization with sensor arrays. *IEEE Trans Signal Process* 2005, 53:3010-3022.
25. Eldar YC, Rauhut H: Average case analysis of multichannel sparse recovery using convex relaxation. *IEEE Trans Inf Theory* 2010, 56(1):505-519.
26. Yardibi T, Li J, Stoica P, Cattafesta LN: Sparsity constrained deconvolution approaches for acoustic source mapping. *J Acoust Soc Am* 2008, 123(5):2631-2642. doi:10.1121/1.2896754
27. Picard JS, Weiss AJ: Direction finding of multiple emitters by spatial sparsity and linear programming. In *International Conference on Communications and Information Technologies (ISCIT)*. Incheon, Korea; 2009:1258-1262.
28. Stoica P, Babu P, Li J: SPICE: a sparse covariance-based estimation method for array processing. *IEEE Trans Signal Process* 2011, 59(2):629-638.
29. Liu Z, Huang Z, Zhou Y: Direction-of-arrival estimation of wideband signals via covariance matrix sparse representation. *IEEE Trans Signal Process* 2011, 59(9):4256-4270.
30. Stoica P, Babu P, Li J: A sparse covariance-based method for direction of arrival estimation. In *International Conference on Acoustics, Speech and Signal Processing (ICASSP)*. Prague, Czech Republic; 2011:2844-2847.
31. Mørup M, Madsen KH, Hansen LK: Approximate *L*_{0} constrained non-negative matrix and tensor factorization. In *International Symposium on Circuits and Systems (ISCAS)*. Seattle, WA; 2008:1328-1331.
32. Rust BW, O'Leary DP: Residual periodograms for choosing regularization parameters for ill-posed problems. *Inverse Probl* 2008, 24(3):1-30.
33. Hansen P, Kilmer M, Kjeldsen R: Exploiting residual information in the parameter choice for discrete ill-posed problems. *BIT Numer Math* 2006, 46(1):41-59. doi:10.1007/s10543-006-0042-7
34. Bartlett M: *An Introduction to Stochastic Processes*. Cambridge University Press, Cambridge; 1966.
35. Durbin J: Tests for serial correlation in regression analysis based on the periodogram of least-squares residuals. *Biometrika* 1969, 56(1):1-15. doi:10.1093/biomet/56.1.1

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.