2-D DOA tracking using variational sparse Bayesian learning embedded with Kalman filter


In this paper, we consider the 2-D direction-of-arrival (DOA) tracking problem. The signals are captured by a uniform spherical array and can therefore be analyzed in the spherical harmonics domain. Exploiting the sparsity of the source DOAs over the whole angular region, we propose a novel DOA tracking method that estimates the source locations and traces their trajectories using variational sparse Bayesian learning (VSBL) embedded with a Kalman filter (KF). First, a transition probabilities (TP) model is used to build the state transition process, which assumes that each source moves to one of its adjacent grids with equal probability. Second, the states are estimated by the KF in the variational E-step of the VSBL, and the variances of the state noise and measurement noise are learned in the variational M-step. Finally, the proposed method is extended to handle the off-grid tracking problem. Simulations show that the proposed method achieves higher accuracy than the VSBL and KF methods.

1 Introduction

Direction-of-arrival (DOA) estimation is an active research field in array signal processing and has been used in various applications, such as radar, channel modeling, tracking, and surveillance [1,2,3]. Among the estimation algorithms, multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance techniques (ESPRIT) are the most representative methods; both exploit the signal and noise subspaces. Compared with conventional beamforming algorithms, these methods enhance the estimation precision. However, the number of impinging signals must be known a priori, and the computational complexity of decomposing the covariance matrix grows as the number of array elements rises. Recently, sparse reconstruction methods have attracted substantial attention because the signals impinging on an array are intrinsically sparse in the spatial domain [4, 5]. In these methods, the whole angular domain is divided into predefined grids, and a measurement matrix is constructed by sampling these grids. Based on the singular value decomposition (SVD), an approach named l1-SVD was proposed to reduce the computational complexity and enforce sparsity using the l1-norm [6]. Compared with the l1-SVD method, sparse Bayesian learning (SBL) can model sparse signals more flexibly and give more accurate recovery results [7,8,9]. When SBL is used to estimate DOAs, the sparse prior for the signals of interest is a Gaussian [10] or Laplacian distribution [11]. The SBL method achieves good results for static DOA estimation.

In the case of tracking moving targets, most methods assume that each source angle is constant within a time interval but may differ from one interval to another because the sources are moving [12]. Several approaches exist to track the DOAs of moving sources, including classical subspace optimization, sparse recovery, and adaptive filtering. Subspace optimization approaches optimize only the signal or noise subspace without using the eigenvalue decomposition (EVD), which reduces the computational complexity and storage requirements. Yang presented a new approach to track the signal subspace using an unconstrained minimization method [13]. The subspace method, which can be applied to dynamic DOAs, was extended to an L-shaped array [14] and to two parallel linear arrays [15]. The performance of the subspace method relies on the number of snapshots, so this kind of method is inapplicable to moving targets when the number of snapshots in each time interval is relatively small.

Vaswani et al. reviewed many algorithms for dynamic sparse signal recovery. If the support change is highly correlated and the correlation model is known, an accurate support estimate can be obtained by using the previous estimation information [16,17,18]. In [19, 20], these methods combined slow signal value change and slow support change to enhance the precision. Most of these methods are variants of basis pursuit denoising. A sequential Bayesian algorithm was introduced to estimate moving DOAs in a time-varying environment [21]. It assumed that the sources move at a constant velocity and used this hypothesis to construct the signal-motion model. The locally competitive algorithm (LCA) was proposed to construct a dynamic system and track DOAs [22]; this method mainly introduced a thresholding function to enforce sparsity. A model was proposed in [23] to describe a time-varying array response in the frequency domain for each source. The key idea of that paper is to use a hidden Markov model to describe the moving signal and posterior inference to estimate the signal positions. However, it can become trapped in a local optimum when the signals alias at high frequencies.

The Kalman filter (KF) is the most representative tracking method in adaptive filter theory: it minimizes the Bayesian mean square error of the state vector so that the predicted state matches the actual state. However, the variances of the state and measurement noise are treated as known priors in the KF. In [24], the time difference of arrival (TDOA) information was calculated by a pair of microphones, and a distributed unscented KF was used to track speakers in a nonlinear measurement model. Methods combining the KF with compressive sensing have also been used for dynamic DOA estimation [25, 26]. In [25], the state transition function was built under the assumption that the bearing change rate is known, which is hard to satisfy in real applications. A Bayesian compressive sensing Kalman filter (BCSKF) method [26] was proposed to track dynamic moving sources; it used constant DOA changes in the KF prediction, meaning that the source moves in a designated direction with a fixed step. The particle filter has also been utilized to track the trajectory of a target; it adopts a set of particles to represent the posterior distribution of the signal's stochastic process. For example, an algorithm combining compressive sensing and a particle filter was introduced in [27]; however, it only used compressive sensing to estimate the initial position and the particle filter to track the trajectory.

In this paper, a spherical array is used to track moving sources in 3-D space because the array has a 3-D symmetric structure and can capture high-order sound field information [28,29,30]. Moreover, compared with a linear array, a spherical array captures two-dimensional directional information rather than a single angle. In addition, due to the special construction of the spherical array, the received signal can be expanded in the spherical harmonics domain, which conveniently separates the signal direction coordinates from the sensor position coordinates.

A new method is proposed based on the spherical array to track 2-D dynamic DOAs. It combines variational Bayesian inference and the KF to improve the tracking performance. First, a transition probabilities (TP) model is built to describe the state transition process of a signal. In this model, the source moves to an uncertain position with equal probability, rather than following the deterministic expected DOA change of [26]. In addition, the TP model fits conveniently into a tracking framework and admits sparse methods for estimating the signal position. Based on this model, an alternating iterative method is developed to track the DOAs. In the first step, the KF estimates the signal values. In the second step, variational sparse Bayesian learning (VSBL) learns the variances of the measurement noise and state noise, which are used to update the signal state in the KF. The two steps are interdependent.

There are three differences between the proposed approach and the method of [26]. First, the proposed method constructs a real-valued steering matrix in the signal model rather than splitting complex values into several real-valued ones, which approximately halves the computational cost. Second, KF estimates are used instead of Bayesian estimates: the main contribution of [26] is to use the KF parameter estimates to optimize the lower bound of the relevance vector machine, whereas here the KF estimation is embedded in the VSBL to further improve the precision. Finally, we introduce an off-grid model to overcome grid mismatch problems.

The rest of the paper is organized as follows. The real-valued array signal model is given in Section 2. The variational Bayesian inference is briefly reviewed, and the proposed tracking method is introduced in Sections 3 and 4. Numerical examples and simulation results are given in Section 5. Section 6 concludes the paper.

The notation is as follows:

(a, b, A, B, ⋯), scalar variables

arg(), phase operator

(A, B, ⋯), matrix variables

|·|, the absolute value

(a, b, ⋯), column vectors

()+, the Moore-Penrose pseudo inverse

()T, transpose operator

()', the derivation operator

()*, complex conjugation operator

‖·‖2, the l2 norm

()H, conjugate transpose operator

‖·‖F, the Frobenius norm

diag(), diagonal matrix

〈·〉, expectation operator

blkdiag(), block diagonal matrix

A(n, :), the nth row of matrix A

Re(), the real parts of a complex value

A(:, n), the nth column of matrix A

exp(), the exponential function

Im(), the imaginary parts of a complex value

k, the wavenumber

D, the number of signals

(ϑ l , φ l ), the elevation and azimuth of the lth sensor

\( \left({\overset{\smile }{\theta}}_{d,t},{\overset{\smile }{\phi}}_{d,t}\right) \), the elevation and azimuth of the dth signal at the tth time interval

L, the number of sensors

t, time interval

B, snapshots

R, the radius of sphere array

G1, G2, the azimuth and elevation range

i, the imaginary unit \( \sqrt{-1} \)

\( \left(\overset{\smile }{\mathbf{X}},\overline{\mathbf{X}},\mathbf{X}\right) \), the space domain, spherical domain and real-valued receiving signal

\( \left(\overset{\smile }{\mathbf{A}},\widehat{\mathbf{A}},\overline{\mathbf{A}},\mathbf{A}\right) \), the true steering matrix, the dictionary matrix, the spherical-domain matrix, and the real-valued matrix

\( \left(\overset{\smile }{\mathbf{S}},\overline{\mathbf{S}},\mathbf{S}\right) \), the space-domain, sparse, and real-valued signal amplitudes

\( \left(\overset{\smile }{\mathbf{V}},\overline{\mathbf{V}},\mathbf{V}\right) \), the space domain, the spherical and the real-valued noise

h n , spherical Hankel function of order n

\( {Y}_n^m\left(\cdot \right) \), the spherical harmonic of order n and degree m

I n , an n  ×  n identity matrix

j n , the n-order spherical Bessel function

J n , the exchange matrix with ones on its antidiagonal and zeros elsewhere

Γ(), a Gamma function

0 n , a column vector containing n zeros

(A\b), the variables in A except variable b


2 Real-valued array signal model

Assume that D dynamic narrowband far-field signals with wavenumber k impinge on a spherical array from \( {\overset{\smile }{\Phi}}_{d,t}=\left({\overset{\smile }{\theta}}_{d,t},{\overset{\smile }{\phi}}_{d,t}\right) \). It is assumed that B snapshots are available to process the received data and estimate the DOAs in each time interval; for simplicity, we use t to represent the range [(t − 1)B + 1, ⋯, tB]. The angular change of a signal is slow, so its DOA can be considered fixed within the time interval t. The spherical array, shown in Fig. 1, consists of L identical isotropic elements on a sphere of radius R. The position of the lth sensor is R l  = R[cosφ l  sin ϑ l , sinφ l  sin ϑ l , cosϑ l ]T. The steering vector of the array is defined as

$$ \overset{\smile }{\mathbf{a}}\left({\overset{\smile }{\theta}}_{d,t},{\overset{\smile }{\phi}}_{d,t}\right)=\left[\begin{array}{c}\exp \left( ikR\left(\sin {\overset{\smile }{\theta}}_{d,t}\sin {\vartheta}_1\cos \left({\overset{\smile }{\phi}}_{d,t}-{\varphi}_1\right)+\cos {\overset{\smile }{\theta}}_{d,t}\cos {\vartheta}_1\right)\right)\\ {}\vdots \\ {}\exp \left( ikR\left(\sin {\overset{\smile }{\theta}}_{d,t}\sin {\vartheta}_L\cos \left({\overset{\smile }{\phi}}_{d,t}-{\varphi}_L\right)+\cos {\overset{\smile }{\theta}}_{d,t}\cos {\vartheta}_L\right)\right)\end{array}\right]. $$

where \( i=\sqrt{-1} \). The output of the array is given by

$$ {\overset{\smile }{\mathbf{X}}}_t={\overset{\smile }{\mathbf{A}}}_t{\overset{\smile }{\mathbf{S}}}_t+{\overset{\smile }{\mathbf{V}}}_t, $$

where \( {\overset{\smile }{\mathbf{X}}}_t={\left[{\overset{\smile }{\mathbf{x}}}_{1,t},\cdots, {\overset{\smile }{\mathbf{x}}}_{L,t}\right]}^T \), \( {\overset{\smile }{\mathbf{x}}}_{l,t}={\left[{x}_l\left(\left(t-1\right)B+1\right),\cdots, {x}_l(tB)\right]}^T \) is a column vector, which denotes the lth sensor receiving signal, \( {\overset{\smile }{\mathbf{S}}}_t={\left[{\overset{\smile }{\mathbf{s}}}_{1,t},\cdots, {\overset{\smile }{\mathbf{s}}}_{D,t}\right]}^T \) is the amplitude of source signal, \( {\overset{\smile }{\mathbf{s}}}_{d,t}=\left[{s}_d\left(\left(t-1\right)B+1\right),\cdots, {s}_d(tB)\right]. \) \( {\overset{\smile }{\mathbf{A}}}_t=\left[\overset{\smile }{\mathbf{a}}\left({\overset{\smile }{\theta}}_{1,t},{\overset{\smile }{\phi}}_{1,t}\right),\cdots, \overset{\smile }{\mathbf{a}}\left({\overset{\smile }{\theta}}_{D,t},{\overset{\smile }{\phi}}_{D,t}\right)\right] \) is the steering matrix, and \( {\overset{\smile }{\mathbf{V}}}_t={\left[{\overset{\smile }{\mathbf{v}}}_{1,t},\cdots, {\overset{\smile }{\mathbf{v}}}_{L,t}\right]}^T \) is the measurement noise, and \( {\overset{\smile }{\mathbf{v}}}_{l,t}={\left[{v}_l\left(\left(t-1\right)B+1\right),\cdots, {v}_l(tB)\right]}^T \).
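To make the model concrete, the steering vector and the array output X_t = A_t S_t + V_t can be simulated directly. The following NumPy sketch does so; the sensor layout, wavenumber, source DOAs, and noise level below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def steering_vector(k, R, sensor_ang, theta, phi):
    """Space-domain steering vector a(theta, phi) of a spherical array.

    sensor_ang: (L, 2) array of sensor (elevation, azimuth) = (vartheta_l, varphi_l).
    theta, phi: source elevation and azimuth in radians.
    """
    vth, vph = sensor_ang[:, 0], sensor_ang[:, 1]
    # Plane-wave phase k * <r_l, u(theta, phi)> on the sphere of radius R
    dot = (np.sin(theta) * np.sin(vth) * np.cos(phi - vph)
           + np.cos(theta) * np.cos(vth))
    return np.exp(1j * k * R * dot)

# Hypothetical setup: L = 32 sensors, D = 2 sources, B = 50 snapshots
rng = np.random.default_rng(0)
L, D, B, k, R = 32, 2, 50, 1.0, 1.0
sensor_ang = np.column_stack([np.arccos(rng.uniform(-1, 1, L)),
                              rng.uniform(0, 2 * np.pi, L)])
doas = np.array([[0.4, 1.0], [1.2, 2.5]])          # (theta_d, phi_d) per source
A = np.column_stack([steering_vector(k, R, sensor_ang, th, ph) for th, ph in doas])
S = (rng.standard_normal((D, B)) + 1j * rng.standard_normal((D, B))) / np.sqrt(2)
V = 0.1 * (rng.standard_normal((L, B)) + 1j * rng.standard_normal((L, B)))
X = A @ S + V                                      # the array output model
```

Each entry of A has unit modulus, as expected for a pure phase steering vector.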

Fig. 1

Spherical array geometry

In order to construct a sparse signal model, we divide the azimuth and elevation ranges into G1 ≫ D and G2 ≫ D grids, respectively. That is, the whole 2-D angular space is divided into G = G1G2 grids denoted Φ = {Φ1, ⋯, Φ G }, where \( {\Phi}_g=\left({\theta}_{g_1},{\phi}_{g_2}\right) \), g1 = 1, ⋯, G1, g2 = 1, ⋯, G2. The output of the sensors at time interval t can be reformulated as

$$ {\overset{\smile }{\mathbf{X}}}_t={\widehat{\mathbf{A}}\overline{\mathbf{S}}}_t+{\overset{\smile }{\mathbf{V}}}_t, $$

where \( \widehat{\mathbf{A}}=\left[\overset{\smile }{\mathbf{a}}\left({\Phi}_1\right),\cdots, \overset{\smile }{\mathbf{a}}\left({\Phi}_G\right)\right] \) is the redundant dictionary matrix containing the angles of interest \( {\overset{\smile }{\Phi}}_{d,t} \), d = 1, ⋯, D, and \( {\overline{\mathbf{S}}}_t={\left[{\overline{\mathbf{s}}}_{1,t},\cdots, {\overline{\mathbf{s}}}_{G,t}\right]}^T \) is a sparse matrix with only D nonzero rows corresponding to the positions of the sources. \( {\overset{\smile }{a}}_l\left({\Phi}_g\right) \) denotes the lth element of the dictionary vector \( \overset{\smile }{\mathbf{a}}\left({\Phi}_g\right) \), and it can be represented using spherical harmonics [31] as

$$ {\overset{\smile }{a}}_l\left({\Phi}_g\right)=\sum \limits_{n=0}^N\sum \limits_{m=-n}^n{b}_n(kR){\left[{Y}_n^m\left({\Phi}_g\right)\right]}^{\ast }{Y}_n^m\left({\Omega}_l\right), $$

where Ω l  = (ϑ l , φ l ), N is the maximum spherical harmonics order, and b n (kR) is the far-field mode strength, which depends on the array configuration. The simplest spherical array configuration is the open sphere, composed of sensors suspended in free space; it is assumed that accessories such as cables and mounting brackets do not affect the sensor measurements. However, the open sphere may suffer from poor robustness at certain frequencies. Another common configuration is the rigid sphere, in which the sensors are mounted on a rigid spherical baffle, so the sound waves are scattered by the sphere. Consequently, the mode strengths of the two configurations differ. The magnitude of b n (kR) for the two spheres is shown in Fig. 2.

Fig. 2

The magnitude of b n (kR) for different sphere configurations

The specific form can be expressed as

$$ {b}_n(kR)=\left\{\begin{array}{ll}4\pi {i}^n{j}_n(kR)& \mathrm{open}\ \mathrm{sphere}\\ {}4\pi {i}^n\left({j}_n(kR)-\frac{j_n^{\prime }(kR)}{h_n^{\prime }(kR)}{h}_n(kR)\right)& \mathrm{rigid}\ \mathrm{sphere}\end{array}\right., $$

and the spherical harmonic \( {Y}_n^m\left(\theta, \phi \right) \) is defined as:

$$ {Y}_n^m\left(\theta, \phi \right)=\sqrt{\frac{2n+1}{4\pi}\frac{\left(n-m\right)!}{\left(n+m\right)!}}{P}_n^m\left(\cos \theta \right){e}^{im\phi}, $$

where 0 ≤ n ≤ N,   − n ≤ m ≤ n, and \( {P}_n^m\left(\cos \theta \right) \) are the associated Legendre polynomials [32].
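The mode strength and spherical harmonic definitions above map directly onto SciPy's special functions. The following sketch implements b_n(kR) for both sphere types (with the spherical Hankel function h_n = j_n + i y_n) and Y_n^m from the associated Legendre polynomials; the arguments used are illustrative:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, lpmv
from math import factorial

def b_n(n, kR, sphere="rigid"):
    """Far-field mode strength b_n(kR) for open and rigid spheres."""
    jn = spherical_jn(n, kR)
    if sphere == "open":
        return 4 * np.pi * (1j ** n) * jn
    # Rigid sphere: subtract the component scattered by the baffle
    jnp = spherical_jn(n, kR, derivative=True)
    hn = jn + 1j * spherical_yn(n, kR)
    hnp = jnp + 1j * spherical_yn(n, kR, derivative=True)
    return 4 * np.pi * (1j ** n) * (jn - (jnp / hnp) * hn)

def sph_harm_nm(n, m, theta, phi):
    """Spherical harmonic Y_n^m(theta, phi); theta is elevation, phi azimuth."""
    norm = np.sqrt((2 * n + 1) / (4 * np.pi)
                   * factorial(n - m) / factorial(n + m))
    # lpmv includes the Condon-Shortley phase, matching the definition above
    return norm * lpmv(m, n, np.cos(theta)) * np.exp(1j * m * phi)
```

As a quick check, Y_0^0 = 1/sqrt(4*pi) and Y_1^0 = sqrt(3/(4*pi))*cos(theta), and for the open sphere b_0(kR) = 4*pi*sin(kR)/kR.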

According to (4), the dictionary matrix \( \widehat{\mathbf{A}} \) can be written as

$$ \widehat{\mathbf{A}}=\mathbf{Y}\left(\boldsymbol{\Omega} \right)\mathbf{B}(k){\mathbf{Y}}^H\left(\boldsymbol{\Phi} \right), $$

where Y(Ω) is an L × J spherical harmonic matrix whose lth row is given as:

$$ \mathbf{y}\left({\Omega}_l\right)=\left[\underset{n=0}{\underbrace{Y_0^0\left({\Omega}_l\right)}},\underset{n=1}{\underbrace{Y_1^{-1}\left({\Omega}_l\right),{Y}_1^0\left({\Omega}_l\right),{Y}_1^1\left({\Omega}_l\right)}},\cdots, \underset{n=N}{\underbrace{Y_N^{-N}\left({\Omega}_l\right),\cdots, {Y}_N^N\left({\Omega}_l\right)}}\right], $$

where J = (N + 1)2, Y(Φ) is a G × J spherical harmonic matrix defined similarly to Y(Ω), and B(k) is a J × J far-field mode strength matrix:

$$ \mathbf{B}(k)=\operatorname{diag}\left(\underset{n=0}{\underbrace{b_0(kR)}},\underset{n=1}{\underbrace{b_1(kR),{b}_1(kR),{b}_1(kR)}},\cdots, \underset{n=N}{\underbrace{b_N(kR),\cdots, {b}_N(kR)}}\right). $$

From (7), the dictionary matrix consists of three factors: the first depends only on the sensor positions, the second only on the wavenumber, and the third only on the candidate angles. Furthermore, since Y(Ω)B(k) is an L × J matrix (L ≥ J) with a left pseudo-inverse, (3) can be rewritten as:

$$ {\overline{\mathbf{X}}}_t={\overline{\mathbf{A}}\overline{\mathbf{S}}}_t+{\overline{\mathbf{V}}}_t, $$

where \( {\overline{\mathbf{X}}}_t={\mathbf{B}}^{-1}(k){\mathbf{Y}}^{+}\left(\boldsymbol{\Omega} \right){\overset{\smile }{\mathbf{X}}}_t \), \( \overline{\mathbf{A}}={\mathbf{Y}}^H\left(\boldsymbol{\Phi} \right) \), and \( {\overline{\mathbf{V}}}_t={\mathbf{B}}^{-1}(k){\mathbf{Y}}^{+}\left(\boldsymbol{\Omega} \right){\overset{\smile }{\mathbf{V}}}_t \). Owing to the special property of spherical harmonic function, the complex-valued model can be transformed into a real-valued one to reduce the computational complexity.

Owing to the property \( {\left[{Y}_n^m\left(\Phi \right)\right]}^{\ast }={\left(-1\right)}^m{Y}_n^{-m}\left(\Phi \right) \) of each element in \( \overline{\mathbf{A}} \), we can transform \( \overline{\mathbf{A}} \) into a matrix with the column conjugation property [33] for each order using Q1:

$$ {\mathbf{Q}}_1{\overline{\mathbf{X}}}_t={\mathbf{Q}}_1{\overline{\mathbf{A}}\overline{\mathbf{S}}}_t+{\mathbf{Q}}_1{\overline{\mathbf{V}}}_t, $$
$$ {\mathbf{Q}}_1=\mathrm{blkdiag}\left({\mathbf{Q}}_{1,0},\cdots, {\mathbf{Q}}_{1,N}\right), $$
$$ {\mathbf{Q}}_{1,n}\kern0.5em =\kern0.5em \operatorname{diag}\left(\underset{n}{\underbrace{{\left(-1\right)}^n,{\left(-1\right)}^{n-1}\cdots, \left(-1\right)}},1,\underset{n}{\underbrace{1,1,\cdots 1}}\right). $$

Q1 is a unitary matrix. The new matrix \( {\mathbf{Q}}_1\overline{\mathbf{A}} \) is:

$$ {\mathbf{Q}}_1\overline{\mathbf{A}}={\left[{\mathbf{y}}_0^0\left(\boldsymbol{\Phi} \right),\cdots, \underset{2N+1}{\underbrace{{\mathbf{y}}_N^N\left(\boldsymbol{\Phi} \right),\cdots, {\mathbf{y}}_N^0\left(\boldsymbol{\Phi} \right),\cdots, {\left[{\mathbf{y}}_N^N\left(\boldsymbol{\Phi} \right)\right]}^{\ast }}}\right]}^T, $$

where \( {\mathbf{y}}_n^m\left(\boldsymbol{\Phi} \right)={\left[{Y}_n^m\left({\Phi}_1\right),\cdots, {Y}_n^m\left({\Phi}_G\right)\right]}^T \). From (14), we can find that each order has the column conjugation property, which means we can transform the complex-valued matrix into a real-valued one by linear combinations. So, the transform matrix is built as:

$$ {\mathbf{Q}}_2=\mathrm{blkdiag}\left({\mathbf{Q}}_{2,0},\cdots, {\mathbf{Q}}_{2,N}\right), $$
$$ {\mathbf{Q}}_{2,n}=\frac{1}{\sqrt{2}}\left[\begin{array}{ccc}{\mathbf{I}}_n& {\mathbf{0}}_n& {\mathbf{J}}_n\\ {}{\mathbf{0}}_n^T& \sqrt{2}& {\mathbf{0}}_n^T\\ {}-i{\mathbf{J}}_n& {\mathbf{0}}_n& i{\mathbf{I}}_n\end{array}\right], $$

where n  ≥  1, and Q2, 0 = 1. From (15), Q2 is a block diagonal unitary matrix due to \( {\mathbf{Q}}_2^{-1}={\mathbf{Q}}_2^H \). Utilizing the unitary transform matrix Q2, (11) can be changed into

$$ {\mathbf{Q}}_2{\mathbf{Q}}_1{\overline{\mathbf{X}}}_t={\mathbf{Q}}_2{\mathbf{Q}}_1{\overline{\mathbf{A}}\overline{\mathbf{S}}}_t+{\mathbf{Q}}_2{\mathbf{Q}}_1{\overline{\mathbf{V}}}_t. $$

We use \( {\tilde{\mathbf{X}}}_t=\mathbf{A}{\overline{\mathbf{S}}}_t+{\tilde{\mathbf{V}}}_t \) to stand for (17) for simplicity. The new signal and noise matrices after the unitary transformation can be described as \( {\tilde{\mathbf{X}}}_t=\mathbf{Q}{\overline{\mathbf{X}}}_t \) and \( {\tilde{\mathbf{V}}}_t=\mathbf{Q}{\overline{\mathbf{V}}}_t \), where Q = Q2Q1. The real-valued dictionary matrix can be inferred as follows:

$$ \mathbf{A}=\mathbf{Q}\overline{\mathbf{A}}={\mathbf{QY}}^H\left(\boldsymbol{\Phi} \right)={\widehat{\mathbf{Y}}}^H\left(\boldsymbol{\Phi} \right)={\left[{\widehat{\mathbf{y}}}_0^0\left(\boldsymbol{\Phi} \right),\cdots, \underset{2N+1}{\underbrace{{\widehat{\mathbf{y}}}_N^{-N}\left(\boldsymbol{\Phi} \right),\cdots, {\widehat{\mathbf{y}}}_N^0\left(\boldsymbol{\Phi} \right),\cdots, {\widehat{\mathbf{y}}}_N^N\left(\boldsymbol{\Phi} \right)}}\right]}^T, $$

where \( {\widehat{\mathbf{y}}}_n^m\left(\boldsymbol{\Phi} \right)={\left[{\widehat{Y}}_n^m\left({\Phi}_1\right),\cdots, {\widehat{Y}}_n^m\left({\Phi}_G\right)\right]}^T \) and \( {\widehat{Y}}_n^m\left({\Phi}_g\right) \) is

$$ {\widehat{Y}}_n^m\left({\Phi}_g\right)\kern0.5em =\kern0.5em \left\{\begin{array}{c}\sqrt{2}\operatorname{Re}\left({Y}_n^m\left({\Phi}_g\right)\right)\kern7em m<0\\ {}{Y}_n^m\left({\Phi}_g\right)\kern10.3em m=0\\ {}{\left(-1\right)}^{m-1}\sqrt{2}\operatorname{Im}\left({Y}_n^m\left({\Phi}_g\right)\right)\kern3.2em m>0\end{array}\right.. $$

Afterwards, \( {\tilde{\mathbf{X}}}_t \), \( {\overline{\mathbf{S}}}_t \), and \( {\tilde{\mathbf{V}}}_t \) can be partitioned into real and imaginary parts which are combined as \( {\mathbf{X}}_t=\left[\operatorname{Re}\left({\tilde{\mathbf{X}}}_t\right),\kern0.5em \operatorname{Im}\left({\tilde{\mathbf{X}}}_t\right)\right] \), \( {\mathbf{S}}_t=\left[\operatorname{Re}\left({\overline{\mathbf{S}}}_t\right),\kern0.5em \operatorname{Im}\left({\overline{\mathbf{S}}}_t\right)\right] \), and \( {\mathbf{V}}_t=\left[\operatorname{Re}\left({\tilde{\mathbf{V}}}_t\right),\kern0.5em \operatorname{Im}\left({\tilde{\mathbf{V}}}_t\right)\right] \). Therefore, the complex model (11) can be transformed into a real-valued one as:

$$ {\mathbf{X}}_t={\mathbf{AS}}_t+{\mathbf{V}}_t. $$

Now, the real-valued signal model (20) is obtained in the spherical harmonics domain.
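The unitary transformation above can be verified numerically. The sketch below builds Q1, Q2, and the spherical-harmonic dictionary for a small illustrative case (N = 2, so J = 9, on an arbitrary 30-point grid) and checks that Q2Q1Ā has a negligible imaginary part:

```python
import numpy as np
from scipy.special import lpmv
from scipy.linalg import block_diag
from math import factorial

def Ynm(n, m, theta, phi):
    """Spherical harmonic Y_n^m; negative m via [Y_n^m]* = (-1)^m Y_n^{-m}."""
    if m < 0:
        return (-1) ** (-m) * np.conj(Ynm(n, -m, theta, phi))
    norm = np.sqrt((2 * n + 1) / (4 * np.pi)
                   * factorial(n - m) / factorial(n + m))
    return norm * lpmv(m, n, np.cos(theta)) * np.exp(1j * m * phi)

N = 2
J = (N + 1) ** 2
# Small illustrative grid of candidate directions (theta_g, phi_g)
thetas, phis = np.meshgrid(np.linspace(0.2, 3.0, 5), np.linspace(0.0, 6.0, 6))
grid = np.column_stack([thetas.ravel(), phis.ravel()])

# A_bar = Y^H(Phi): entry (j, g) = conj(Y_n^m(Phi_g)), rows ordered (n, m = -n..n)
A_bar = np.array([[np.conj(Ynm(n, m, th, ph)) for th, ph in grid]
                  for n in range(N + 1) for m in range(-n, n + 1)])

# Q1: per-order sign flips diag((-1)^n, ..., -1, 1, 1, ..., 1)
q1_blocks = [np.diag(np.concatenate([(-1.0) ** np.arange(n, 0, -1),
                                     np.ones(n + 1)])) for n in range(N + 1)]
Q1 = block_diag(*q1_blocks)

# Q2: per-order unitary block combining conjugate row pairs
def q2_block(n):
    if n == 0:
        return np.eye(1, dtype=complex)
    I, Jx = np.eye(n), np.fliplr(np.eye(n))
    z = np.zeros((n, 1))
    top = np.hstack([I, z, Jx])
    mid = np.hstack([z.T, [[np.sqrt(2)]], z.T])
    bot = np.hstack([-1j * Jx, z, 1j * I])
    return np.vstack([top, mid, bot]) / np.sqrt(2)

Q2 = block_diag(*[q2_block(n) for n in range(N + 1)])
A_real = Q2 @ Q1 @ A_bar
print(np.max(np.abs(A_real.imag)))   # numerically zero: the dictionary is real-valued
```

Both Q1 and Q2 are unitary, so the transformation changes neither the noise statistics up to rotation nor the sparsity pattern of the signal.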

3 Variational sparse Bayesian learning embedded with Kalman filter (VSBLKF) for 2-D DOA tracking

When the signal sources move in the whole angular space, it is necessary to model the evolution of S t in time. Most existing tracking state models [19, 20, 34] rely on velocity information of the target movement. In this paper, however, a TP model [35] is adopted to describe the movement of the sources in a more practical way. It assumes that each source can move to an adjacent grid or stay at the current grid with equal probability. The grid set D = {1, ⋯, G} is arranged in a plane with G1 rows and G2 columns and is divided into three types of regions: the corner areas (D1, D3, D7, and D9), the marginal areas (D2, D4, D6, and D8), and the center area (D5). The correspondence between these areas and the angular grids (Φ g ) is shown in Fig. 3. S t − 1(g l , :) denotes the row vector at the g l th grid during the time interval t − 1. The probability that it moves to the g c th grid during the time interval t is \( {f}_{g_l{g}_c} \), given by

$$ {f}_{g_l{g}_c}=\left\{\begin{array}{l}\begin{array}{l}1/4\kern2.5em {g}_l=1,{g}_c-{g}_l=\left\{0,1,{G}_2,{\mathrm{G}}_2+1\right\}\\ {}1/6\kern2.5em {g}_l\in {\mathbf{D}}_2,{g}_c-{g}_l=\left\{0,\pm 1,{G}_2,{\mathrm{G}}_2\pm 1\right\}\\ {}1/4\kern2.5em {g}_l={G}_2,{g}_c-{g}_l=\left\{0,-1,{G}_2,{\mathrm{G}}_2-1\right\}\\ {}1/6\kern2.5em {g}_l\in {\mathbf{D}}_4,{g}_c-{g}_l=\left\{0,1,\pm {G}_2,1\pm {\mathrm{G}}_2\right\}\\ {}1/9\kern2.5em {g}_l\in {\mathbf{D}}_5,{g}_c-{g}_l=\left\{0,\pm 1,\pm \left({\mathrm{G}}_2-1\right),\pm \left({\mathrm{G}}_2+1\right),\pm {\mathrm{G}}_2\right\}\\ {}1/6\kern2.5em {g}_l\in {\mathbf{D}}_6,{g}_c-{g}_l=\left\{0,-1,\pm {G}_2,-1\pm {\mathrm{G}}_2\right\}\\ {}1/4\kern2.5em {g}_l=\left({G}_1-1\right){G}_2+1,{g}_c-{g}_l=\left\{0,1,-{G}_2,1-{\mathrm{G}}_2\right\}\\ {}1/6\kern2.5em {g}_l\in {\mathbf{D}}_8,{g}_c-{g}_l=\left\{0,\pm 1,-{G}_2,-{\mathrm{G}}_2\pm 1\right\}\\ {}1/4\kern2.5em {g}_l=G,{g}_c-{g}_l=\left\{0,-1,-{G}_2,-{\mathrm{G}}_2-1\right\}\end{array}\\ {}0\kern4.5em otherwise\end{array}\right.. $$
Fig. 3

The schematic diagram of transition

Therefore, the vector at the g c th grid at time interval t can be expressed as \( {\mathbf{S}}_t\left({g}_c,:\right)=\sum \limits_{g_l=1}^G{f}_{g_l{g}_c}{\mathbf{S}}_{t-1}\left({g}_l,:\right) \). Considering that state noise exists in the transition process, the state transition model can be described as

$$ {\mathbf{S}}_t={\mathbf{FS}}_{t-1}+{\mathbf{E}}_t, $$

where E t is the state noise.
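A minimal sketch of the TP model follows, assuming the same 3 × 3 neighbourhood structure as the transition probabilities above; the grid dimensions are illustrative:

```python
import numpy as np

def transition_matrix(G1, G2):
    """TP model: each grid point moves to itself or an adjacent grid point
    (its 3x3 neighbourhood on the G1 x G2 angular grid) with equal probability.
    F[gc, gl] = f_{gl gc}, so the mean state evolves as S_t = F @ S_{t-1}."""
    G = G1 * G2
    F = np.zeros((G, G))
    for r in range(G1):
        for c in range(G2):
            gl = r * G2 + c
            nbrs = [(r + dr) * G2 + (c + dc)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if 0 <= r + dr < G1 and 0 <= c + dc < G2]
            for gc in nbrs:                 # corners: 4, edges: 6, interior: 9
                F[gc, gl] = 1.0 / len(nbrs)
    return F

F = transition_matrix(4, 5)
```

Each column of F sums to one, so probability mass is conserved; corner grids spread over 4 destinations (probability 1/4), edge grids over 6 (1/6), and interior grids over 9 (1/9), matching the cases of the piecewise definition.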

The KF is the most popular technique for tracking dynamic targets and is based on a Gaussian assumption. In the standard KF, the covariance matrix of the state noise and the variance of the measurement noise are assumed to be known; in real applications, however, these parameters are difficult to obtain. Because VSBL can model sparse signals well, we introduce it into the KF to estimate the covariance matrix of the state noise and the measurement noise variances. Nevertheless, for the model we construct, the posterior distribution may be infeasible to calculate directly because the dimensionality of the latent parameters is too high, so we turn to approximation schemes. The marginal probability of the observed data X is obtained by integrating over the remaining unobserved variables θ (including the latent variables S and some hyperparameters) [36].

$$ P\left(\mathbf{X}\right)=\int P\left(\mathbf{X},\boldsymbol{\uptheta} \right)d\boldsymbol{\uptheta} . $$

However, this integration is intractable. Variational approaches address this problem by introducing a distribution Q(θ) that allows the marginal log-likelihood to be decomposed into two terms

$$ \ln P\left(\mathbf{X}\right)=L(Q)+ KL\left(Q\Big\Vert P\right), $$
$$ L(Q)=\int Q\left(\boldsymbol{\uptheta} \right)\ln \frac{P\left(\mathbf{X},\boldsymbol{\uptheta} \right)}{Q\left(\boldsymbol{\uptheta} \right)}d\boldsymbol{\uptheta}, $$

where KL(Q‖P) is the Kullback-Leibler divergence between Q(θ) and the posterior distribution P(θ| X), given by

$$ KL\left(Q\Big\Vert P\right)=-\int Q\left(\boldsymbol{\uptheta} \right)\ln \frac{P\left(\boldsymbol{\uptheta} |\mathbf{X}\right)}{Q\left(\boldsymbol{\uptheta} \right)}d\boldsymbol{\uptheta}, $$

L(Q) is a functional of Q(θ). Since KL(Q‖P) ≥ 0, L(Q) is a lower bound on lnP(X). Furthermore, because the left side of (24) is independent of Q(θ), maximizing L(Q) is equivalent to minimizing KL(Q‖P); therefore, Q(θ) represents an approximation to the posterior distribution P(θ| X). The goal of a variational approach is to choose a form for Q(θ) that is sufficiently simple and flexible, yet makes the lower bound L(Q) readily evaluated and tight. In practice, a family of Q(θ) distributions is chosen, and the best approximation within this family is found by maximizing the lower bound with respect to Q(θ). One approach is to assume some tractable form for Q(θ) and then optimize L(Q) with respect to the parameters of the distribution [36, 37]. We adopt an alternative and consider a factorized form over the component variables {θ ζ } of θ as follows

$$ Q\left(\boldsymbol{\uptheta} \right)=\prod \limits_{\zeta }{Q}_{\zeta}\left({\theta}_{\zeta}\right). $$

This scheme is called mean field theory [38]. Because the true posterior distribution may be intractable, an approximate distribution is introduced to estimate it. In this paper, mean field theory is adopted to solve the above problem; it restricts only the family of the probability distribution rather than its functional form, which makes the approximation more flexible and general. In the scenario considered, the signal is sparse in the whole space at each time interval; the potential positions and noise variances are the unobserved variables, and the array outputs are the observed variables. Therefore, the final model can be solved with VSBL [39]. Under the mean field factorization of the posterior, the algorithm optimizes one parameter at a time while fixing all the others. The optimal distribution for each parameter can be expressed as:

$$ \ln Q\left(\boldsymbol{\upmu} \right)\propto {\left\langle \ln p\left(\mathbf{X},\boldsymbol{\uptheta} \right)\right\rangle}_{\boldsymbol{\uptheta} \setminus \boldsymbol{\upmu}}, $$

where μ denotes one of the parameters and \( {\left\langle \ln p\left(\mathbf{X},\boldsymbol{\uptheta} \right)\right\rangle}_{\boldsymbol{\uptheta} \setminus \boldsymbol{\upmu}} \) is the expectation of the joint probability of the data and latent variables, taken over all variables except μ.
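As a sanity check on the decomposition ln P(X) = L(Q) + KL(Q‖P), a toy discrete model (where the true posterior is available in closed form) can be evaluated directly; all quantities below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy discrete model: latent theta with K states, unnormalised joint p(X = x0, theta_k)
K = 6
joint = rng.random(K) + 1e-3
log_px = np.log(joint.sum())         # ln P(X), marginalising theta

q = rng.random(K)
q /= q.sum()                         # an arbitrary variational distribution Q(theta)
post = joint / joint.sum()           # true posterior P(theta | X)

L_q = np.sum(q * (np.log(joint) - np.log(q)))   # lower bound L(Q)
kl = np.sum(q * (np.log(q) - np.log(post)))     # KL(Q || P) >= 0
print(L_q + kl, log_px)              # the two are equal
```

Since KL is nonnegative, L(Q) never exceeds ln P(X), and the bound is tight exactly when Q equals the true posterior.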

According to Eq. (22), the state noise is assumed to be Gaussian. The prior distribution of E t is

$$ p\left({\mathbf{E}}_t\right)={\mathrm{\mathcal{M}}\mathcal{N}}_{G,2B}\left(\mathbf{0},{\boldsymbol{\Lambda}}_t,{\mathbf{I}}_{2B}\right)={\left(2\pi \right)}^{-G}{\left|{\boldsymbol{\Lambda}}_t\right|}^{-B}\exp \left\{-\frac{1}{2}\mathrm{tr}\left({\mathbf{E}}_t^T{\boldsymbol{\Lambda}}_t^{-1}{\mathbf{E}}_t\right)\right\}. $$

Since the measurement noise satisfies Gaussian distribution, we can get

$$ p\left({\mathbf{V}}_t\right)={\mathrm{\mathcal{M}}\mathcal{N}}_{J,2B}\left(0,{\boldsymbol{\Delta}}_t,{\mathbf{I}}_{2B}\right)={\left(2\pi \right)}^{-J}{\left|{\boldsymbol{\Delta}}_t\right|}^{-B}\exp \left\{-\frac{1}{2}\mathrm{tr}\left({\mathbf{V}}_t^T{\boldsymbol{\Delta}}_t^{-1}{\mathbf{V}}_t\right)\right\}, $$

where Δ t  = diag(Δ1, t, ⋯, ΔJ, t). The likelihood of the measured signal can be expressed as

$$ {\displaystyle \begin{array}{l}p\left({\mathbf{X}}_t|{\mathbf{S}}_t,{\boldsymbol{\Delta}}_t\right)={\mathrm{\mathcal{M}}\mathcal{N}}_{J,2B}\left({\mathbf{AS}}_t,{\boldsymbol{\Delta}}_t,{\mathbf{I}}_{2B}\right)\\ {}\kern9em ={\left(2\pi \right)}^{-J}{\left|{\boldsymbol{\Delta}}_t\right|}^{-B}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\left({\mathbf{X}}_t-{\mathbf{AS}}_t\right)}^T{\boldsymbol{\Delta}}_t^{-1}\left({\mathbf{X}}_t-{\mathbf{AS}}_t\right)\right]\right\}.\end{array}} $$
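The trace form of the matrix-normal likelihood above can be checked numerically: with column covariance I, it equals the product of independent column-wise Gaussian densities. The sketch below verifies this using the standard matrix-normal normalization constant and illustrative dimensions:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
J, B2 = 4, 6                                 # B2 plays the role of 2B columns
Delta = np.diag(rng.uniform(0.5, 2.0, J))    # diagonal noise covariance Delta_t
M = rng.standard_normal((J, B2))             # mean, playing the role of A S_t
X = M + rng.standard_normal((J, B2))         # an observation

# Matrix-normal log-density MN(M, Delta, I) via the trace form
E = X - M
logp_mn = (-0.5 * J * B2 * np.log(2 * np.pi)
           - 0.5 * B2 * np.log(np.linalg.det(Delta))
           - 0.5 * np.trace(E.T @ np.linalg.inv(Delta) @ E))

# Equivalent: the columns are i.i.d. N(M[:, b], Delta)
logp_cols = sum(multivariate_normal.logpdf(X[:, b], M[:, b], Delta)
                for b in range(B2))
print(logp_mn, logp_cols)                    # the two agree
```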

We introduce a conjugate prior over the inverse noise variances \( {\boldsymbol{\Delta}}_t^{-1} \), given by a Gamma distribution:

$$ p\left({\boldsymbol{\Delta}}_t^{-1}\right)=\prod \limits_{j=1}^J Gam\left({\Delta}_{j,t}^{-1}|{\mathrm{b}}_{j,t}^{\left(\Delta \right)},{\mathrm{c}}_{j,t}\right). $$

When Δ t is treated as a constant, Eq. (31) is the exponential of a quadratic function of S t ; therefore, the conjugate prior of S t is Gaussian [36]. The prior distribution of S t is

$$ {\displaystyle \begin{array}{l}p\left({\mathbf{S}}_t|{\mathbf{FS}}_{t-1},{\boldsymbol{\Lambda}}_t\right)={\mathrm{\mathcal{M}}\mathcal{N}}_{G,2B}\left({\mathbf{FS}}_{t-1},{\boldsymbol{\Lambda}}_t,{\mathbf{I}}_G\right)\\ {}\kern10.5em ={\left(2\pi \right)}^{-G}{\left|{\boldsymbol{\Lambda}}_t\right|}^{-B}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right)}^T{\boldsymbol{\Lambda}}_t^{-1}\left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right)\right]\right\}.\end{array}} $$

The signal satisfies a Gaussian distribution with zero mean. The variance of S t is \( {\boldsymbol{\Lambda}}_t=\operatorname{diag}\left({\boldsymbol{\upalpha}}_t^{-1}\right) \), where the hyperparameter vector is \( {\boldsymbol{\upalpha}}_t={\left[{\alpha}_{1,t},\cdots, {\alpha}_{G,t}\right]}^T \). The conjugate prior of α g, t is a Gamma distribution

$$ p\left({\alpha}_{g,t}\right)= Gam\left({\alpha}_{g,t}|{a}_{g,t},{b}_{g,t}\right)={b}_{g,t}^{a_{g,t}}{\alpha}_{g,t}^{a_{g,t}-1}{e}^{-{b}_{g,t}{\alpha}_{g,t}}/\Gamma \left({a}_{g,t}\right), $$

where Γ(⋅) is the Gamma function [40]. The expectation of α g, t in (34) is 〈α g, t 〉 = a g, t /b g, t . The marginal distribution of S t can be obtained by integrating over α t . Note that when the prior distribution and the likelihood function are conjugate, the posterior distribution takes the same functional form as the prior [36].

Using the chain rule of probability [39], the posterior distribution can be expressed as:

$$ p\left({\boldsymbol{\upalpha}}_t,{\mathbf{S}}_t,{\boldsymbol{\Delta}}_t|{\mathbf{X}}_t\right)\propto p\left({\mathbf{X}}_t|{\mathbf{S}}_t,{\boldsymbol{\Delta}}_t\right)p\left({\mathbf{S}}_t|{\boldsymbol{\upalpha}}_t\right)p\left({\boldsymbol{\upalpha}}_t\right)p\left({\boldsymbol{\Delta}}_t\right). $$

The variational framework introduces a factorized representation (27) to approximate the posterior distribution p(α t , S t , Δ t | X t ) by Q(α t , S t , Δ t ) = Q(S t )Q(α t )Q(Δ t ). We derive the VSBLKF algorithm using expectation maximization (EM) updates. The E-step requires the posterior distribution of the unknown sparse state signal, which is Gaussian with mean U t|t and covariance Σ t|t . These parameters can be computed using the KF prediction and update equations as follows:

Predict steps:

$$ {\mathbf{U}}_{t\mid t-1}={\mathbf{FU}}_{t-1\mid t-1}, $$
$$ {\boldsymbol{\Sigma}}_{t\mid t-1}={\mathbf{F}\boldsymbol{\Sigma}}_{t-1\mid t-1}{\mathbf{F}}^T+{\left\langle {\boldsymbol{\Lambda}}_t\right\rangle}_q. $$

Update steps:

$$ {\mathbf{K}}_t={\boldsymbol{\Sigma}}_{t\mid t-1}{\mathbf{A}}^T{\left({\mathbf{A}\boldsymbol{\Sigma}}_{t\mid t-1}{\mathbf{A}}^T+{\left\langle \boldsymbol{\Delta} \right\rangle}_q\right)}^{-1}, $$
$$ {\mathbf{U}}_{t\mid t}={\mathbf{U}}_{t\mid t-1}+{\mathbf{K}}_t\left({\mathbf{X}}_t-{\mathbf{AU}}_{t\mid t-1}\right), $$
$$ {\boldsymbol{\Sigma}}_{t\mid t}={\boldsymbol{\Sigma}}_{t\mid t-1}-{\mathbf{K}}_t{\mathbf{A}\boldsymbol{\Sigma}}_{t\mid t-1}. $$
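As a concrete illustration, the prediction and update recursion above can be sketched in Python with NumPy. This is a minimal sketch under our own assumptions: the function name and argument layout are ours, and the expected noise covariances 〈Λ t 〉 q and 〈Δ t 〉 q are passed in as plain matrices.

```python
import numpy as np

def kf_estep(U_prev, Sigma_prev, X_t, F, A, Lambda_q, Delta_q):
    """One variational E-step: KF predict and update for the sparse state.

    U_prev, Sigma_prev : posterior mean (G x 2B) and covariance (G x G) at t-1
    X_t : measurements (J x 2B); F : transition (G x G); A : steering (J x G)
    Lambda_q, Delta_q : expected state / measurement noise covariances.
    """
    # Predict step
    U_pred = F @ U_prev                              # U_{t|t-1} = F U_{t-1|t-1}
    Sigma_pred = F @ Sigma_prev @ F.T + Lambda_q     # Sigma_{t|t-1}

    # Update step
    S = A @ Sigma_pred @ A.T + Delta_q               # innovation covariance
    K = Sigma_pred @ A.T @ np.linalg.inv(S)          # Kalman gain K_t
    U_post = U_pred + K @ (X_t - A @ U_pred)         # U_{t|t}
    Sigma_post = Sigma_pred - K @ A @ Sigma_pred     # Sigma_{t|t}
    return U_post, Sigma_post
```

A single call advances the posterior mean and covariance by one time interval; the M-step then re-estimates the noise statistics from these moments.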

Here, the initial value U 0|0 can be assumed to be known. In the M-step, we use VSBL to calculate the state variance and the measurement variance at the current time. According to Eq. (28),

$$ {\displaystyle \begin{array}{l}\ln Q\left({\boldsymbol{\upalpha}}_t\right)\propto {\left\langle \ln p\left({\mathbf{S}}_t|{\mathbf{FS}}_{t-1},{\boldsymbol{\Lambda}}_t\right)p\left({\boldsymbol{\upalpha}}_t\right)\right\rangle}_{\uptheta \setminus {\boldsymbol{\upalpha}}_t}\\ {}=\left\langle \ln p\left({\mathbf{S}}_t|{\mathbf{FS}}_{t-1},{\boldsymbol{\Lambda}}_t\right)\right\rangle +\ln p\left({\boldsymbol{\upalpha}}_t\right)\\ {}=-B\ln \left|{\boldsymbol{\Lambda}}_t\right|-\frac{1}{2}\mathrm{tr}\left\{\left\langle {\left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right)}^T{\boldsymbol{\Lambda}}_t^{-1}\left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right)\right\rangle \right\}+\sum \limits_{g=1}^G\left\{\left({a}_{g,t}-1\right)\ln {\alpha}_{g,t}-{b}_{g,t}{\alpha}_{g,t}\right\}+C\\ {}=\sum \limits_{g=1}^G\left\{\left({a}_{g,t}-1+B\right)\ln {\alpha}_{g,t}-\left({b}_{g,t}+\frac{1}{2}{\left\langle \left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right){\left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right)}^T\right\rangle}_{g,g}\right){\alpha}_{g,t}\right\}+C,\end{array}} $$

where C is a constant. Therefore, the posterior is given by:

$$ Q\left({\boldsymbol{\upalpha}}_t\right)=\prod \limits_{g=1}^G Gam\left({\alpha}_{g,t}|{\overline{a}}_{g,t},{\overline{b}}_{g,t}\right), $$
$$ {\overline{a}}_{g,t}={a}_{g,t}+B, $$
$$ {\overline{b}}_{g,t}={b}_{g,t}+0.5{\left\langle \left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right){\left({\mathbf{S}}_t-{\mathbf{FS}}_{t-1}\right)}^T\right\rangle}_{g,g}, $$
$$ \left\langle {\alpha}_{g,t}\right\rangle ={\overline{a}}_{g,t}/{\overline{b}}_{g,t}. $$

The posterior distribution of the inverse noise variance can be calculated similarly:

$$ Q\left({\boldsymbol{\Delta}}_t^{-1}\right)=\prod \limits_{j=1}^J Gam\left({\Delta}_{j,t}^{-1}|{\overline{b}}_{j,t}^{\left(\Delta \right)},{\overline{c}}_{j,t}\right), $$
$$ {\overline{c}}_{j,t}={c}_{j,t}+0.5{\left\langle \left({\mathbf{X}}_t-{\mathbf{AS}}_t\right){\left({\mathbf{X}}_t-{\mathbf{AS}}_t\right)}^T\right\rangle}_{j,j}, $$
$$ {\overline{b}}_{j,t}^{\left(\Delta \right)}={b}_{j,t}^{\left(\Delta \right)}+B, $$
$$ \left\langle {\Delta}_{j,t}^{-1}\right\rangle ={\overline{b}}_{j,t}^{\left(\Delta \right)}/{\overline{c}}_{j,t}, $$
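A corresponding sketch of the variational M-step in Python (our own illustrative code, not the authors' implementation): it applies the Gamma updates with shape increment B to both the state precisions α g, t and the noise precisions, and, for brevity, approximates the second moments by the point estimate U t|t rather than the full expectations.

```python
import numpy as np

def mstep_hyperparams(U_post, U_prev, X_t, F, A, a0=1e-3, b0=1e-3):
    """Variational M-step sketch: update the Gamma posteriors of the state
    precisions alpha_{g,t} and the measurement noise precisions.

    U_post is U_{t|t} (G x 2B), U_prev is U_{t-1|t-1}, X_t is (J x 2B).
    a0, b0 are the broad hyperpriors (set to 1e-3 in the paper).
    NOTE: second moments are approximated by point estimates (an assumption).
    """
    B = U_post.shape[1] / 2.0
    # Q(alpha): shape a0 + B, rate b0 + 0.5 * diagonal of the state residual
    E = U_post - F @ U_prev
    b_bar = b0 + 0.5 * np.sum(E * E, axis=1)      # diag of E E^T, shape (G,)
    alpha_mean = (a0 + B) / b_bar                 # <alpha_{g,t}>

    # Same pattern on the measurement residual for the noise precisions
    V = X_t - A @ U_post
    c_bar = b0 + 0.5 * np.sum(V * V, axis=1)      # diag of V V^T, shape (J,)
    delta_inv_mean = (a0 + B) / c_bar             # <Delta_{j,t}^{-1}>
    return alpha_mean, delta_inv_mean
```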

The hyperpriors \( {a}_{g,t},{b}_{g,t},{b}_{j,t}^{\left(\Delta \right)},{c}_{j,t} \) are set to \( 10^{-3} \). From (38), we can see that these recursions cannot guarantee that the prediction is sparse: if (38) runs up to a large enough t, the signal strength of a target spills over to all entries of U t|t . We therefore propose a state corrector to determine the true moving direction at the next time step and keep the prediction sparse. For each signal, there are several possible positions at the next time step. These possible positions correspond to nonzero values in U t|t , denoted as \( {\mathbf{U}}_{t\mid t}\left({p}_1,:\right),\cdots, {\mathbf{U}}_{t\mid t}\left({p}_{\sum_{d=1}^D{w}_d},:\right) \), where w d  ∈ {4, 6, 9}. The values in this set correspond to different signal positions: corner, marginal, and central areas, respectively. We find the D maximum values in U t|t corresponding to the maxima of \( {\left\Vert {\left[\mathbf{A}\left(:,{p}_{\kappa}\right)\right]}^T{\mathbf{X}}_t\right\Vert}_2 \), where \( \kappa =1,\cdots, \sum \limits_{d=1}^D{w}_d \), and regard these positions as the true directions. The remaining rows of U t|t are set to zero. The steps of the proposed algorithm are summarized in the following.

After that, we discuss the computational complexity of two models using VSBLKF to track the DOA trajectory: the traditional complex-valued model and the proposed real-valued model. Different models lead to different measurement matrix dimensions, which affect the parameter updates in VSBL. Table 1 shows the amount of computation for the two models, where A t, c , X t, c , S t, c and A t, r , X t, r , S t, r represent the measurement matrices, received data, and sparse signals in the complex-valued and real-valued models, respectively. Real multiplications and real additions are used to evaluate the computational cost. The proposed real model transforms the complex-valued measurement matrix into a real-valued one through a unitary transformation without changing its dimension. After this transform, the computational cost of each iteration involving the steering matrix is reduced considerably, so the real-valued transform speeds up the proposed algorithm. The unitary transformation itself needs 4J 2 T real multiplications and (4J − 2)JT real additions, and it does not join the iterations of SBL. The computational load of the real-valued model is therefore much lower than that of the complex-valued model.
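The state corrector described above can be sketched as follows (an illustrative Python fragment; the function and variable names, and passing the candidate row indices p κ explicitly, are our assumptions):

```python
import numpy as np

def state_corrector(U_post, A, X_t, cand_rows, D):
    """Keep only the D candidate rows of U_{t|t} that best match the data
    and zero the rest, so the prediction stays sparse.

    cand_rows : indices p_1, ..., p_{sum w_d} of the candidate grids
    D         : number of sources
    """
    # Score each candidate grid by || A(:, p)^T X_t ||_2
    scores = np.array([np.linalg.norm(A[:, p].T @ X_t) for p in cand_rows])
    keep = np.asarray(cand_rows)[np.argsort(scores)[-D:]]   # D largest scores

    corrected = np.zeros_like(U_post)
    corrected[keep, :] = U_post[keep, :]   # keep rows regarded as true directions
    return corrected
```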

Table 1 Comparison of computational load in one iteration

4 Extension to off-grid problem

The assumption that the estimated signal directions lie on specified grids is unrealistic. To reduce the errors caused by the mismatch between the real locations and the grid locations, denser grids can be applied when constructing the measurement matrix. Nevertheless, the columns of the measurement matrix need to have low mutual coherence for sparse recovery: denser grids lead to higher complexity and make the dictionary coherent, which violates the necessary conditions for compressive sensing. Hence, to make the proposed algorithm more practical, the off-grid model is considered. As mentioned earlier, the directional matrix \( {\widehat{\mathbf{Y}}}^H\left(\boldsymbol{\Phi} \right) \) in the VSBLKF algorithm is constructed on the uniformly sampled angle grids Φ = {Φ1, ⋯, Φ G } with Φ g  = (θ g , ϕ g ), g = 1, ⋯, G. The target signal angles are \( \overset{\smile }{\boldsymbol{\Phi}}=\left\{{\overset{\smile }{\Phi}}_1,\cdots, {\overset{\smile }{\Phi}}_D\right\} \) with \( {\overset{\smile }{\Phi}}_d=\left({\overset{\smile }{\theta}}_d,{\overset{\smile }{\phi}}_d\right) \), d = 1, ⋯, D, D ≪ G, where \( {\overset{\smile }{\Phi}}_d \) generally does not coincide with any grid point Φ g . In addition, we define the nearest grid point to the d-th target angle as \( {\widehat{\Phi}}_{g_d}=\left({\widehat{\theta}}_{g_d},{\widehat{\phi}}_{g_d}\right) \). The true steering vector \( \mathbf{a}\left({\overset{\smile }{\Phi}}_d\right) \) can be approximated by its first-order Taylor expansion [41]:

$$ \mathbf{a}\left({\overset{\smile }{\Phi}}_d\right)\approx \mathbf{a}\left({\widehat{\Phi}}_{g_d}\right)+\mathbf{b}\left({\widehat{\theta}}_{g_d}\right)\left({\overset{\smile }{\theta}}_d-{\widehat{\theta}}_{g_d}\right)+\mathbf{c}\left({\widehat{\phi}}_{g_d}\right)\left({\overset{\smile }{\phi}}_d-{\widehat{\phi}}_{g_d}\right), $$

where \( \mathbf{a}\left({\widehat{\Phi}}_{g_d}\right) \) represents the steering vector at the nearest grid point, and \( \mathbf{b}\left({\widehat{\theta}}_{g_d}\right) \) and \( \mathbf{c}\left({\widehat{\phi}}_{g_d}\right) \) are the partial derivatives of the steering vector with respect to elevation and azimuth, evaluated at \( {\widehat{\Phi}}_{g_d} \). Let

$$ \mathbf{B}=\left[\mathbf{b}\left({\theta}_1\right),\cdots, \mathbf{b}\left({\theta}_G\right)\right], $$
$$ \mathbf{C}=\left[\mathbf{c}\left({\phi}_1\right),\cdots, \mathbf{c}\left({\phi}_G\right)\right], $$
$$ {\boldsymbol{\upbeta}}^{\mathrm{T}}=\left[{\beta}_1,{\beta}_2,\cdots, {\beta}_G\right], $$
$$ {\boldsymbol{\upgamma}}^{\mathrm{T}}=\left[{\gamma}_1,{\gamma}_2,\cdots, {\gamma}_G\right], $$
$$ {\beta}_g=\left\{\begin{array}{ll}{\overset{\smile }{\theta}}_d-{\widehat{\theta}}_{g_d}, & g={g}_d\\ {}0, & \mathrm{otherwise}\end{array}\right., $$
$$ {\gamma}_g=\left\{\begin{array}{ll}{\overset{\smile }{\phi}}_d-{\widehat{\phi}}_{g_d}, & g={g}_d\\ {}0, & \mathrm{otherwise}\end{array}\right.. $$

Thus, the new steering matrix A + B diag(β) + C diag(γ) can replace the original steering matrix A to estimate both the coarse signal grids and their elevation and azimuth biases simultaneously. In [41], this problem was solved from a Bayesian perspective. In this section, we instead use least squares estimation to calculate the elevation and azimuth biases. In the first step, we use VSBL to obtain a coarse estimate of the true location. Then, the expectation U t|t is used to minimize

$$ \underset{{\boldsymbol{\upbeta}}^{\xi },{\boldsymbol{\upgamma}}^{\xi }}{\min }{\left\Vert {\boldsymbol{\upbeta}}^{\xi}\right\Vert}^2+{\left\Vert {\boldsymbol{\upgamma}}^{\xi}\right\Vert}^2+{\left\Vert {\mathbf{X}}_t-\left(\mathbf{A}+\mathbf{B}\operatorname{diag}\left({\boldsymbol{\upbeta}}^{\xi}\right)+\mathbf{C}\operatorname{diag}\left({\boldsymbol{\upgamma}}^{\xi}\right)\right){\mathbf{U}}_{t\mid t}\right\Vert}_F^2, $$

where βξ and γξ stand for the ξth iteration updates of β and γ, respectively. When updating one bias, the other is kept fixed. This reduces to a least squares problem in which we solve

$$ {\mathbf{X}}_t-\left(\mathbf{A}+\mathbf{C}\operatorname{diag}\left({\boldsymbol{\upgamma}}^{\xi}\right)\right){\mathbf{U}}_{t\mid t}^{\xi}=\mathbf{B}{\Xi}_{{\mathbf{S}}_{t\mid t}}^{\xi }{\boldsymbol{\upbeta}}^{\xi }, $$
$$ {\mathbf{X}}_t-\left(\mathbf{A}+\mathbf{B}\operatorname{diag}\left({\boldsymbol{\upbeta}}^{\xi}\right)\right){\mathbf{U}}_{t\mid t}^{\xi}=\mathbf{C}{\Xi}_{{\mathbf{S}}_{t\mid t}}^{\xi }{\boldsymbol{\upgamma}}^{\xi }, $$

where \( {\Xi}_{{\mathbf{S}}_{t\mid t}}^{\xi }=\operatorname{diag}\left({\mathrm{S}}_{1,t\mid t}^{\xi },\cdots, {\mathrm{S}}_{G,t\mid t}^{\xi}\right) \). So, the ξth iteration update of β is:

$$ {\boldsymbol{\upbeta}}^{\xi}\kern0.5em =\kern0.5em {\left(\mathbf{B}{\Xi}_{{\mathbf{S}}_{t\mid t}}^{\xi}\right)}^{+}\left({\mathbf{X}}_t-\left(\mathbf{A}+\mathbf{C}\operatorname{diag}\left({\boldsymbol{\upgamma}}^{\xi}\right)\right){\mathbf{U}}_{t\mid t}^{\xi}\right). $$

In a similar manner, the iterative form of γ can be obtained as:

$$ {\boldsymbol{\upgamma}}^{\xi}\kern0.5em =\kern0.5em {\left(\mathbf{C}{\Xi}_{{\mathbf{S}}_{t\mid t}}^{\xi}\right)}^{+}\left({\mathbf{X}}_t-\left(\mathbf{A}+\mathbf{B}\operatorname{diag}\left({\boldsymbol{\upbeta}}^{\xi}\right)\right){\mathbf{U}}_{t\mid t}^{\xi}\right). $$
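For a single snapshot, the alternating least squares updates above can be sketched in Python (an illustrative sketch; the function name, the fixed iteration count, and building Ξ from the current signal estimate u are our choices):

```python
import numpy as np

def offgrid_refine(x, A, Bm, Cm, u, n_iter=5):
    """Alternating least squares refinement of the grid biases beta (elevation)
    and gamma (azimuth), following the pseudoinverse updates above.

    x : measurement (J,); A, Bm, Cm : steering and derivative matrices (J x G)
    u : expectation of the sparse signal for this snapshot (G,).
    """
    G = A.shape[1]
    beta = np.zeros(G)
    gamma = np.zeros(G)
    Xi = np.diag(u)   # Xi = diag(S_{1,t|t}, ..., S_{G,t|t})
    for _ in range(n_iter):
        # beta update with gamma fixed; note Bm diag(beta) u = Bm Xi beta
        r = x - (A + Cm @ np.diag(gamma)) @ u
        beta = np.linalg.pinv(Bm @ Xi) @ r
        # gamma update with beta fixed
        r = x - (A + Bm @ np.diag(beta)) @ u
        gamma = np.linalg.pinv(Cm @ Xi) @ r
    return beta, gamma
```

Each half-step is an exact least squares solve, so the data residual is non-increasing over the iterations.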

The initial values of β and γ are set to 0 G (a column vector of G zeros). The number of source signals is known a priori. The current coarse location of each signal can be approximated by VSBLKF, and the biases between the reference and the estimate are gauged using (59) and (60), respectively. The concrete steps are summarized in Algorithm 2.

5 Results and discussion

In this section, we verify the robustness and performance of the proposed algorithm in comparison with the standard KF, SBL, and VSBL methods. We use a rigid spherical array with 32 uniformly distributed sensors and radius R = 0.1 m. The maximum order of the spherical harmonics is N = 4. In our experiments, the ranges of elevation and azimuth are from 0° to 180° and from 0° to 360°, divided into 31 and 62 grids with a fixed angular interval, respectively. Therefore, there are 1922 grids of possible angles for the source signals. Note that we only use the azimuth range from 0° to 180° in our simulations to reduce the computational complexity. In the moving process, we assume each source either moves to one of its adjacent grids or stays at its current grid, i.e., a source moves in different directions in 6° steps or remains static, with equal probability. The proposed method can track other trajectories as long as they can be described by the grids and obey the TP model. The trajectory used in this paper is randomly generated under the TP model. To show the performance quantitatively (on-grid, off-grid, and RMSE vs. SNR), we use one trajectory to illustrate these results. One random realization of this movement model is considered for T = 50 time intervals. The number of Monte Carlo trials is 500. The hyperpriors \( {a}_{g,t},{b}_{g,t},{b}_{j,t}^{\left(\Delta \right)},{c}_{j,t} \) are set to \( 10^{-3} \).
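The grid and TP movement model of this setup can be sketched as follows (an illustrative Python fragment with our own names; the admissible move counts 4, 6, and 9 for corner, edge, and interior grids arise from clipping the 3 × 3 neighbourhood at the grid boundary):

```python
import numpy as np

rng = np.random.default_rng(0)

n_theta, n_phi = 31, 62   # elevation x azimuth samples -> 1922 candidate grids
T = 50                    # number of time intervals

def random_trajectory(T, start=(15, 30)):
    """Random walk on the grid under the TP model: at each time interval the
    source moves to an adjacent grid node (one 6-degree step) or stays put,
    with equal probability over the admissible moves."""
    traj = [start]
    for _ in range(T - 1):
        i, j = traj[-1]
        # candidate moves: stay plus the 8 neighbours, clipped to the grid
        moves = [(i + di, j + dj)
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)
                 if 0 <= i + di < n_theta and 0 <= j + dj < n_phi]
        traj.append(moves[rng.integers(len(moves))])
    return traj

traj = random_trajectory(T)
```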

5.1 Example 1: the performance of the proposed method with an on-grid model

In the first simulation, we show the tracking performance of one signal at each time interval. The signal-to-noise ratio (SNR) is set to 12 dB and the signal strength is 10. The moving traces obtained by the four methods are shown in Fig. 4. Figures 5 and 6 give the tracking performance at different time intervals. The performance of SBL is poor because the likelihood function of the measurements involved in SBL cannot match the true one. The VSBLKF method achieves the best tracking performance because it combines the advantages of the KF and VSBL methods. Figures 7 and 8 show the signal angle errors (the differences between the estimated and reference angles) varying with the time interval. These curves show that the proposed method outperforms the other approaches.

Fig. 4 Traces of four methods based on the on-grid model

Fig. 5 Estimated azimuth as a function of time

Fig. 6 Estimated elevation as a function of time

Fig. 7 Azimuth error as a function of time

Fig. 8 Estimated elevation error as a function of time

5.2 Example 2: the RMSE versus SNR with an on-grid model

In Fig. 9, we compare the performance of the proposed algorithm with the other approaches under different SNRs. The SNR is set to 7, 10, 13, 16, and 20 dB. The root mean square error (RMSE) is adopted to measure the estimation performance under different SNRs. The RMSE is given by:

$$ \mathrm{RMSE}=\frac{1}{NUM}\sum \limits_{n=1}^{NUM}\sqrt{\frac{1}{2DT}\sum \limits_{d=1}^D\sum \limits_{t=1}^T\left[{\left({\overset{\smile }{\theta}}_{d,t}-{\widehat{\theta}}_{d,t}\right)}^2+{\left({\overset{\smile }{\phi}}_{d,t}-{\widehat{\phi}}_{d,t}\right)}^2\right]}, $$

where \( \left({\overset{\smile }{\theta}}_{d,t},{\overset{\smile }{\phi}}_{d,t}\right) \) denotes the actual DOA, \( \left({\widehat{\theta}}_{d,t},{\widehat{\phi}}_{d,t}\right) \) the estimated DOA, NUM the number of Monte Carlo trials, T the number of time intervals, and D the number of signals. The VSBL method outperforms the KF method because the latter does not consider the changes of the state noise variance; moreover, the KF approach requires prior knowledge of the variances of the state noise and the measurement noise. The proposed VSBLKF approach performs better than VSBL in tracking DOAs because it exploits the correlation between different time intervals. As in the experiment shown in Fig. 3, some points always deviate randomly from the ideal locations.
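The RMSE above can be computed as follows (a sketch; the array layout holding (elevation, azimuth) pairs per trial, source, and time is our assumption):

```python
import numpy as np

def doa_rmse(true_ang, est_ang):
    """RMSE over trials, sources, and time, following the formula above.

    true_ang, est_ang : arrays of shape (NUM, D, T, 2) holding
    (elevation, azimuth) pairs in degrees for each trial, source, and time.
    """
    NUM, D, T, _ = true_ang.shape
    # per-trial sum of squared angle errors over sources, time, and both angles
    sq = np.sum((true_ang - est_ang) ** 2, axis=(1, 2, 3))
    return np.mean(np.sqrt(sq / (2 * D * T)))
```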

Fig. 9 RMSE as a function of SNR in the on-grid model

5.3 Example 3: the performance of the proposed method with an off-grid model

In this part, we assume that the angular trajectory of the moving signal deviates from the sampling grids. The deviation errors are drawn from a standard normal distribution. The trajectories obtained by VSBLKF and OGVSBLKF are shown in Fig. 10, where the SNR is 12 dB and the number of time intervals is 50. Figure 10 shows that VSBLKF cannot track the true locus reliably, whereas the proposed method estimates the angular locus more accurately. Note that the OGVSBL method is not considered here because it is a refinement algorithm that requires the coarse grid locations to be estimated correctly; as seen in Fig. 4, the VSBL method cannot yet satisfy this requirement. In Figs. 11 and 12, we can clearly observe the errors between the estimated values and the references in elevation and azimuth, respectively. We used least squares estimation to calculate the deviations; however, some residual deviation remains, possibly because the least squares estimation cannot fully handle the deviation problem.

Fig. 10 Traces of the two proposed methods in the off-grid model

Fig. 11 Estimated elevation as a function of time

Fig. 12 Estimated azimuth as a function of time

5.4 Example 4: the RMSE versus SNR with an off-grid model

In Fig. 13, we compare the performance of OGVSBLKF and VSBLKF under different SNRs varying from 7 to 20 dB with an interval of approximately 3 dB. In this simulation, we set the number of time intervals to 50 and the number of Monte Carlo trials to 300. The angular deviation errors are drawn from a normal distribution with a variance of 1.5. The OGVSBLKF shows better performance than the VSBLKF method.

Fig. 13 RMSE as a function of SNR in the off-grid model

5.5 Example 5: the RMSE versus different grids

In Fig. 14, the influence of the grid interval on the proposed method is depicted. The estimation error increases with larger grid intervals. Therefore, it is important to select a suitable grid interval when adopting the proposed method to estimate the signal direction trajectory. However, finding this suitable grid interval might require several repeated tests. To obtain a trade-off between performance and model complexity, a coarse grid can be used for an initial search, and a finer grid can then be used to improve the estimation precision.

Fig. 14 RMSE as a function of SNR in the off-grid model with different grid intervals

5.6 Example 6: the performance of the proposed method for multiple signals

In this example, we compare our proposed algorithm with the other methods when multiple moving signals impinge on the spherical array, to illustrate that the proposed algorithm is applicable to tracking multiple signals. Figure 15 demonstrates the tracking performance in elevation and azimuth when the SNR is 12 dB and the number of time intervals is 17. As we can see, SBL cannot track the signal trajectories reasonably, whereas our proposed algorithm shows advantages in tracing multiple signals.

Fig. 15 Traces of three methods for two signals

5.7 Example 7: the cost time versus different methods

In addition, the computational time of the different DOA tracking methods is analyzed. We evaluate the running time using the tic and toc instructions in MATLAB. All simulation results are obtained on the same PC with an Intel i7-6700 CPU and 8 GB RAM, running MATLAB 2015b on 64-bit Windows 10. The average computational time, obtained from 300 Monte Carlo trials and 50 time intervals, is given in Table 2. The time cost of the proposed method is higher than that of VSBL, but its precision is also higher. Comparing VSBLKF with OGVSBLKF, the method based on the off-grid model needs only about 7 s more than that based on the on-grid model while achieving better performance. Note that the theoretical complexity is approximately O(⋅), where υ is the number of iterations. The longer running time of the proposed method arises because embedding the Kalman filter into the VSBL may increase the number of iterations needed to avoid local optima.

Table 2 Comparison of time cost for different methods

6 Conclusions

In order to track 2-D DOAs, we construct the state transition function according to the TP model based on a spherical array. The angular space is divided into grids to model a sparse signal. By combining the VSBL and KF methods, we propose an effective method called VSBLKF to track dynamic DOAs, where the KF estimation is embedded into the VSBL to estimate the signals. Moreover, we extend our algorithm to the off-grid model. Simulations show that the proposed method achieves better tracking and anti-noise performance than VSBL and KF. In the future, we will extend the algorithm to wideband signals.


  1. A. Hassanien, S.A. Vorobyov, Transmit energy focusing for DOA estimation in MIMO radar with colocated antennas. IEEE Trans. Signal Process. 59(6), 2669–2682 (2011)

  2. H. Krim, M. Viberg, Two decades of array signal processing research: the parametric approach. IEEE Signal Process. Mag. 13(4), 67–94 (1996)

  3. A. Boukerche, H. Oliveira, E. Nakamura, A. Loureiro, Localization systems for wireless sensor networks. IEEE Trans. Wirel. Commun. 14(6), 6–12 (2007)

  4. M. Carlin, P. Rocca, G. Oliveri, Directions-of-arrival estimation through Bayesian compressive sensing strategies. IEEE Trans. Antennas Propag. 61(7), 3828–3838 (2013)

  5. B. Wang, J. Liu, X. Sun, Mixed sources localization based on sparse signal reconstruction. IEEE Signal Process. Lett. 19(8), 487–490 (2012)

  6. D. Malioutov, M. Cetin, A. Willsky, A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process. 53(8), 3010–3022 (2005)

  7. D. Wipf, B. Rao, An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process. 55(7), 3704–3716 (2007)

  8. S. Ji, Y. Xue, L. Carin, Bayesian compressive sensing. IEEE Trans. Signal Process. 56(6), 2346–2356 (2008)

  9. S. Babacan, R. Molina, A. Katsaggelos, Bayesian compressive sensing using Laplace priors. IEEE Trans. Image Process. 19(1), 53–63 (2010)

  10. D.P. Wipf, B.D. Rao, An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process. 55(7), 3704–3716 (2007)

  11. S.D. Babacan, R. Molina, A.K. Katsaggelos, Bayesian compressive sensing using Laplace priors. IEEE Trans. Image Process. 19(1), 53–63 (2010)

  12. J.F. Gu, S.C. Chan, W.P. Zhu, et al., Joint DOA estimation and source signal tracking with Kalman filtering and regularized QRD RLS algorithm. IEEE Trans. Circuits Syst. II, Exp. Briefs 60(1), 46–50 (2013)

  13. B. Yang, Projection approximation subspace tracking. IEEE Trans. Signal Process. 43(1), 95–107 (1995)

  14. C. Liu, G. Wang, J. Xin, et al., Low complexity subspace-based two-dimensional direction-of-arrivals tracking of multiple targets. Proc. ICSP, 1825–1829 (2012)

  15. J. Wu, J. Xin, G. Wang, et al., Two-dimensional direction tracking of coherent signals with two parallel uniform linear arrays. Proc. ICSP, 183–187 (2012)

  16. N. Vaswani, J. Zhan, Recursive recovery of sparse signal sequences from compressive measurements: a review. IEEE Trans. Signal Process. 64(13), 3523–3549 (2016)

  17. N. Vaswani, LS-CS-residual (LS-CS): compressive sensing on least squares residual. IEEE Trans. Signal Process. 58(8), 4108–4120 (2010)

  18. N. Vaswani, W. Lu, Modified-CS: modifying compressive sensing for problems with partially known support. IEEE Trans. Signal Process. 58(9), 4595–4607 (2010)

  19. M. Friedlander, H. Mansour, et al., Recovering compressively sampled signals using partial support information. IEEE Trans. Inf. Theory 58(2), 1122–1134 (2012)

  20. W. Lu, N. Vaswani, Regularized modified BPDN for noisy sparse reconstruction with partial erroneous support and signal value knowledge. IEEE Trans. Signal Process. 60(1), 182–196 (2012)

  21. X.Z. Gao et al., A sequential Bayesian algorithm for DOA tracking in time-varying environments. Chin. J. Electron. 24(1), 140–145 (2015)

  22. J. Jia, L. Yu, H. Sun, et al., Dynamic DOA estimation via locally competitive algorithm. Proc. ICSP, 336–340 (2014)

  23. T. Higuchi, et al., Underdetermined blind separation and tracking of moving sources based on DOA-HMM. Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 3191–3195 (2014)

  24. Y. Tian, Z. Chen, F. Yin, Distributed IMM-unscented Kalman filter for speaker tracking in microphone array networks. IEEE Trans. Audio Speech Lang. Process. 23(10), 1637–1647 (2015)

  25. P. Khomchuk, I. Bilik, Dynamic direction-of-arrival estimation via spatial compressive sensing. Proc. Radar, 1191–1196 (2010)

  26. M. Hawes, L. Mihaylova, F. Septier, et al., A Bayesian compressed sensing Kalman filter for direction of arrival estimation. Proc. IEEE Int. Conf. Inf. Fusion, 969–975 (2015)

  27. F. Ning, L. Ning, et al., Combining compressive sensing with particle filter for tracking moving wideband sound sources. Proc. IEEE Int. Conf. Signal Process., Commun. Comput., 1–6 (2015)

  28. B. Rafaely, Analysis and design of spherical microphone arrays. IEEE Trans. Speech Audio Process. 13(1), 135–143 (2005)

  29. Q. Huang, T. Wang, Acoustic source localization in mixed field using spherical microphone arrays. EURASIP J. Adv. Signal Process. 2014(1), 1–16 (2014)

  30. R. Goossens, H. Rogier, 2-D angle estimation with spherical arrays for scalar fields. IET Signal Process. 3(3), 221–231 (2009)

  31. H. Sun, S. Yan, U. Svensson, Robust spherical microphone array beamforming with multi-beam-multi-null steering, and sidelobe control. Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 113–116 (2009)

  32. B. Rafaely, Fundamentals of Spherical Array Processing (Springer, 2015), chapter 1, p. 10

  33. D.A. Linebarger, R.D. DeGroat, E.M. Dowling, Efficient direction-finding methods employing forward-backward averaging. IEEE Trans. Signal Process. 42, 2136–2145 (1994)

  34. A. Roda, C. Micheloni, Tracking sound sources by means of HMM. Proc. IEEE Int. Conf. Adv. Video Signal-Based Surveillance, 83–85 (2011)

  35. S. Farahmand, G.B. Giannakis, G. Leus, et al., Tracking target signal strengths on a grid using sparsity. EURASIP J. Adv. Signal Process. 2014(1), 1–17 (2014)

  36. C. Bishop, Pattern Recognition and Machine Learning (Springer-Verlag, New York, 2008), pp. 477–482

  37. M.J. Wainwright, M.I. Jordan, Graphical Models, Exponential Families, and Variational Inference (Now Publishers, 2008), pp. 159–164

  38. E.P. Xing, M.I. Jordan, S. Russell, A generalized mean field algorithm for variational inference in exponential families. Proc. Conf. Uncertainty Artif. Intell., 583–591 (2012)

  39. D.G. Tzikas, A.C. Likas, N.P. Galatsanos, The variational approximation for Bayesian inference. IEEE Signal Process. Mag. 25(6), 131–146 (2008)

  40. P. Sebah, X. Gourdon, Introduction to the Gamma Function. [Online]. Available:

  41. Z. Yang, L. Xie, C. Zhang, Off-grid direction of arrival estimation using sparse Bayesian inference. IEEE Trans. Signal Process. 61(1), 38–43 (2013)



The authors would like to thank the editor and anonymous reviewers for their valuable comments.


The work was supported by the National Natural Science Foundation (61571279, 61501288) and the Shanghai Science and Technology Commission Scientific Research Project (16010500100).

Author information




QH and JH designed and implemented the proposed algorithm and wrote the paper. KL and YF scientifically supervised the work and contributed in implementing the proposed algorithm. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Qinghua Huang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article


Cite this article

Huang, Q., Huang, J., Liu, K. et al. 2-D DOA tracking using variational sparse Bayesian learning embedded with Kalman filter. EURASIP J. Adv. Signal Process. 2018, 23 (2018).
