
Deep learning-based DOA estimation for hybrid massive MIMO receive array with overlapped subarrays


As massive MIMO is a key technology for the future sixth generation (6G), large-scale antenna arrays are widely considered for direction-of-arrival (DOA) estimation because they provide a larger aperture and higher estimation resolution. However, the conventional fully digital architecture requires one radio-frequency (RF) chain per antenna, which is challenging because the large number of RF chains leads to high hardware costs and much higher power consumption. Therefore, an overlapped subarray (OSA) architecture-based hybrid massive MIMO array is proposed to reduce the hardware costs, and it also achieves better DOA estimation accuracy than the non-overlapped subarray (NOSA) architecture. The simulation results show that the accuracy of the proposed OSA architecture has a \(6^{\circ }\) advantage over the NOSA architecture at a signal-to-noise ratio (SNR) of 10 dB. In addition, to improve the DOA estimation resolution, a deep learning (DL)-based estimator is proposed by combining a convolution denoise autoencoder (CDAE) and a deep neural network (DNN), where the CDAE is used to remove the approximation error of the sample covariance matrix (SCM) and the DNN performs high-resolution DOA estimation. In the simulations, CDAE-DNN reaches the accuracy lower bound at \(\textrm{SNR}=-8\) dB with \(N=100\) snapshots, which means it performs better under poor communication conditions and saves more software resources than conventional estimators.

1 Introduction

Direction-of-arrival (DOA) estimation has long been an important research direction in wireless communications, radar, sonar, etc. [1]. With the development of 5G, the massive MIMO system has been studied extensively. However, the traditional fully digital system requires high hardware complexity, so the hybrid analog and digital (HAD) system was considered as an alternative [2]. The DOA estimation problem for the HAD system was then discussed in [3], and the directional modulation design for hybrid antenna arrays was considered in [4]. In addition to the common architecture, various hybrid architectures were considered in [5] and [6], and the overlapped subarrays (OSA) architecture in [7] was shown to have better beamforming performance than the nonoverlapped subarrays (NOSA) architecture.

Recently, machine learning (ML) and deep learning (DL) have been applied to wireless communication and signal processing in many papers. For example, [8] proposed an ML-based low-complexity method for channel state information (CSI) estimation, and a neural network was used to locate a structural sound source in [9]. DL-based methods are also popular in DOA estimation. Traditional DOA estimation methods are mainly divided into two categories: parameter estimation-based methods and subspace methods [10]. The first category contains the nonlinear least-squares (NLS) estimator and the maximum likelihood estimator, while the classical subspace methods include MUSIC, ESPRIT, root-MUSIC, etc. Compared to them, the DL-based methods have lower complexity than parameter estimation-based methods and higher accuracy than subspace methods. In [11], a deep neural network (DNN) was proposed for DOA estimation with array imperfections. A DNN-based DOA estimation method for hybrid massive MIMO systems with uniform circular arrays (UCAs) was given in [12]. DL was also combined with the massive MIMO system in [13], where a super-resolution channel estimation method was proposed.

Besides the ordinary DNN, some special neural network structures can also be used for estimation. For example, the convolution neural network (CNN) is another popular choice [14, 15], especially for improving the accuracy in the low signal-to-noise-ratio (SNR) regime [16, 17]. In [18], the complex ResNet was applied to DOA estimation in near-field MIMO systems. A deep residual network was proposed for underdetermined DOA estimation in [19], and a deep adaptive temporal network (DAT-Net) was considered in [20].

The autoencoder (AE) is a kind of neural network that is trained to copy its input to its output. In [11], the AE was used to map the inputs into the corresponding DNN network. When the input data contains noise, the noiseless data can be recovered by the denoising autoencoder (DAE) proposed in [21]. Replacing the hidden layers in the DAE with convolution layers yields the convolution denoising autoencoder (CDAE), which is widely used in the field of image processing [22].

Hybrid architectures have been popular in the research of DOA estimation with large-scale arrays, but some problems also arise. For example, the NOSA architecture can significantly reduce the hardware costs, but at the expense of estimation performance, while the fully-connected (FC) architecture requires many more phase shifters than NOSA to get better performance. Therefore, to achieve a better balance between DOA estimation performance and hardware costs for hybrid architectures, the OSA-based hybrid massive MIMO receive array is proposed in this work, which can be viewed as the general form of NOSA and FC. In addition, to further improve the DOA estimation accuracy, we propose a novel DL-based method that combines CDAE and DNN, where the CDAE is used to remove the noise of the sample covariance matrix and the DNN performs high-precision DOA estimation. Our main contributions are summarized as follows:

  1. Different from [6, 7], the OSA architecture is applied to the hybrid massive MIMO receive array to achieve a better balance between DOA estimation accuracy and hardware complexity. When the number of elements in each subarray is the same as in the NOSA architecture, the OSA architecture has more radio-frequency (RF) chains and thus a larger virtual aperture, so it can obtain more accurate estimation results. The simulation results also show that OSA performs better than NOSA when the SNR and the number of snapshots are low. The CRLB for the HAD-OSA architecture is also given in this work.

  2. To solve the DOA estimation problem for the HAD-OSA architecture, a DL-based method called CDAE-DNN is also proposed in this work. To improve the accuracy of the estimator under poor conditions with low SNR and a small number of snapshots, the sample covariance matrix (SCM) is first fed to the CDAE to remove the approximation errors, and then the DNN is employed to perform the multi-classification task. Comparing the simulation results of the proposed CDAE-DNN, MUSIC, and the CNN in [16] shows that CDAE-DNN has significant advantages over the other methods, especially when \(\text {SNR}\le -6\) dB and \(N\le 2000\).

Notation: Matrices, vectors, and scalars are denoted by bold upper-case, bold lower-case, and lower-case letters, respectively. \((\cdot )^T\) and \((\cdot )^H\) represent the transpose and the conjugate transpose. \(\textbf{I}\) and \({\textbf {0}}\) denote the identity matrix and the all-zero matrix. \(\text {Re}\{\cdot \}\) and \(\text {Im}\{\cdot \}\) represent the real part and the imaginary part of a complex number. \(\text {tr}(\cdot )\) stands for the trace operation, and \(\odot\) denotes the Hadamard product.

Fig. 1 An M-antenna hybrid massive MIMO array with overlapped subarray (OSA) architecture performs DOA estimation by receiving signals from Q different sources

Fig. 2 Examples of OSA and other hybrid array architectures, corresponding to the part of Fig. 1 framed by dashed lines, where \(K=2\) and \(M=6\). (a) OSA with \(M_s=4\) and \(\Delta M_s=2\), (b) non-overlapped subarray (NOSA) with \(M_s=3\) and \(\Delta M_s=0\), (c) fully-connected (FC) with \(M_s=\Delta M_s=6\)

2 System model

The diagram of a DOA estimation system with the hybrid massive MIMO receive array is shown in Fig. 1. This hybrid array is equipped with an M-antenna uniform linear array (ULA), and the antennas are connected to K RF chains via the OSA architecture. Fig. 2 compares the proposed hybrid OSA architecture with other hybrid architectures. Let \(M_s\) represent the number of antennas connected to the same RF chain, so that these \(M_s\) antennas form a subarray, and let \(\Delta M_s\) denote the number of antennas shared by two adjacent subarrays. Therefore, for a hybrid array with OSA architecture, given M and K, we can get the following relationship

$$\begin{aligned} M=KM_s-(K-1)\Delta M_s, \end{aligned}$$

and it is clear that when \(\Delta M_s=0\) and \(\Delta M_s=M_s\), the OSA architecture reduces to the NOSA architecture and the FC architecture, respectively.
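As a quick numerical check of the relationship above, the following minimal Python sketch evaluates it for the three configurations of Fig. 2 and lists which antennas each subarray uses. The function names `total_antennas` and `subarray_indices` and the 0-based antenna indexing are illustrative assumptions, not part of the original work.

```python
def total_antennas(K: int, M_s: int, delta_M_s: int) -> int:
    """Number of antennas M = K*M_s - (K-1)*delta_M_s for K subarrays."""
    if not 0 <= delta_M_s <= M_s:
        raise ValueError("overlap must satisfy 0 <= delta_M_s <= M_s")
    return K * M_s - (K - 1) * delta_M_s


def subarray_indices(K: int, M_s: int, delta_M_s: int) -> list:
    """0-based antenna indices connected to each of the K RF chains."""
    step = M_s - delta_M_s  # shift between adjacent subarrays
    return [list(range(k * step, k * step + M_s)) for k in range(K)]


# The three cases of Fig. 2 (K = 2, M = 6).
for name, (M_s, dM) in {"OSA": (4, 2), "NOSA": (3, 0), "FC": (6, 6)}.items():
    print(name, total_antennas(2, M_s, dM), subarray_indices(2, M_s, dM))
```

In the FC case the shift between subarrays is zero, so every RF chain is connected to all M antennas.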

Returning to the DOA estimation problem, we assume that Q narrowband signals from different far-field sources impinge on the hybrid array with OSA architecture, and the q-th signal is denoted by \(\tilde{s}_q(t)=s_q(t)e^{j2\pi f_ct}\), where \(s_q(t)\) is the baseband signal and \(f_c\) is the carrier frequency. Then, the signal received by the m-th antenna is given as [5]

$$\begin{aligned} x_m(t)=\sum _{q=1}^{Q}\tilde{s}_q(t)e^{-j2\pi f_c\tau _{q,m}}+v_m(t), \end{aligned}$$

where \(v_m(t)\sim \mathcal{C}\mathcal{N}(0,\sigma _v^2)\) is the additive white Gaussian noise (AWGN) term and \(\tau _{q,m}\) represents the propagation delay of the q-th signal to the m-th antenna, which is expressed by

$$\begin{aligned} \tau _{q,m}=\tau _0-\frac{(m-1)d\sin \theta _q}{c}, \end{aligned}$$

where \(\tau _0\) is the propagation delay from the source to a reference point on the ULA, d and c denote the antenna spacing and the speed of light, respectively, and \(\theta _q\) is the DOA of the q-th signal. Then, by stacking the received signals of all M antennas and multiplying by the analog beamforming matrix \(\textbf{W}\), the output signals of the K RF chains form a \(K\times 1\) vector, which is given by

$$\begin{aligned} \textbf{y}(t)=\left[ y_1(t),y_2(t),\cdots ,y_K(t)\right] ^T=e^{j2\pi f_ct}\textbf{W}^H\textbf{A}(\varvec{\theta })\textbf{s}(t)+\textbf{W}^H\textbf{v}(t), \end{aligned}$$

where \(y_k(t)\) is the output signal of the k-th RF chain, \(\textbf{s}(t)=[s_1(t),s_2(t),\cdots ,s_Q(t)]^T\in \mathbb {C}^{Q\times 1}\) and \(\textbf{v}(t)=[v_1(t),v_2(t),\cdots ,v_M(t)]^T\in \mathbb {C}^{M\times 1}\). \(\textbf{A}(\varvec{\theta })=[\textbf{a}(\theta _1),\cdots ,\textbf{a}(\theta _Q)]\in \mathbb {C}^{M\times Q}\) is the array steering matrix and

$$\begin{aligned} \textbf{a}(\theta _q)=[1,e^{j\frac{2\pi }{\lambda } d\sin \theta _q},\cdots ,e^{j\frac{2\pi }{\lambda } (M-1)d\sin \theta _q}]^T, \end{aligned}$$

where \(\lambda\) denotes the signal wavelength. The analog beamforming matrix \(\textbf{W}\in \mathbb {C}^{M\times K}\) is expressed by

$$\begin{aligned} \textbf{W}=\left[ \textbf{W}(1),\textbf{W}(2),\cdots ,\textbf{W}(K)\right] , \end{aligned}$$

where \(\textbf{W}(k)\) represents the k-th column of \(\textbf{W}\) and it is given as [7]

$$\begin{aligned} \textbf{W}(k)=\left[ \varvec{0}_{(k-1)(M_s-\Delta M_s)}^T~~\textbf{w}_k^T~~\varvec{0}_{(K-k)(M_s-\Delta M_s)}^T\right] ^T, \end{aligned}$$

where \(\varvec{0}_L\) is an \(L\times 1\) vector filled with 0, and

$$\begin{aligned} \textbf{w}_k=[w_{k,1},w_{k,2},\cdots ,w_{k,M_s}]^T, \end{aligned}$$

where \(w_{k,m_s}=\frac{1}{\sqrt{M_s}}e^{j\alpha _{k,m_s}}\), and \(\alpha _{k,m_s}\) is the phase of the \(m_s\)-th phase shifter in the k-th subarray. Finally, after the down-conversion and ADC operations, we get the discrete baseband signal based on (4)

$$\begin{aligned} \textbf{y}(n)=\textbf{W}^H\textbf{A}(\varvec{\theta })\textbf{s}(n)+\textbf{W}^H\textbf{v}(n), \end{aligned}$$

where \(\textbf{y}(n)=\textbf{y}(nT_s)\), \(1\le n\le N\), \(T_s\) denotes the sampling interval, and \(\textbf{v}(n)\sim \mathcal{C}\mathcal{N}(\varvec{0},\sigma _v^2\textbf{I}_M)\).
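To make the structure of \(\textbf{W}\) concrete, the sketch below builds its columns from the zero-padded phase-shifter vectors, forms the steering vector \(\textbf{a}(\theta )\), and computes a noise-free RF-chain output \(\textbf{W}^H\textbf{a}(\theta )\) for a single unit-amplitude source. It is only an illustration: the half-wavelength spacing, the all-zero phase-shifter phases, and the helper names are assumptions made here.

```python
import cmath
import math


def steering_vector(theta_deg, M, d_over_lambda=0.5):
    """a(theta) = [1, e^{j*2*pi*d*sin(theta)/lambda}, ...] for an M-element ULA."""
    phase = 2 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * m) for m in range(M)]


def beamforming_matrix(K, M_s, delta_M_s, alpha):
    """Columns W(k) of the M x K analog beamformer; alpha[k][m_s] are phases in rad."""
    step = M_s - delta_M_s
    M = K * M_s - (K - 1) * delta_M_s
    columns = []
    for k in range(K):
        col = [0j] * M  # zeros outside the k-th subarray
        for m_s in range(M_s):
            col[k * step + m_s] = cmath.exp(1j * alpha[k][m_s]) / math.sqrt(M_s)
        columns.append(col)
    return columns


def rf_chain_outputs(W_columns, x):
    """y = W^H x, one entry per RF chain."""
    return [sum(w.conjugate() * xi for w, xi in zip(col, x)) for col in W_columns]


K, M_s, dM = 2, 4, 2
alpha = [[0.0] * M_s for _ in range(K)]   # assumed phases (all zero)
W = beamforming_matrix(K, M_s, dM, alpha)
a = steering_vector(0.0, len(W[0]))       # broadside source: a is all ones
y = rf_chain_outputs(W, a)                # each entry equals sqrt(M_s)
```

With a broadside source and zero phases, all \(M_s\) antennas of a subarray add coherently, so each RF chain outputs \(\sqrt{M_s}\).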

For the signal waveform \(\textbf{s}(n)\), there are usually two modeling approaches: the first treats \(\textbf{s}(n)\) as an unknown deterministic vector, and the second considers it a stochastic process [23]. In [24] and [5], maximum-likelihood estimators were proposed based on the assumption of a deterministic signal waveform. In this paper, however, we let \(\textbf{s}(n)\) be a zero-mean Gaussian random vector, and its covariance matrix is denoted by

$$\begin{aligned} \textbf{C}_s=\text {E}\left[ \textbf{s}(n)\textbf{s}^H(n)\right] . \end{aligned}$$

Therefore, based on the statistical properties of \(\textbf{s}(n)\) and \(\textbf{v}(n)\), \(\textbf{y}(n)\) is also a zero-mean Gaussian random vector, and its covariance matrix can be expressed by

$$\begin{aligned} \textbf{C}=\text {E}\left[ \textbf{y}(n)\textbf{y}^H(n)\right] =\textbf{W}^H\left( \textbf{A}\textbf{C}_s\textbf{A}^H+\sigma _v^2\textbf{I}_M\right) \textbf{W} \end{aligned}$$

However, since the steering matrix \(\textbf{A}\) and the noise power are unknown, the covariance matrix \(\textbf{C}\) is usually unavailable in practice, so the sample covariance matrix \(\tilde{\textbf{C}}\) is employed as an approximation

$$\begin{aligned} \tilde{\textbf{C}}=\frac{1}{N}\sum _{n=1}^N\textbf{y}(n)\textbf{y}^H(n)=\textbf{C}+\varvec{\varepsilon } \end{aligned}$$

where \(\varvec{\varepsilon }\) denotes the approximation error. As \(N\rightarrow \infty\), \(\tilde{\textbf{C}}\rightarrow \textbf{C}\).
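The sample covariance matrix can be computed directly from the N snapshots. The following sketch uses plain Python lists of complex numbers; the function name `sample_covariance` is a hypothetical helper introduced for illustration.

```python
def sample_covariance(snapshots):
    """Return (1/N) * sum_n y(n) y(n)^H for a list of N length-K snapshots."""
    N = len(snapshots)
    K = len(snapshots[0])
    C = [[0j] * K for _ in range(K)]
    for y in snapshots:
        for i in range(K):
            for j in range(K):
                C[i][j] += y[i] * y[j].conjugate() / N
    return C


# Two toy snapshots (K = 2) whose cross terms cancel, giving the identity matrix.
C_tilde = sample_covariance([[1 + 0j, 1j], [1 + 0j, -1j]])
```

The result is Hermitian by construction, and its deviation from \(\textbf{C}\) is the approximation error \(\varvec{\varepsilon }\) that the CDAE is later trained to remove.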

Fig. 3 The procedure of the proposed CDAE-DNN. The network consists of two parts, a convolution denoise autoencoder (CDAE) and a fully-connected deep neural network (DNN). The sample covariance matrix \(\tilde{\textbf{R}}\) is first input to the CDAE, and the denoised covariance matrix \(\textbf{R}\) is then sent to the DNN for DOA estimation

3 Deep learning-based DOA estimation method for HAD-OSA

In this section, we propose a DNN-based DOA estimator called CDAE-DNN to improve the accuracy of DOA estimation for the hybrid massive array with OSA architecture. The proposed CDAE-DNN is composed of two parts: a CDAE network that recovers the covariance matrix of the received signals from the approximation errors, and a DNN that realizes high-resolution DOA estimation. A diagram of CDAE-DNN is shown in Fig. 3. The CDAE is composed of an encoder and a decoder, which have symmetrical convolution layers. The feature matrices extracted from the SCM are sent to the CDAE to remove the approximation errors, and the input vectors of the next part are extracted from the denoised SCM. The DNN-based estimator is a multilayer neural network, and each element of its output layer corresponds to a direction in the considered spatial region. Therefore, the proposed CDAE-DNN relies on the denoised input provided by the CDAE and on the high-resolution estimator to improve the DOA estimation accuracy.

3.1 Data preprocessing

To ensure the stability of the input data and improve the accuracy of the neural network model, we choose the sample covariance matrix \(\tilde{\textbf{C}}\) as the input feature, which is an alternative to the unavailable covariance matrix \(\textbf{C}\). However, the input of the neural network must be real-valued, so we extract both the real part and the imaginary part of \(\tilde{\textbf{C}}\) and construct a \(K\times K\times 2\) tensor \(\tilde{\textbf{R}}\), i.e., \(\tilde{\textbf{R}}_{:,:,1}=\text {Re}\{\tilde{\textbf{C}}\}\) and \(\tilde{\textbf{R}}_{:,:,2}=\text {Im}\{\tilde{\textbf{C}}\}\).

Then, the label vector \(\textbf{z}=[z_1,z_2,\cdots ,z_L]^T\) of the input data is defined as follows. First, we assume the angular region containing all the emitters is \([-\theta _0,\theta _0]\), and the label interval is \(\Delta \theta\), which is determined by the resolution requirement. Therefore, the length of \(\textbf{z}\) is \(L=\frac{2\theta _0}{\Delta \theta }+1\). The vector \(\textbf{z}\) is binary, containing label 1 at the positions corresponding to the Q training angles and label 0 elsewhere. The training dataset can finally be expressed by \(\{(\tilde{\textbf{R}}^{(1)},\textbf{z}^{(1)}),(\tilde{\textbf{R}}^{(2)},\textbf{z}^{(2)}),\cdots ,(\tilde{\textbf{R}}^{(T)},\textbf{z}^{(T)})\}\).
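The preprocessing described in this subsection can be sketched as follows. The value \(\theta _0=60^{\circ }\) used in the usage lines is assumed purely for illustration (the values actually used are those of Table 1), and the helper names `covariance_to_tensor` and `label_vector` are hypothetical.

```python
def covariance_to_tensor(C):
    """Split a K x K complex matrix into a K x K x 2 real tensor: R[i][j] = [Re, Im]."""
    return [[[c.real, c.imag] for c in row] for row in C]


def label_vector(angles_deg, theta0, delta_theta):
    """Binary vector of length L = 2*theta0/delta_theta + 1 with ones at the source angles."""
    L = int(round(2 * theta0 / delta_theta)) + 1
    z = [0] * L
    for theta in angles_deg:
        z[int(round((theta + theta0) / delta_theta))] = 1  # nearest grid point
    return z


R_tilde = covariance_to_tensor([[2 + 0j, 1 - 1j], [1 + 1j, 2 + 0j]])
z = label_vector([-10.0, 25.0], theta0=60.0, delta_theta=1.0)  # Q = 2 sources
```

Each label index l corresponds to the grid angle \(-\theta _0+l\Delta \theta\) (with l counted from zero in the code), so the same mapping can be inverted to read angles off the network output.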

3.2 CDAE

The traditional autoencoder (AE) is a kind of neural network consisting of three parts: encoder, code, and decoder. The input data is first compressed by the encoder to a lower-dimensional form, i.e., the code, and then the decoder recovers the initial form of the input data from the code. The encoder and the decoder have symmetric neural network architectures that perform opposite operations, so the traditional autoencoder can be summarized as a two-step process

$$\begin{aligned} \textbf{r}=f(\tilde{\textbf{R}}),\quad \tilde{\textbf{R}}=g(\textbf{r}), \end{aligned}$$

where \(f(\cdot )\) and \(g(\cdot )\) denote the encoding and decoding functions, respectively, and \(\textbf{r}\) represents the code.

Since the input data contains the error \(\varvec{\varepsilon }\), we use the DAE technique to remove it. Different from conventional autoencoders, the DAE accepts corrupted data as input and is trained to predict the original uncorrupted data as output. That is, the two-step process of the autoencoder becomes \(\textbf{r}=f(\tilde{\textbf{R}})\) and \(\textbf{R}=g(\textbf{r})\). In addition, because the input data is a \(K\times K\times 2\) tensor, we use convolution networks to implement both the encoder and the decoder to improve accuracy. Next, we introduce the complete procedure of the CDAE.

First, assuming the encoder is constructed from an H-layer convolution network, the encoding function can be written as

$$\begin{aligned} \textbf{r}=f\left( \tilde{\textbf{R}}\right) =f_H\left( f_{H-1}\left( \cdots f_1\left( \tilde{\textbf{R}}\right) \right) \right) , \end{aligned}$$

and each layer contains a convolution layer, a batch normalization (BN) layer and an activation layer. The hth convolution layer has \(G_h\) filters, \(h\in \{1,2,\cdots ,H\}\). Since the input data is 2-channel, the size of the first convolution layer is \(\kappa _1\times \kappa _1\times 2\times G_1\), and the sizes of the other \(H-1\) convolution layers are given by \(\kappa _h\times \kappa _h\times G_{h-1}\times G_h\), because the hth layer takes the \(G_{h-1}\) channels produced by the previous layer as its input. Therefore, the output of the hth convolution layer can be denoted by \(\textbf{F}_h\in \mathbb {R}^{D_h\times D_h\times G_h}\) and its uth channel, \(u\in \{1,2,\cdots ,G_h\}\), is given by

$$\begin{aligned} \textbf{F}_{h,u}=c\left( \textbf{K}_{h,u},\textbf{r}_{h-1},\delta _h\right) +\textbf{b}_{h,u}, \end{aligned}$$

where \(c(\cdot )\) denotes the convolution operation and \(\textbf{K}_{h,u}\) represents the uth filter in the hth convolution layer. \(\textbf{r}_{h-1}\) is the output of the previous layer in the encoder, with \(\textbf{r}_0=\tilde{\textbf{R}}\), \(\delta _h\) denotes the stride, and \(\textbf{b}_{h,u}\) is the bias matrix of the uth filter. The activation function adopted here is RELU, so that the layer output of the encoder can be obtained as

$$\begin{aligned} \textbf{r}_{h,u}=\text {RELU}\left( \textbf{F}_{h,u}\right) =\max \left( {\textbf {0}},\textbf{F}_{h,u}\right) , \end{aligned}$$

and \(\textbf{r}=\{\textbf{r}_{H,u}\}_{u=1}^{G_H}\).
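As an illustration of one encoder layer, the sketch below evaluates the convolution followed by RELU for a single filter in pure Python. It assumes no zero padding, uses a scalar bias per filter in place of the bias matrix \(\textbf{b}_{h,u}\), computes the sliding inner product without kernel flipping (as is common in deep learning frameworks), and omits the BN layer; all of these are simplifying assumptions of the sketch, and `conv_relu` is a hypothetical name.

```python
def conv_relu(r_prev, kernel, stride=1, bias=0.0):
    """One filter of a convolution layer followed by RELU.

    r_prev: list of input channels, each a D x D list of floats.
    kernel: list of kappa x kappa lists, one per input channel.
    Returns a D_out x D_out feature map with D_out = (D - kappa) // stride + 1.
    """
    D, kappa = len(r_prev[0]), len(kernel[0])
    D_out = (D - kappa) // stride + 1
    out = []
    for i in range(D_out):
        row = []
        for j in range(D_out):
            acc = bias
            for ch, k_ch in zip(r_prev, kernel):  # sum over input channels
                for p in range(kappa):
                    for q in range(kappa):
                        acc += k_ch[p][q] * ch[i * stride + p][j * stride + q]
            row.append(max(0.0, acc))  # RELU
        out.append(row)
    return out


x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]  # one 3 x 3 channel
feature = conv_relu(x, [[[1.0, 0.0], [0.0, 1.0]]])        # 2 x 2 kernel, stride 1
```

Applying the function once per filter \(u\in \{1,\cdots ,G_h\}\) and stacking the results gives the layer output \(\textbf{r}_h\).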

In contrast to the encoder, the decoder is required to restore the extracted features to the form of the original input, which is an upsampling process, also called deconvolution in [25]. Similar to (14), the decoding function is expressed by

$$\begin{aligned} \hat{\textbf{R}}=g\left( \textbf{r}\right) =g_H\left( g_{H-1}\left( \cdots g_1\left( \textbf{r}\right) \right) \right) , \end{aligned}$$

since the structure of the decoder is symmetric to that of the encoder, each decoder layer also contains convolution, BN and activation layers. The layer sizes are the same as those of the encoder, i.e., \({\textbf {\text {size}}}(g_h)={\textbf {\text {size}}}(f_{H-h+1})\). In practical applications the DAE cannot completely remove the noise \(\varvec{\varepsilon }\), so the output of the decoder here is \(\hat{\textbf{R}}\) rather than \(\textbf{R}\).

In the DAE training period, our goal is to find the optimal network parameters based on the training dataset. Thus, we choose MSE as the loss function, and it is defined as

$$\begin{aligned} \mathcal {L}\left( \Theta \right) =\frac{1}{T}\sum _{i=1}^{T}\left\| \hat{\textbf{R}}^{(i)}-\textbf{R}^{(i)}\right\| ^2_F, \end{aligned}$$

where i denotes the index of the training data, and \(\Theta\) contains all the weights and biases in the DAE network.
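The MSE loss above can be evaluated with a few lines of Python. In this sketch the squared Frobenius norm of a \(K\times K\times 2\) tensor is taken as the sum of squares of all its entries, which is an assumption about how the norm extends to tensors; `mse_loss` is a hypothetical helper name.

```python
def mse_loss(R_hat_batch, R_batch):
    """(1/T) * sum_i ||R_hat^(i) - R^(i)||_F^2 over T tensors of shape K x K x 2."""
    total = 0.0
    for R_hat, R in zip(R_hat_batch, R_batch):
        for row_hat, row in zip(R_hat, R):
            for pix_hat, pix in zip(row_hat, row):
                total += sum((a - b) ** 2 for a, b in zip(pix_hat, pix))
    return total / len(R_batch)


# Toy batch with T = 1 and K = 1: squared error (1 - 0)^2 + (0 - 2)^2 = 5.
loss = mse_loss([[[[1.0, 0.0]]]], [[[[0.0, 2.0]]]])
```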

3.3 Proposed CDAE-DNN

As shown in Fig. 3, the extracted feature tensor \(\tilde{\textbf{R}}\) is first input to the CDAE to eliminate the estimation error \(\varvec{\varepsilon }\). Then, the output \(\hat{\textbf{R}}\) is fed to an \((H_{d}+2)\)-layer DNN. The first layer is a flatten layer, which transforms \(\hat{\textbf{R}}\) into a \(2K^2\times 1\) vector. It is followed by \(H_{d}\) dense layers, where the \(h_{d}\)th layer contains \(G_{h_{d}}\) neurons, \(h_{d}\in \{1,2,\cdots ,H_{d}\}\). We also choose RELU as the activation function for them, and to achieve regularization in the learning process, the dropout ratio is set as \(20\%\). Therefore, the output of the \(h_{d}\)th dense layer is given as

$$\begin{aligned} \textbf{r}_{h_{d}}=\text {RELU}\left( \textbf{w}_{h_{d}}\textbf{r}_{h_{d}-1}+\textbf{b}_{h_{d}}\right) , \end{aligned}$$

where \(\textbf{w}_{h_{d}}\) and \(\textbf{b}_{h_{d}}\) denote the weight matrix and the bias vector, respectively. When \(h_{d}=1\), the input is \(\textbf{r}_0=\text {vec}(\hat{\textbf{R}})\).
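A forward pass through the flatten layer and one dense layer can be sketched as below. The row-major flattening order and the helper names are assumptions of this sketch, and dropout is left out because it only acts during training.

```python
def flatten(R_hat):
    """Flatten a K x K x 2 tensor into a vector of length 2*K*K (row-major order)."""
    return [v for row in R_hat for pix in row for v in pix]


def dense_relu(W, b, r_prev):
    """r = RELU(W r_prev + b), with W given as a list of rows."""
    return [max(0.0, sum(w * x for w, x in zip(row, r_prev)) + b_i)
            for row, b_i in zip(W, b)]


r0 = flatten([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])  # K = 2
r1 = dense_relu([[1.0, -1.0], [0.0, 2.0]], [0.0, -5.0], [3.0, 1.0])
```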

The last layer of the DNN is the output layer with L neurons, and the form of the final output vector is expressed as

$$\begin{aligned} \hat{\textbf{z}}=\left[ \hat{z}_1,\hat{z}_2,\cdots ,\hat{z}_L\right] ^T. \end{aligned}$$

In order to satisfy \(0\le \hat{z}_l\le 1,~l\in \{1,2,\cdots ,L\}\), the sigmoid function is used as the activation function for this layer, which is defined as

$$\begin{aligned} f(x)=\frac{1}{1+e^{-x}}. \end{aligned}$$

Then, the Q largest elements are selected from \(\hat{\textbf{z}}\), and their corresponding angles are the estimation results.
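The selection step can be illustrated with a short sketch that applies the sigmoid to the output-layer pre-activations, picks the Q largest entries, and maps their indices back to grid angles. The toy values and the name `estimate_doas` are assumptions for illustration; the mapping \(-\theta _0+l\Delta \theta\) uses a zero-based index l.

```python
import math


def sigmoid(x):
    """f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def estimate_doas(pre_activations, Q, theta0, delta_theta):
    """Return the grid angles (degrees) of the Q largest sigmoid outputs."""
    z_hat = [sigmoid(v) for v in pre_activations]
    top = sorted(range(len(z_hat)), key=lambda l: z_hat[l], reverse=True)[:Q]
    return sorted(-theta0 + l * delta_theta for l in top)


# Toy output layer with L = 5 neurons covering [-2, 2] degrees in 1-degree steps.
doas = estimate_doas([-4.0, 3.0, -2.0, 0.5, 5.0], Q=2, theta0=2.0, delta_theta=1.0)
```

Because the sigmoid is monotonic, selecting the Q largest outputs is equivalent to selecting the Q largest pre-activations.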

Since this is a multi-label problem and we want the final output vectors in the form of probability distributions, we use the binary cross-entropy (BCE) as the loss function, which is given by [16]

$$\begin{aligned} \mathcal {L}_{FC}\left( \Theta _{FC}\right) =-\frac{1}{L}\sum _{l=1}^L&\bigg [z^{(i)}_l\log \left( \hat{z}^{(i)}_l\right) +\\&\left( 1-z^{(i)}_l\right) \log \left( 1-\hat{z}^{(i)}_l\right) \bigg ], \end{aligned}$$

where \(i\in \{1,2,\cdots ,T\}\), then the optimal weights and biases of the DNN can be obtained by minimizing it.
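For a single training sample, the BCE loss above can be computed as in the following sketch. The clipping constant `eps` is an assumption added here to avoid evaluating \(\log (0)\); it is not part of the formula in the text.

```python
import math


def bce_loss(z, z_hat, eps=1e-12):
    """Binary cross-entropy averaged over the L output neurons of one sample."""
    total = 0.0
    for z_l, p in zip(z, z_hat):
        p = min(max(p, eps), 1.0 - eps)  # clipping (assumption) to keep log finite
        total += z_l * math.log(p) + (1 - z_l) * math.log(1.0 - p)
    return -total / len(z)


# An uninformative prediction of 0.5 everywhere gives a loss of log(2).
loss = bce_loss([1, 0], [0.5, 0.5])
```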

4 Simulation results

In this section, simulation results are provided to evaluate the performance of the proposed DNN-based DOA estimator for the HAD-OSA architecture, and all the simulations related to deep learning are carried out on the TensorFlow and Matlab platforms. The batch size and the number of epochs are set as 1000 and 30, respectively. We choose the stochastic gradient descent (SGD) algorithm as the optimizer, and the learning rate is set as 0.1. The other parameters considered in the simulation are listed in Table 1.

Table 1 The values of simulation parameters

Fig. 4 Comparison of DOA estimation performance between different methods and different hybrid architectures. This demonstrates the excellent performance of the proposed CDAE-DNN at low SNR

Figure 4 displays how the DOA estimation accuracy varies with SNR. In this simulation, the direction of the signal source is set as \(\theta =-35.9^{\circ }\), the number of snapshots is \(N=100\), the SNR ranges from -20 dB to 20 dB, and all the simulation results are averaged over 5000 Monte-Carlo experiments. Besides the proposed CDAE-DNN, we also take the NOSA architecture and three existing methods into consideration as benchmarks. Since predicting the DOA with a DNN is essentially a multi-classification problem, and the MUSIC algorithm is also based on a grid search, there is a lower bound on the estimation accuracy of these methods when the angle to be estimated is off-grid, as shown in Fig. 4. This lower bound depends on the grid size, which is set as \(1^{\circ }\) in this simulation, and hence the best achievable RMSE of the grid-based methods is \(0.1^{\circ }\). Classical off-grid DOA estimation algorithms like ESPRIT, when applied to arrays with HAD structures, are affected by the phase ambiguity problem, which results in a severe decrease in estimation accuracy, as shown in Fig. 4. The figure shows that the proposed CDAE-DNN has great advantages in the low-SNR region, especially when \(\textrm{SNR}\le -8\) dB. For example, the RMSE of CDAE-DNN at -12 dB is \(3.56^{\circ }\), while those of MUSIC and CNN are \(40.11^{\circ }\) and \(34.65^{\circ }\), respectively, so the proposed method improves the DOA estimation accuracy by more than \(30^{\circ }\) over these methods. Compared to the NOSA architecture, the accuracy of the proposed OSA architecture is approximately \(10^{\circ }\) higher at low SNR.
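The \(0.1^{\circ }\) floor follows directly from the distance between the true angle and its nearest grid point. The sketch below reproduces this arithmetic, assuming the grid points are integer multiples of the grid size (an assumption about the grid alignment); `off_grid_floor` is a hypothetical helper name.

```python
def off_grid_floor(theta_deg, grid_deg):
    """Smallest error a grid-based estimator can make for a single fixed angle."""
    nearest = round(theta_deg / grid_deg) * grid_deg  # nearest grid point
    return abs(theta_deg - nearest)


# True angle -35.9 degrees on a 1-degree grid: the nearest grid point is -36 degrees.
floor = off_grid_floor(-35.9, 1.0)
```

Since the same source angle is used in every Monte-Carlo run, an estimator that always picks the nearest grid point has an RMSE equal to this single-angle error.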

Fig. 5 DOA estimation performance of different methods versus the number of snapshots. This indicates the performance advantage of the proposed CDAE-DNN with lower N

Figure 5 shows the relationship between the RMSE and the number of snapshots at SNR = -13 dB. The error decreases as N increases and eventually reaches the accuracy lower bound of \(0.1^{\circ }\). As can be seen in this figure, the proposed method has a great performance advantage when the number of snapshots is low: it achieves the lower bound at \(N=500\), whereas MUSIC needs \(N=8000\) and CNN needs \(N=2000\), and the RMSEs of MUSIC and CNN at \(N=500\) are \(35^{\circ }\) and \(12^{\circ }\), respectively. CDAE-DNN can therefore save a lot of resource overhead compared with the traditional algorithms. In addition, OSA needs only half the number of snapshots to achieve the same DOA estimation performance as NOSA.

Fig. 6 Comparison of computation time between CDAE-DNN, MUSIC, CNN, and ESPRIT. This demonstrates that the proposed CDAE-DNN has lower computation complexity than traditional on-grid methods

To investigate the computational complexity of the proposed method, the relationship between the computation time and the number of samples in the test set is given in Fig. 6. For the two DL-based methods, CDAE-DNN and CNN, the computational cost comes from the generation of the test sets as well as the processing of the neural networks. It can be seen that the computation time of the proposed method is lower than that of the CNN, and the time difference is about 1 s when the number of samples is less than 2000. Compared with MUSIC, the computation time of CDAE-DNN is slightly higher when the number of samples is less than 500. However, as the number increases to about \(10^3\), its computation time becomes lower than that of the MUSIC algorithm, and the largest gap reaches 5 s as the number of samples grows to 5000.

5 Conclusion

In this paper, the OSA-based hybrid architecture is considered for DOA estimation with a massive MIMO receive array, which achieves a better balance between estimation resolution and hardware costs, and we also derive the CRLB for the OSA architecture. The simulation results show that OSA performs better than NOSA, especially when \(\textrm{SNR}\le -8\) dB, and the maximum accuracy difference is \(13.6^{\circ }\). To further improve the resolution of DOA estimation, the DL-based method CDAE-DNN is also proposed. This method is composed of two parts, where the CDAE is used to remove the approximation error from the SCM and the denoised covariance matrix is then sent to the DNN for DOA estimation. In the simulation section, the proposed CDAE-DNN is compared to MUSIC and CNN. The results show that CDAE-DNN is much better than the other methods at \(\textrm{SNR}\le -8\) dB and \(N\le 1000\), and the computation complexity of CDAE-DNN is also lower when the number of samples is greater than \(10^3\). The simulation results indicate that CDAE-DNN has better performance under poor communication conditions and saves more computation resources.

6 Future works

The OSA architecture is a promising solution for DOA estimation problems with large-scale antenna arrays, because it offers a trade-off between estimation accuracy and hardware complexity. In particular, the OSA architecture includes FC and NOSA, the two most widely considered hybrid architectures, as special cases, so OSA can provide new ideas for studies related to hybrid arrays that consider performance and hardware complexity jointly rather than only one of them. DL is another exciting technique utilized in this work. Its importance has been demonstrated by many works, and DL methods are gradually being considered for DOA estimation, although mostly in the form of ordinary DNNs and CNNs. In the future, we will try to bring in more new DL techniques such as the recurrent neural network (RNN), graph learning, the transformer, etc., and design new integrated methods like the CDAE-DNN proposed in this paper.

Availability of data and materials

Not applicable.


  1. R.W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, A.M. Sayeed, An overview of signal processing techniques for millimeter wave MIMO systems. IEEE J. Sel. Top. Signal Process. 10(3), 436–453 (2016)

  2. F. Sohrabi, W. Yu, Hybrid digital and analog beamforming design for large-scale antenna arrays. IEEE J. Sel. Top. Signal Process. 10(3), 501–513 (2016)

  3. F. Shu, Y. Qin, T. Liu, L. Gui, Y. Zhang, J. Li, Z. Han, Low-complexity and high-resolution DOA estimation for hybrid analog and digital massive MIMO receive array. IEEE Trans. Commun. 66(6), 2487–2501 (2018)

  4. M. Li, B. Zhang, B. Zhang, W. Liu, T. Kim, C. Wang, Directional modulation design for multi-beam multiplexing based on hybrid antenna array structures. EURASIP J. Adv. Signal Process. 2023(1), 1–16 (2023)

  5. R. Zhang, B. Shim, W. Wu, Direction-of-arrival estimation for large antenna arrays with hybrid analog and digital architectures. IEEE Trans. Signal Process. 70, 72–88 (2021)

  6. N. Song, T. Yang, H. Sun, Overlapped subarray based hybrid beamforming for millimeter wave multiuser massive MIMO. IEEE Signal Process. Lett. 24(5), 550–554 (2017)

  7. A. Hassanien, S.A. Vorobyov, Phased-MIMO radar: a tradeoff between phased-array and MIMO radars. IEEE Trans. Signal Process. 58(6), 3137–3151 (2010)

  8. J. Meng, Z. Wei, Y. Zhang, B. Li, C. Zhao, Machine learning based low-complexity channel state information estimation. EURASIP J. Adv. Signal Process. 2023(1), 1–11 (2023)

  9. X. Huang, R. Xu, W. Yu, T. Peng, Research on structural sound source localization method by neural network. EURASIP J. Adv. Signal Process. 2023(1), 54 (2023)

  10. E. Tuncer, B. Friedlander, Classical and Modern Direction-of-Arrival Estimation (Academic Press, Boston, 2009)

  11. Z.-M. Liu, C. Zhang, P.S. Yu, Direction-of-arrival estimation based on deep neural networks with robustness to array imperfections. IEEE Trans. Antennas Propag. 66(12), 7315–7327 (2018)

  12. D. Hu, Y. Zhang, L. He, J. Wu, Low-complexity deep-learning-based DOA estimation for hybrid massive MIMO systems with uniform circular arrays. IEEE Wirel. Commun. Lett. 9(1), 83–86 (2019)

  13. H. Huang, J. Yang, H. Huang, Y. Song, G. Gui, Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system. IEEE Trans. Veh. Technol. 67(9), 8549–8560 (2018)

  14. L. Wu, Z.-M. Liu, Z.-T. Huang, Deep convolution network for direction of arrival estimation with sparse prior. IEEE Signal Process. Lett. 26(11), 1688–1692 (2019)

  15. W. Zhu, M. Zhang, P. Li, C. Wu, Two-dimensional DOA estimation via deep ensemble learning. IEEE Access 8, 124544–124552 (2020)

  16. G.K. Papageorgiou, M. Sellathurai, Y.C. Eldar, Deep networks for direction-of-arrival estimation in low SNR. IEEE Trans. Signal Process. 69, 3714–3729 (2021)

  17. J. Yu, Y. Wang, Deep learning-based multipath DOAs estimation method for MM wave massive MIMO systems in low SNR. IEEE Trans. Veh. Technol. (2023)

  18. Y. Cao, T. Lv, Z. Lin, P. Huang, F. Lin, Complex resnet aided DOA estimation for near-field MIMO systems. IEEE Trans. Veh. Technol. 69(10), 11139–11151 (2020)

  19. C. Ying, W. Xiang, H. Zhitao, Underdetermined DOA estimation via multiple time-delay covariance matrices and deep residual network. J. Syst. Eng. Electron. 32(6), 1354–1363 (2021)

  20. K. Yan, W. Jin, Y. Huang, P. Song, Z. Li, Deep adaptive temporal network (DAT-Net): an effective deep learning model for parameter estimation of radar multipath interference signals. EURASIP J. Adv. Signal Process. 2023(1), 94 (2023)

  21. P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)

  22. X.-J. Mao, C. Shen, Y.-B. Yang, Image restoration using convolutional auto-encoders with symmetric skip connections. arXiv preprint arXiv:1606.08921 (2016)

  23. C.E. Chen, F. Lorenzelli, R.E. Hudson, K. Yao, Stochastic maximum-likelihood DOA estimation in the presence of unknown nonuniform noise. IEEE Trans. Signal Process. 56(7), 3038–3044 (2008)

  24. M. Pesavento, A.B. Gershman, Maximum-likelihood direction-of-arrival estimation in the presence of unknown nonuniform noise. IEEE Trans. Signal Process. 49(7), 1310–1324 (2001)

  25. H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)


Not applicable.


This work was supported in part by the National Natural Science Foundation of China (Nos. U22A2002 and 62071234), the Hainan Province Science and Technology Special Fund (ZDKJ2021022), and the Scientific Research Fund Project of Hainan University under Grant KYQD(ZR)-21008.

Author information

Authors and Affiliations



Not applicable.

Corresponding author

Correspondence to Yifan Li.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Derivation of CRLB for HAD-OSA

Following the derivation in [10], the Fisher information matrix (FIM) for the DOA vector \(\varvec{\theta }\) is given as

$$\begin{aligned} \textbf{F}=\left[ \begin{array}{ccc} F_{\theta _1\theta _1} &{} \cdots &{} F_{\theta _1\theta _Q} \\ \vdots &{} \ddots &{} \vdots \\ F_{\theta _Q\theta _1} &{} \cdots &{} F_{\theta _Q\theta _Q} \\ \end{array} \right] \end{aligned}$$

and its element \(F_{\theta _p\theta _q}\) can be expressed as

$$\begin{aligned} F_{\theta _p\theta _q}=\text {tr}\left( \textbf{C}^{-1}\frac{\partial \textbf{C}}{\partial \theta _p}\textbf{C}^{-1}\frac{\partial \textbf{C}}{\partial \theta _q}\right) \end{aligned}$$

where \(1\le p,q\le Q\), \(\textbf{C}\) is the covariance matrix of the received signal, and

$$\begin{aligned} \frac{\partial \textbf{C}}{\partial \theta _q}&=\frac{\partial \tilde{\textbf{A}}}{\partial \theta _q}\textbf{C}_s\tilde{\textbf{A}}^H+\tilde{\textbf{A}}\textbf{C}_s{\frac{\partial \tilde{\textbf{A}}}{\partial \theta _q}}^H\\&=\textbf{W}^H\textbf{D}_q\textbf{C}_s\tilde{\textbf{A}}^H+\tilde{\textbf{A}}\textbf{C}_s\textbf{D}_q^H\textbf{W}\\&=\textbf{W}^H\textbf{D}\textbf{e}_q\textbf{e}_q^T\textbf{C}_s\tilde{\textbf{A}}^H+\tilde{\textbf{A}}\textbf{C}_s\textbf{e}_q\textbf{e}_q^T\textbf{D}^H\textbf{W} \end{aligned}$$

where \(\tilde{\textbf{A}}=\textbf{W}^H\textbf{A}\), \(\textbf{e}_q\) denotes the \(q\)th column of the identity matrix \(\textbf{I}_Q\), and

$$\begin{aligned} \textbf{D}&=\sum _{q=1}^Q\textbf{D}_q, \\ \textbf{D}_q&=\left[ {\textbf {0}}_{M\times (q-1)}~~\textbf{d}_q\textbf{a}(\theta _q)~~{\textbf {0}}_{M\times (Q-q)}\right] , \\ \textbf{d}_q&=\text {diag}\left\{ 0,j\frac{2\pi }{\lambda }d\cos \theta _q,\cdots ,j\frac{2\pi }{\lambda }(M-1)d\cos \theta _q\right\} . \end{aligned}$$

Using the identity \(\text {tr}(\textbf{X}^H)=\text {tr}(\textbf{X})^{*}\), which holds for any square matrix \(\textbf{X}\), together with the Hermitian symmetry of \(\textbf{C}^{-1}\), we obtain

$$\begin{aligned} F_{\theta _p\theta _q} =&\text {tr}\left\{ (\textbf{F}_1+\textbf{F}_1^H)\textbf{C}^{-1}(\textbf{F}_2+\textbf{F}_2^H)\textbf{C}^{-1}\right\} \\ =&\text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2\textbf{C}^{-1}\right\} +\text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2^H\textbf{C}^{-1}\right\} +\\&\text {tr}\left\{ \textbf{F}_1^H\textbf{C}^{-1}\textbf{F}_2\textbf{C}^{-1}\right\} +\text {tr}\left\{ \textbf{F}_1^H\textbf{C}^{-1}\textbf{F}_2^H\textbf{C}^{-1}\right\} \\ =&\text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2\textbf{C}^{-1}\right\} +\text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2\textbf{C}^{-1}\right\} ^{*}+\\&\text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2^H\textbf{C}^{-1}\right\} +\text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2^H\textbf{C}^{-1}\right\} ^{*}\\ =&2\text {Re}\left\{ \text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2\textbf{C}^{-1}\right\} +\text {tr}\left\{ \textbf{F}_1\textbf{C}^{-1}\textbf{F}_2^H\textbf{C}^{-1}\right\} \right\} \end{aligned}$$

where \(\textbf{F}_1=\textbf{W}^H\textbf{D}_p\textbf{C}_s\tilde{\textbf{A}}^H\) and \(\textbf{F}_2=\textbf{W}^H\textbf{D}_q\textbf{C}_s\tilde{\textbf{A}}^H\). Since \(\textbf{D}_q=\textbf{D}\textbf{e}_q\textbf{e}_q^T\), the equation above can be further written as

$$\begin{aligned} F_{\theta _p\theta _q}=&2\text {Re}\bigg \{\text {tr}\left\{ \textbf{W}^H\textbf{D}\textbf{e}_p\textbf{e}_p^T\textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\textbf{W}^H\textbf{D}\textbf{e}_q\textbf{e}_q^T\textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\right\} \\&+\text {tr}\left\{ \textbf{W}^H\textbf{D}\textbf{e}_p\textbf{e}_p^T\textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\tilde{\textbf{A}}\textbf{C}_s\textbf{e}_q\textbf{e}_q^T\textbf{D}^H\textbf{W}\textbf{C}^{-1}\right\} \bigg \}\\ =&2\text {Re}\bigg \{\textbf{e}_p^T\textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\textbf{W}^H\textbf{D}\textbf{e}_q\textbf{e}_q^T\textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\textbf{W}^H\textbf{D}\textbf{e}_p\\&+\textbf{e}_p^T\textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\tilde{\textbf{A}}\textbf{C}_s\textbf{e}_q\textbf{e}_q^T\textbf{D}^H\textbf{W}\textbf{C}^{-1}\textbf{W}^H\textbf{D}\textbf{e}_p\bigg \} \end{aligned}$$

Combining all \(Q^2\) elements, and writing \(\odot\) for the Hadamard (element-wise) product, the FIM of \(\varvec{\theta }\) can be expressed as

$$\begin{aligned} \textbf{F}=&2\text {Re}\bigg \{\left( \textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\textbf{W}^H\textbf{D}\right) \odot \left( \textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\textbf{W}^H\textbf{D}\right) ^T\\&+\left( \textbf{C}_s\tilde{\textbf{A}}^H\textbf{C}^{-1}\tilde{\textbf{A}}\textbf{C}_s\right) \odot \left( \textbf{D}^H\textbf{W}\textbf{C}^{-1}\textbf{W}^H\textbf{D}\right) ^T\bigg \}. \end{aligned}$$
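The closed-form FIM can be checked numerically against the element-wise trace definition. The following is a minimal NumPy sketch under several assumptions that are not stated in this appendix: a uniform linear array with steering entries \(e^{j2\pi md\sin \theta /\lambda }\) (inferred from the \(\cos \theta _q\) factor in \(\textbf{d}_q\)), a received covariance of the form \(\textbf{C}=\tilde{\textbf{A}}\textbf{C}_s\tilde{\textbf{A}}^H+\sigma ^2\textbf{W}^H\textbf{W}\), and a dense random-phase combiner `W` standing in for the actual OSA analog beamforming matrix. All sizes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small setup (illustrative values only).
M, K, Q = 8, 4, 2               # antennas, RF chains, sources
d_over_lambda = 0.5             # element spacing d / wavelength
theta = np.deg2rad([10.0, 35.0])
sigma2 = 0.5                    # noise power

m = np.arange(M)[:, None]       # antenna indices as an (M, 1) column

# Steering matrix A, assuming a(theta)_m = exp(j*2*pi*m*d*sin(theta)/lambda).
A = np.exp(1j * 2 * np.pi * d_over_lambda * m * np.sin(theta)[None, :])
# D = [d_1 a(theta_1), ..., d_Q a(theta_Q)], the column-wise derivative of A.
D = (1j * 2 * np.pi * d_over_lambda * m * np.cos(theta)[None, :]) * A

# Assumed analog combiner W (M x K): random unit-modulus phases, scaled.
W = np.exp(1j * rng.uniform(0, 2 * np.pi, (M, K))) / np.sqrt(M)
Wh = W.conj().T
Cs = np.diag([1.0, 2.0]).astype(complex)    # source covariance C_s
At = Wh @ A                                  # A tilde = W^H A
C = At @ Cs @ At.conj().T + sigma2 * (Wh @ W)   # assumed noise model
Ci = np.linalg.inv(C)

def dC(q):
    """dC/d(theta_q) = W^H D e_q e_q^T C_s At^H plus its Hermitian transpose."""
    E = np.zeros((Q, Q))
    E[q, q] = 1.0                            # e_q e_q^T
    F = Wh @ D @ E @ Cs @ At.conj().T
    return F + F.conj().T

# Element-wise definition: F_pq = tr(C^-1 dC_p C^-1 dC_q).
F_trace = np.array([[np.trace(Ci @ dC(p) @ Ci @ dC(q)).real
                     for q in range(Q)] for p in range(Q)])

# Closed form with Hadamard products (`*` is element-wise in NumPy).
G = Cs @ At.conj().T @ Ci @ Wh @ D
H1 = Cs @ At.conj().T @ Ci @ At @ Cs
H2 = D.conj().T @ W @ Ci @ Wh @ D
F_closed = 2 * np.real(G * G.T + H1 * H2.T)

print(np.max(np.abs(F_trace - F_closed)))    # largest discrepancy
```

The two constructions should agree up to floating-point error. The closed form needs only three matrix products, whereas the trace form needs \(Q^2\) trace evaluations.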

Collecting the signals over all \(N\) snapshots, the CRLB is given as

$$\begin{aligned} \text {CRLB}=\frac{1}{N}\textbf{F}^{-1}. \end{aligned}$$
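As an illustration of how the bound scales with the number of snapshots, the sketch below wraps the closed-form FIM in a hypothetical helper `crlb_had` and evaluates it for a single source. The same assumptions apply as for any numerical instance of this derivation: the steering-vector convention, the noise model \(\sigma ^2\textbf{W}^H\textbf{W}\), and the random-phase combiner `W` are illustrative choices, not the configuration used in the simulations.

```python
import numpy as np

def crlb_had(theta, W, Cs, sigma2, N, d_over_lambda=0.5):
    """CRLB matrix (rad^2) for N snapshots, from the closed-form FIM.

    Hypothetical helper: assumes a ULA with a(theta)_m =
    exp(j*2*pi*m*d*sin(theta)/lambda) and C = At Cs At^H + sigma2 * W^H W.
    """
    M = W.shape[0]
    m = np.arange(M)[:, None]
    A = np.exp(1j * 2 * np.pi * d_over_lambda * m * np.sin(theta)[None, :])
    D = (1j * 2 * np.pi * d_over_lambda * m * np.cos(theta)[None, :]) * A
    Wh = W.conj().T
    At = Wh @ A                                   # A tilde = W^H A
    C = At @ Cs @ At.conj().T + sigma2 * (Wh @ W)
    Ci = np.linalg.inv(C)
    G = Cs @ At.conj().T @ Ci @ Wh @ D
    H1 = Cs @ At.conj().T @ Ci @ At @ Cs
    H2 = D.conj().T @ W @ Ci @ Wh @ D
    F = 2 * np.real(G * G.T + H1 * H2.T)          # single-snapshot FIM
    return np.linalg.inv(F) / N                   # CRLB = F^-1 / N

rng = np.random.default_rng(1)
M, K = 16, 8                                      # illustrative sizes
W = np.exp(1j * rng.uniform(0, 2 * np.pi, (M, K))) / np.sqrt(M)
theta = np.deg2rad(np.array([20.0]))              # one source at 20 degrees
Cs = np.eye(1, dtype=complex)                     # unit source power

for N in (100, 400):
    bound = crlb_had(theta, W, Cs, sigma2=1.0, N=N)
    rmse_deg = np.rad2deg(np.sqrt(np.diag(bound)))  # RMSE lower bound
    print(N, rmse_deg)
```

Because the bound is proportional to \(1/N\), quadrupling the number of snapshots halves the RMSE lower bound.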

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Li, Y., Shi, B., Shu, F. et al. Deep learning-based DOA estimation for hybrid massive MIMO receive array with overlapped subarrays. EURASIP J. Adv. Signal Process. 2023, 110 (2023).
