Deep learning-based DOA estimation for hybrid massive MIMO receive array with overlapped subarrays

As massive MIMO is a key technology for the sixth generation (6G), large-scale antenna arrays are widely considered in direction-of-arrival (DOA) estimation, since they provide a larger aperture and higher estimation resolution. However, the conventional fully digital architecture requires one radio-frequency (RF) chain per antenna, which leads to high hardware costs and power consumption. Therefore, an overlapped subarray (OSA) architecture-based hybrid massive MIMO array is proposed to reduce the hardware costs, and it also achieves better DOA estimation accuracy than the non-overlapped subarray (NOSA) architecture. The simulation results show that the proposed OSA architecture has a 6° accuracy advantage over the NOSA architecture at a signal-to-noise ratio (SNR) of 10 dB. In addition, to improve the DOA estimation resolution, a deep learning (DL)-based estimator is proposed by combining a convolutional denoising autoencoder (CDAE) and a deep neural network (DNN), where the CDAE removes the approximation error of the sample covariance matrix (SCM) and the DNN performs high-resolution DOA estimation.
From the simulation results, CDAE-DNN achieves the accuracy lower bound at SNR = -8 dB with N = 100 snapshots, which means it performs better in poor communication conditions and requires fewer computational resources than conventional estimators.


I. INTRODUCTION
Direction-of-arrival (DOA) estimation has long been an important research direction in wireless communications, radar, sonar, etc. With the development of 5G, massive MIMO systems have been studied extensively. However, realizing the traditional fully digital system requires high hardware complexity, so the hybrid analog and digital (HAD) system has been considered as an alternative [1]. The DOA estimation problem for the HAD system was then discussed in [2] and [3]. In addition to the common architecture, various special architectures were considered in [4], [5], and the overlapped subarray (OSA) architecture in [6] was proved to have better beamforming performance than the non-overlapped subarray (NOSA) architecture.
Traditional DOA estimation methods are mainly divided into two categories: parameter estimation-based methods and subspace methods [7]. The first category contains the nonlinear least-squares (NLS) estimator and the maximum likelihood (ML) estimator, while classical subspace methods include MUSIC, ESPRIT, root-MUSIC, etc. Recently, deep learning (DL)-based methods have become new choices for solving DOA estimation problems, since they have lower complexity than parameter estimation-based methods and higher accuracy than subspace methods. In [8], a deep neural network (DNN) was proposed for DOA estimation with array imperfections. A convolutional neural network (CNN) was used in [9] to improve accuracy in the low signal-to-noise-ratio (SNR) regime, and [10] gave a DNN-based method for DOA estimation with a HAD massive array. The DOA estimation problem with low-resolution ADCs was considered in [11] and [12], and a fast ambiguity elimination method for DOA estimation was proposed in [13].
The autoencoder (AE) is a kind of neural network trained to copy its input to its output. In [8], the AE was used to map the inputs into the corresponding DNN network. When the input data contains noise, the noiseless data can be recovered by the denoising autoencoder (DAE) proposed in [14]. Replacing the hidden layers of the DAE with convolution layers yields the convolutional denoising autoencoder (CDAE), which is widely used in the field of image processing [15].
In this letter, the DL-based DOA estimation method CDAE-DNN is proposed for the HAD massive MIMO receive array with overlapped subarrays. Our main contributions are summarized as follows: 1) To improve the accuracy of DOA estimation for the hybrid massive MIMO array, the HAD-OSA architecture is employed in this work. With the same number of elements per subarray as the NOSA architecture, the OSA architecture has more RF chains, achieves a larger virtual aperture, and thus yields more accurate estimates. The simulation results also show that OSA outperforms NOSA when the SNR and the number of snapshots are low. The CRLB for this special HAD-OSA architecture is also derived in this work. 2) To solve the DOA estimation problem for the HAD-OSA architecture, a DL-based method called CDAE-DNN is proposed. In this method, the input data is first passed through the CDAE to remove errors, and then a fully-connected (FC) network performs the multi-classification task. Comparing the simulation results of the proposed CDAE-DNN with MUSIC and the CNN in [9], the CDAE-DNN has significant advantages over the other methods. In particular, the CDAE-DNN achieves the accuracy lower bound at SNR = -10 dB when N = 100, whereas MUSIC and the CNN in [9] need SNR ≥ -5 dB to reach the same bound.

Notation: Matrices, vectors, and scalars are denoted by bold upper-case, bold lower-case, and lower-case letters, respectively. (·)^T and (·)^H represent transpose and conjugate transpose. I and 0 denote the identity matrix and the all-zero matrix. Re{·} and Im{·} represent the real and imaginary parts of a complex number.

[Fig. 1: Hybrid receive array structure, where each subarray is connected through an RF chain and ADC to the baseband signal processing unit.]

II. SYSTEM MODEL
Consider Q far-field narrow-band signals received by a massive MIMO receiver equipped with an M-element uniform linear array (ULA). The qth signal is expressed as s_q(t)e^{j2πf_c t}, where s_q(t) is the baseband signal and f_c is the carrier frequency. As shown in Fig. 1, we divide this array into K overlapped subarrays, each of which is connected to one RF chain and composed of M_s antenna elements. The number of overlapped antennas between two adjacent subarrays is denoted by ∆M_s, so M = KM_s − (K − 1)∆M_s. As special cases, when ∆M_s = M or ∆M_s = 0, this array reduces to the fully-connected hybrid architecture or the NOSA hybrid architecture, respectively. After down-conversion and analog-to-digital conversion, the received baseband signal is formulated as

y(n) = W^H A s(n) + W^H v(n), n = 1, ..., N, (1)

where s(n) = [s_1(n), ..., s_Q(n)]^T is assumed to be a stationary zero-mean Gaussian random process, A = [a(θ_1), ..., a(θ_Q)] is the M × Q array steering matrix, v(n) is the noise vector, and W ∈ C^{M×K} is the analog beamforming matrix whose (m, k)th entry is (1/√M_s)e^{jθ_{k,m}} if the mth antenna belongs to the kth subarray and 0 otherwise, where θ_{k,m} is the corresponding phase of the mth phase shifter in the kth subarray; it is clear that two adjacent columns of W overlap.
The received signal of the kth subarray is expressed as

y_k(n) = w_k^H (A_k s(n) + v_k(n)),

where A_k = J_k A ∈ C^{M_s×Q} is the array steering matrix of the kth subarray and J_k is an M_s × M selection matrix which only contains 0 and 1. The noise vector of the kth subarray is likewise given as v_k(n) = J_k v(n). As the signal and noise are assumed uncorrelated, the covariance matrix of the received signal (1) is defined as

C = E{y(n)y^H(n)} = W^H A C_s A^H W + σ² W^H W,

where C_s = E{s(n)s^H(n)}. However, the covariance matrix C is usually unavailable in practice, so the sample covariance matrix

Ĉ = (1/N) Σ_{n=1}^{N} y(n)y^H(n) = C + ε

can be employed as an approximation, where ε denotes the approximation error.
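As a concrete illustration, the OSA signal model above can be sketched in NumPy. The array sizes, phase-shifter phases, and noise level below are illustrative choices, not the paper's simulation settings:

```python
import numpy as np

rng = np.random.default_rng(0)
M, Ms, dMs, Q, N = 16, 4, 2, 1, 200        # small illustrative array
K = (M - dMs) // (Ms - dMs)                 # from M = K*Ms - (K-1)*dMs
assert M == K * Ms - (K - 1) * dMs

theta = np.deg2rad([10.1])                  # source DOA
m = np.arange(M)
A = np.exp(1j * np.pi * np.outer(m, np.sin(theta)))  # ULA steering, d = λ/2

# Analog beamforming matrix W: column k is nonzero only on subarray k,
# which starts Ms - dMs antennas after subarray k-1 (overlap of dMs).
W = np.zeros((M, K), dtype=complex)
for k in range(K):
    start = k * (Ms - dMs)
    W[start:start + Ms, k] = np.exp(1j * rng.uniform(0, 2 * np.pi, Ms)) / np.sqrt(Ms)

# Received baseband samples y(n) = W^H (A s(n) + v(n)) and the SCM.
s = (rng.standard_normal((Q, N)) + 1j * rng.standard_normal((Q, N))) / np.sqrt(2)
v = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) * 0.1
Y = W.conj().T @ (A @ s + v)
C_hat = Y @ Y.conj().T / N                  # K x K sample covariance matrix
print(C_hat.shape)
```

Note that the sample covariance is only K × K (one dimension per RF chain), which is exactly why the overlap matters: for fixed M_s and M, a larger ∆M_s gives more RF chains K and hence a larger observable covariance.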

III. PROPOSED CDAE-DNN METHOD

A. Data Preprocessing
To ensure the stability of the input data and improve the accuracy of the neural network model, we choose the sample covariance matrix Ĉ as the input feature, which is an alternative to the unavailable covariance matrix C. However, the input of neural networks must be real-valued, so we extract both the real part and the imaginary part of Ĉ and construct a K × K × 2 tensor R, i.e., R_{:,:,1} = Re{Ĉ} and R_{:,:,2} = Im{Ĉ}.

Then the label vector z corresponding to the input data is defined as follows. Firstly, we assume the angular region containing all the emitters is [−θ_0, θ_0], and the label interval ∆θ is determined by the resolution requirement. Therefore, the length of z is given as L = 2θ_0/∆θ + 1, and z is a binary vector containing label 1 at the positions corresponding to the Q training angles and label 0 at the remaining positions. The training dataset can finally be expressed as the pairs {(R^(i), z^(i))}, i = 1, ..., T.
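The label construction can be sketched as follows, assuming θ_0 = 90° and ∆θ = 1° as in the later simulations; the helper name `make_label` is hypothetical:

```python
import numpy as np

def make_label(angles_deg, theta0=90, dtheta=1):
    """Multi-hot label on the grid [-theta0, theta0] with step dtheta."""
    L = int(2 * theta0 / dtheta) + 1            # L = 2*theta0/dtheta + 1
    z = np.zeros(L)
    for a in angles_deg:
        idx = int(round((a + theta0) / dtheta))  # nearest grid bin
        z[idx] = 1.0
    return z

z = make_label([10.1, 20.1])    # two sources -> two active labels
print(len(z), int(z.sum()))     # 181 2
```

Because 10.1° and 20.1° are off-grid, their labels fall on the nearest bins (10° and 20°), which is the source of the 0.1° accuracy floor discussed in the simulation section.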

B. Convolution DAE
The traditional autoencoder (AE) is a kind of neural network consisting of three parts: encoder, code, and decoder. The input data is first compressed to a lower-dimensional form, i.e., the code, by the encoder, and then the decoder recovers the code to the initial form of the input data. The encoder and decoder have symmetric neural network architectures performing opposite operations, so the traditional autoencoder can be summarized as the two-step process

r = f(R), R = g(r),

where f(·) and g(·) denote the encode and decode operations, respectively, and r represents the code. Since the input data contains the noise ε, the autoencoder can be trained to output the noiseless data rather than a simple copy of the input; that is, the two-step process becomes r = f(R) and R̂ = g(r), where R̂ approximates the noiseless version of R. Moreover, because the input data is a K × K × 2 tensor, we use convolution networks to implement both the encoder and the decoder. Next, we introduce the complete procedure of the proposed convolutional DAE.
Firstly, assuming the encoder is constructed by an H-layer convolution network, the encode function can be written as the composition

f = f_H ∘ f_{H−1} ∘ · · · ∘ f_1,

and each layer contains a convolution layer, a batch normalization (BN) layer, and an activation layer. Each of the H convolution layers has G_h filters, h ∈ {1, 2, ..., H}. Since the input data has 2 channels, the size of the first convolution layer is κ_1 × κ_1 × 2 × G_1, and the sizes of the other H − 1 convolution layers are given by κ_h × κ_h × G_h. Therefore, the output of the hth convolution layer can be denoted by

r_{h,u} = c(r_{h−1}, K_{h,u}, δ_h) + b_{h,u},

where c(·) denotes the convolution operation, K_{h,u} represents the uth filter in the hth convolution layer, r_{h−1} is the output of the previous layer (r_0 = R), δ_h denotes the stride, and b_{h,u} is the bias matrix of the uth filter. The activation function adopted here is ReLU, so the output of the hth encoder layer is obtained as

r_h = ReLU(BN(r_{h,u})), u = 1, ..., G_h,

and r = {r_{H,u}}_{u=1}^{G_H}. Contrary to the encoder, the decoder is required to restore the extracted feature to the form of the original input, which is an upsampling process, also called deconvolution. Similarly, the decode function is expressed as

g = g_H ∘ g_{H−1} ∘ · · · ∘ g_1,

and since the structure of the decoder is symmetric with the encoder, each decoder layer also contains convolution, BN, and activation layers, with layer sizes matching those of the encoder, i.e., size(g_h) = size(f_{H−h+1}). In practical application the DAE cannot completely remove the noise ε, so the output of the decoder is the denoised approximation R̂ rather than the exactly noiseless tensor.
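The mirror symmetry between encoder and decoder sizes can be verified with simple shape arithmetic. The kernel sizes and strides below are illustrative assumptions (the paper does not list them), and the input size 15 matches the K = 15 RF chains used in the simulations:

```python
def conv_out(size, kernel, stride):
    """Output size of a 'valid' convolution along one spatial dimension."""
    return (size - kernel) // stride + 1

def deconv_out(size, kernel, stride):
    """Output size of the matching transposed convolution (upsampling)."""
    return (size - 1) * stride + kernel

K_in, kernels, strides = 15, [3, 3, 3], [2, 1, 1]   # assumed H = 3 layers

enc = [K_in]
for k, s in zip(kernels, strides):                   # encoder: downsample
    enc.append(conv_out(enc[-1], k, s))

dec = [enc[-1]]
for k, s in zip(reversed(kernels), reversed(strides)):  # decoder mirrors it
    dec.append(deconv_out(dec[-1], k, s))

print(enc, dec)   # decoder restores the K x K input size
```

Running the mirrored strides in reverse order is what guarantees size(g_h) = size(f_{H−h+1}), so the decoder output has the same K × K footprint as the input tensor R.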
In the DAE training period, our goal is to find the optimal network parameters based on the training dataset. Thus, we choose the mean squared error (MSE) as the loss function, defined as

L_MSE(Θ) = (1/T) Σ_{i=1}^{T} || g(f(R^(i))) − R̄^(i) ||²,

where R̄^(i) denotes the corresponding noiseless tensor and Θ contains all the weights and biases in the DAE network.

C. Proposed CDAE-DNN
As shown in Fig. 2, the extracted feature tensor R is first input to the CDAE to eliminate the estimation error ε. Then the output R̂ is input to an (H_FC + 2)-layer fully-connected (FC) network. The first layer is a flatten layer, which transforms R̂ into a 2K² × 1 vector, and it is followed by H_FC dense layers. We choose ReLU as the activation function for them, and to achieve regularization in the learning process, the dropout ratio is set as 20%. Therefore, the output of the h_FC-th dense layer is given as

r_{h_FC} = ReLU(W_{h_FC} r_{h_FC−1} + b_{h_FC}),

where W_{h_FC} and b_{h_FC} denote the weight matrix and bias vector, respectively. The last layer of the FC network is the output layer with L neurons, and the final output vector takes the form

ẑ = [ẑ_1, ẑ_2, ..., ẑ_L]^T.

In order to satisfy 0 ≤ ẑ_l ≤ 1, l ∈ {1, 2, ..., L}, the activation function of this layer is the sigmoid, defined as

σ(x) = 1/(1 + e^{−x}).

Then the Q largest elements are selected from ẑ, and their corresponding angles are the estimation results. Since this is a multi-label problem and we want the final output vectors in the form of probability distributions, we use the binary cross-entropy (BCE) as the loss function, given by

L_BCE = −(1/T) Σ_{i=1}^{T} Σ_{l=1}^{L} [ z_l^(i) log ẑ_l^(i) + (1 − z_l^(i)) log(1 − ẑ_l^(i)) ],

where i ∈ {1, 2, ..., T}; the optimal weights and biases of the FC network are obtained by minimizing it.
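The output-layer read-out can be sketched as follows; the logits are made up for illustration, and `sigmoid`/`bce` are direct implementations of the formulas above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(z, z_hat, eps=1e-12):
    """Binary cross-entropy averaged over the L grid positions."""
    z_hat = np.clip(z_hat, eps, 1.0 - eps)
    return -np.mean(z * np.log(z_hat) + (1 - z) * np.log(1 - z_hat))

L, Q, theta0, dtheta = 181, 2, 90, 1
logits = np.full(L, -4.0)
logits[[100, 110]] = 4.0                 # pretend the network is confident here
z_hat = sigmoid(logits)                  # per-bin probabilities in [0, 1]

z = np.zeros(L); z[[100, 110]] = 1.0     # ground-truth multi-hot label
loss = bce(z, z_hat)

grid = np.arange(-theta0, theta0 + dtheta, dtheta)
est = np.sort(grid[np.argsort(z_hat)[-Q:]])   # Q largest outputs -> DOAs
print(est, loss)
```

Because each sigmoid output is an independent per-bin probability, picking the Q largest elements handles the multi-source case without a softmax over the whole grid.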

IV. SIMULATION RESULTS
In this section, simulation results are provided to evaluate the performance of the proposed DL-based DOA estimator for the HAD-OSA architecture; all the deep learning simulations are implemented in TensorFlow. Firstly, we assume the massive ULA has M = 128 elements and the distance between two adjacent elements is λ/2. Since the OSA architecture is employed in this work, the number of elements in each subarray is M_s = 16 and the overlap is ∆M_s = 8, so the number of RF chains is K = 15. We also suppose the signal source lies within the angular range [−90°, 90°], and the angular interval is set as ∆θ = 1°. In the training period, the CDAE contains one input layer, three convolution layers, and one output layer. The FC network contains one flatten layer, three dense layers, and one output layer; the three dense layers have 2048, 4096, and 2048 neurons, respectively, and the output layer has 181 neurons. The training dataset contains approximately 60000 samples, the batch size is 1000, and the number of epochs is 30. Finally, we choose SGD as the optimizer with a learning rate of 0.1.
Fig. 3 displays how the DOA estimation accuracy varies with SNR. In this simulation, the direction of the signal source is set as θ = 10.1°, the number of snapshots is N = 100, the SNR ranges from −20 dB to 10 dB, and all results are averaged over 1000 Monte-Carlo experiments. Besides the proposed CDAE-DNN, we take three existing methods as benchmarks. The first is NOSA [3]; since there is no overlapped region between its adjacent subarrays, we let its subarray size equal that of the OSA, giving K_NOSA = 8 RF chains. The second is the CNN estimator proposed in [9], which contains four convolution layers and four FC layers. The last is MUSIC [7], the most popular subspace method for DOA estimation. Since predicting DOA with a DNN is essentially a multi-classification problem, and the MUSIC algorithm is likewise based on a grid search, there is a lower bound on the estimation accuracy of these methods when the angle to be estimated is off-grid, as shown in Fig. 3. This lower bound depends on the grid size, which is set as 1° in this simulation, and hence the best achievable RMSE is 0.1°. As can also be seen in Fig. 3, the proposed CDAE-DNN achieves a significant improvement in estimation accuracy, especially in the low-SNR region, and holds a larger advantage over the traditional methods than the other DL-based method does. The comparison with NOSA also confirms that OSA yields higher estimation accuracy.

Fig. 4 shows the relationship between the RMSE and the number of snapshots at SNR = −13 dB. The error decreases as N increases and eventually reaches the accuracy lower bound of 0.1°. As can be seen in this figure, our proposed method has a clear performance advantage when the number of snapshots is small, especially for N ≤ 500, so it can save a lot of resource overhead compared with the traditional MUSIC algorithm. OSA is also shown to significantly improve the accuracy of HAD architectures.
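The 0.1° floor seen in Figs. 3 and 4 follows from simple arithmetic: with a 1° grid, the best any grid-based estimator can do for the off-grid source at 10.1° is the nearest grid point.

```python
grid_step = 1.0                  # degrees, as in the simulations
theta_true = 10.1                # off-grid source direction
nearest = round(theta_true / grid_step) * grid_step   # -> 10.0
floor_rmse = abs(theta_true - nearest)
print(floor_rmse)                # ~0.1 degrees, the observed lower bound
```

Any estimator constrained to the grid, whether a classifier DNN or grid-search MUSIC, cannot beat this residual regardless of SNR or snapshot count.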
The simulation results for the multiple-signal-source scenario are given in Fig. 5, where Q = 2, θ_1 = 10.1°, and θ_2 = 20.1°. The overall trend of the curves is similar to the single-target case. The proposed CDAE-DNN has a significant advantage over the MUSIC algorithm, and also outperforms the CNN [9] in the medium-to-high SNR region.

V. CONCLUSION
In this letter, a DL-based DOA estimation method called CDAE-DNN was proposed for the hybrid massive MIMO array with OSA architecture. The estimator is composed of a CDAE and an FC network: the sample covariance matrix is chosen as the input data, its approximation error is removed by the CDAE, and the FC network is trained to predict the label of the input signal. Simulation results validated the performance of the proposed CDAE-DNN, which has significant advantages over the traditional MUSIC algorithm and other DL-based methods at low SNR and few snapshots. OSA was also shown to be a reliable option for improving the DOA estimation accuracy of hybrid arrays. Finally, the CRLB for the proposed architecture was derived.
APPENDIX: DERIVATION OF CRLB FOR HAD-OSA

Referring to the derivations in [16] and [7], the Fisher information matrix (FIM) related to θ = [θ_1, ..., θ_Q]^T is F = [F_{θ_p θ_q}] ∈ R^{Q×Q}, whose (p, q)th element for one snapshot can be expressed as

F_{θ_p θ_q} = tr( C^{−1} (∂C/∂θ_p) C^{−1} (∂C/∂θ_q) ), 1 ≤ p, q ≤ Q,

where C is the covariance matrix of the received signal and

∂C/∂θ_q = W^H D_q C_s Ã^H + Ã C_s D_q^H W = W^H D e_q e_q^T C_s Ã^H + Ã C_s e_q e_q^T D^H W,

where Ã = W^H A, e_q denotes the qth column of the identity matrix I_Q, D = [d a(θ_1)/dθ_1, ..., d a(θ_Q)/dθ_Q], and

D_q = [ 0_{M×(q−1)}  d a(θ_q)/dθ_q  0_{M×(Q−q)} ].

According to the equation tr(A^H) = tr(A)*, each element F_{θ_p θ_q} is real-valued; therefore, by combining all the elements, the FIM F can be obtained, and by collecting the signals at all the N snapshots, the CRLB is

CRLB(θ) = (1/N) F^{−1}.