This section describes the proposed SBL scheme for sparse beamspace channel vector estimation. We begin by introducing the hierarchical prior model. Then, we use DGAMP-SBL to estimate the mmWave channel. After that, we propose a refinement algorithm to estimate the AoAs and AoDs accurately. Finally, we analyze the computational complexity of the proposed algorithm and compare it with that of SBL.

### 4.1 Hierarchical prior model

Under the assumption of circularly symmetric complex Gaussian noise, the likelihood of the observation is

$$\begin{aligned} \begin{aligned} p({{\mathbf {y}}}|{{\mathbf {c}}},\lambda ) = \mathcal {C}\mathcal {N}\left( {{{\mathbf {y}}}|{\varvec{\Phi }} {\mathbf {c}},{\lambda ^{ - 1}}{{\mathbf {I}}}} \right) , \end{aligned} \end{aligned}$$

(12)

where \(\lambda = {\sigma ^{-2}}\) is the noise precision. To exploit the sparse structure of the underlying mmWave channel, we impose a sparsity-promoting prior on the channel vector, as is common in sparse Bayesian models. The sparse channel vector \({{\mathbf {c}}}\) is assigned a two-layer hierarchical Gaussian prior. In the first layer, the entries of \({{\mathbf {c}}}\) are assumed to be independent zero-mean complex Gaussian variables, each with its own precision, i.e.

$$\begin{aligned} \begin{aligned} p({{\mathbf {c}}}|{\gamma })\sim \prod \limits _{i = 1}^{{N_1}{N_2}} \mathcal {C} \mathcal {N}\left( {{c_i};0,\gamma _i^{ - 1}} \right) , \end{aligned} \end{aligned}$$

(13)

where \({\gamma _i}\) denotes the inverse variance (precision) of \({c_i}\), and \({ \gamma } = [{\gamma _1},{\gamma _2}, \ldots ,{\gamma _{{N_1}N_2}}]\). In the second layer, a Gamma prior is placed on the inverse variance vector

$$\begin{aligned} \begin{aligned} p({{{\gamma }} })&= \prod \limits _{i = 1}^{{N_1}{N_2}} {{\text {Gamma}} } \left( {{\gamma _i}\mid a + 1,b} \right) \\&= \prod \limits _{i = 1}^{{N_1}{N_2}} {{\Gamma ^{ - 1}}(a + 1){b^{a + 1}}\gamma _i^a{\mathrm{{e}}^{ - b{\gamma _i}}}}, \end{aligned} \end{aligned}$$

(14)

where *a* and *b* are the parameters of the Gamma distribution, and \(\Gamma (a) = \int _0^\infty {{t^{a - 1}}} {\mathrm{{e}}^{ - t}}\,dt\) is the Gamma function. Besides, \({{\begin{array}{c} \frown \\ {\mathbf {n}} \end{array}}}\) is Gaussian noise with zero mean and covariance matrix \(\left( {1/\lambda } \right) {{\mathbf {I}}}\). We place a Gamma hyperprior on \(\lambda\): \(p(\lambda ) = {\text {Gamma}} (\lambda \mid c+1,d) = \Gamma {(c+1)^{ - 1}}{d^{\left( c+1 \right) }}{\lambda ^{c}}{\mathrm{{e}}^{ - d\lambda }}\).

To obtain a broad hyperprior, we set \(a,b \rightarrow 0\) [35, 36]. This two-stage hierarchical prior gives

$$\begin{aligned} \begin{aligned} p({c_i}) = \int _0^\infty {p({c_i}\mid \gamma _i^{ - 1})p({\gamma _i})} d{\gamma _i}, \end{aligned} \end{aligned}$$

(15)

which promotes sparse solutions because, for small *a* and *b*, the marginal has a sharp peak at zero and heavy tails. In fact, according to [37], the maximum a posteriori (MAP) estimate of \({{\mathbf {c}}}\) is consistent with the \({l_0}\)-norm solution of (10) obtained by FOCUSS with \(p \rightarrow 0\). The hyper-parameters \(\theta = \left\{ {{{\gamma }}, \lambda } \right\}\) can likewise be updated by MAP estimation, i.e.,

$$\begin{aligned} \begin{aligned} \left( {{{{{\gamma }} }^*},{\lambda ^*}} \right) = \arg {\max _{{{{\gamma }} },\lambda }} \; p({{{\gamma }} ,}\lambda |{{\mathbf {y}}}), \end{aligned} \end{aligned}$$

(16)

or, equivalently,

$$\begin{aligned} \begin{aligned} \left( {{{{{{\gamma }}} }^*},{\lambda ^*}} \right) = \arg {\max _{{{{\gamma }} },\lambda }}\ln p({{{\gamma }} ,}\lambda ,{{\mathbf {y}}}). \end{aligned} \end{aligned}$$

(17)

Then, the EM-SBL algorithm is employed to learn the sparse vector \({\mathbf {c}}\) and to update the hyper-parameters \({{\theta }} = \left\{ {{{\gamma }}, \lambda } \right\}\) iteratively. The key step of each EM-SBL update is to maximize the posterior \(p({{\gamma ,}}\lambda \mid {{\mathbf {y}}})\) with respect to the hyper-parameters. In the E-step, the likelihood function can be written as

$$\begin{aligned} \begin{aligned} p\left( {{{\mathbf {y}}}|{{\mathbf {c}}};{\sigma ^2}} \right) = \frac{1}{{{{\left( {2\pi {\sigma ^2}} \right) }^{{N_xN_y}}}}}\exp \left( { - \frac{1}{{2{\sigma ^2}}}{{\left\| {{{\mathbf {y}}} - {\varvec{\Phi }} {\mathbf {c}}} \right\| }^2}} \right) . \end{aligned} \end{aligned}$$

(18)

As noted before, the conditional density \(p({{\mathbf {c}}}|{{\mathbf {r}}})\) shows that \({\mathbf {c}}\) is Gaussian distributed. Treating \({{\mathbf {c}}}\) as a hidden variable, its posterior distribution conditioned on the observed vector \({{\mathbf {y}}}\) and the current hyper-parameters \(\theta ^{(t)}\) is a complex Gaussian [35]:

$$\begin{aligned} & p\left( {\left. {\mathbf{c}} \right|{\mathbf{y}},\gamma ^{{(t)}} ,\lambda ^{{(t)}} } \right) = \frac{{p\left( {{\mathbf{c}}|\gamma ^{{(t)}} } \right)p\left( {{\mathbf{y}}|{\mathbf{c}},\lambda ^{{(t)}} } \right)}}{{p({\mathbf{y}}|\gamma ^{{(t)}} ,\lambda ^{{(t)}} )}} \\ & \quad = (2\pi )^{{ - \frac{{N_{1} N_{2} }}{2}}} \left| {{\mathbf{\Sigma }}_{c} } \right|^{{ - \frac{1}{2}}} \exp \left\{ { - \frac{1}{2}({\mathbf{c}} - \mu _{c} )^{{\text{T}}} {\mathbf{\Sigma }}_{c}^{{ - 1}} ({\mathbf{c}} - \mu _{c} )} \right\}, \\ \end{aligned}$$

(19)

where the posterior covariance \({{{\varvec{\Sigma }}}_c}\) and mean \({{{\mu }}_c}\) are given, respectively, by

$$\begin{aligned} {\mathbf{\Sigma }}_{c} & = \left( {\lambda {\mathbf{\Phi }}^{T} {\mathbf{\Phi }} + {\mathbf{D}}} \right)^{{ - 1}} \\ \mu _{c} & = \lambda {\mathbf{\Sigma }}_{c} {\mathbf{\Phi }}^{T} {\mathbf{y}}, \\ \end{aligned}$$

(20)

where \({{\mathbf {D}}} = {\text {diag}} \left( {{\gamma _1},{\gamma _2}, \ldots ,{\gamma _{{N_1}{N_2}}}} \right)\), and \({{{{\mu }} }_c}\) and \({{{\varvec{\Sigma }} }_c}\) are the mean and covariance of the posterior \(p\left( {{{\mathbf {c}}}\mid {{\mathbf {y}}},{{{{\gamma }} }^{(t)}},{\lambda ^{(t)}}} \right)\), respectively. Let \({{{\tau }}_c}\) denote the vector formed by the diagonal entries of the covariance matrix \({{{\varvec{\Sigma }}}_c}\). As mentioned above, in each iteration of the EM algorithm the hyper-parameters are updated by maximizing the *R*-function, i.e.,

$$\begin{aligned} \begin{aligned} {{{{\theta }} }^{(t + 1)}}&\triangleq \arg {\max _{{{\theta }} }} \; R\left( {{{{\theta }} }|{{{{\theta }} }^{(t)}}} \right) \\&\triangleq \arg {\max _{{{\theta }} }}{\; E_{{{\mathbf {c}}}|{{\mathbf {y}}},{{{{\theta }} }^{(t)}}}}[\log p({\theta }|{{\mathbf {c}}},{{\mathbf {y}}})] . \\ \end{aligned} \end{aligned}$$

(21)
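The E-step quantities in (20), which feed the expectation in (21), can be sketched numerically as follows. This is a minimal illustration for a real-valued model (for complex data the transpose would become a Hermitian transpose); the function name `sbl_posterior`, the problem sizes, and the test signal are assumptions made for the example, not part of the original scheme.

```python
import numpy as np

def sbl_posterior(Phi, y, gamma, lam):
    """Posterior covariance, mean and per-entry variances as in (20)."""
    D = np.diag(gamma)                               # prior precisions gamma_n
    Sigma_c = np.linalg.inv(lam * Phi.T @ Phi + D)   # posterior covariance
    mu_c = lam * Sigma_c @ Phi.T @ y                 # posterior mean
    tau_c = np.diag(Sigma_c).copy()                  # diagonal of Sigma_c
    return Sigma_c, mu_c, tau_c

# Hypothetical toy problem: 20 measurements, 40 unknowns, 2 active entries.
rng = np.random.default_rng(0)
M, N = 20, 40
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
c_true = np.zeros(N)
c_true[[3, 17]] = [1.5, -2.0]
lam = 100.0                                          # noise precision
y = Phi @ c_true + rng.standard_normal(M) / np.sqrt(lam)
gamma = np.ones(N)                                   # initial precisions

Sigma_c, mu_c, tau_c = sbl_posterior(Phi, y, gamma, lam)
```

The explicit inverse costs on the order of the cube of the number of unknowns, which is the cost that motivates replacing this step with GAMP in Sect. 4.2.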

Using Bayes' rule and dropping the terms unrelated to \({{\theta }}\), maximizing (21) is equivalent to minimizing its negative, which can be written as

$$\begin{aligned} & - E_{{{\mathbf{c}}|{\mathbf{y}};\gamma ^{(t)} ,\lambda ^{(t)} }} \left[ {\log p(\gamma ,\lambda |{\mathbf{c}},{\mathbf{y}})} \right] \\ & \quad = E_{{{\mathbf{c}}|{\mathbf{y}};\gamma ^{(t)} ,\lambda ^{(t)} }} \left[ { - \log p({\mathbf{y}}|{\mathbf{c}};\lambda ) - \log p({\mathbf{c}}|\gamma ) - \log p(\gamma )} \right]. \\ \end{aligned}$$

(22)

First, the algorithm carries out the M-step for the hyper-parameters \(\left\{ {{\gamma _n}} \right\}\). Since the first term in (22) does not depend on \({{{\gamma }} }\), it is irrelevant for this step and can be dropped. With \({\hat{c}_n}\) denoting the *n*th entry of the posterior mean \({{{\mu }}_c}\) and \({\tau _{{c_n}}}\) the *n*th entry of \({{{\tau }}_c}\), the objective to be minimized over \({{{\gamma }} }\) becomes, up to additive constants,

$$\begin{aligned} & E_{{{\mathbf{c}}|{\mathbf{y}},\theta ^{{(t)}} }} \left[ { - \log p({\mathbf{c}}|\gamma ) - \log p(\gamma )} \right] \\ & \quad = \sum\limits_{{n = 1}}^{{N_{1} N_{2} }} {\left( {\frac{{\gamma _{n} \left( {\hat{c}_{n}^{2} + \tau _{{c_{n} }} } \right)}}{2} - \frac{1}{2}\log \gamma _{n} - \log p(\gamma _{n} )} \right)} \\ & \quad = \sum\limits_{{n = 1}}^{{N_{1} N_{2} }} {\left( {\frac{{\gamma _{n} \left( {\hat{c}_{n}^{2} + \tau _{{c_{n} }} } \right)}}{2} - \frac{1}{2}\log \gamma _{n} - a\log \gamma _{n} + b\gamma _{n} } \right)} . \\ \end{aligned}$$

(23)

Keeping only the terms of (23) that depend on \({\gamma _n}\), the update of \({\gamma _n}\) can be written as

$$\begin{aligned} \begin{aligned} \gamma _n^{i + 1} = \mathop {\arg \min }\limits _{{\gamma _n}} \;\left( {\frac{{{\gamma _n}\left( {{\hat{c}}_n^2 + {\tau _{{c_n}}}} \right) }}{2} -\log p({\gamma _n}) -\frac{1}{2}\log {\gamma _n} } \right) . \end{aligned} \end{aligned}$$

(24)

Since the hyperprior \(p({\gamma _n})\) becomes non-informative as the parameters *a* and *b* tend to zero, setting the derivative with respect to \({\gamma _n}\) to zero simplifies the update to

$$\begin{aligned} \begin{aligned} \gamma _n^{i + 1} = \frac{1}{{{\hat{c}}_n^2 + {\tau _{{c_n}}}}}. \end{aligned} \end{aligned}$$

(25)

Similarly, the scalar hyper-parameter \(\lambda\) is updated by maximizing the expected log-likelihood. Since \(\lambda\) is the noise precision and \({\gamma _n}\) is the prior precision, the update is most conveniently expressed through the noise variance \(1/\lambda\):

$$\begin{aligned} \begin{aligned} {\lambda ^{i + 1}}&= \arg \max _{\lambda } {E_{{\mathbf {c}}|{\mathbf {y}},{\gamma };{\lambda ^i}}}[\log p({{\mathbf {y}}},{{\mathbf {c}}},{\gamma };\lambda )], \\ \frac{1}{{\lambda ^{i + 1}}}&= \frac{{{{\left\| {{{\mathbf {y}}} - {\varvec{\Phi }} {\hat{\mathbf {c}}}} \right\| }^2} + \frac{1}{{\lambda ^i}}\sum \limits _{n = 1}^{{N_1}{N_2}} {(1 - {\gamma _n}{\tau _{{c_n}}})} }}{N} , \\ \end{aligned} \end{aligned}$$

(26)

where \(N = {N_x}{N_y}\) is the dimension of \({{\mathbf {y}}}\). Following the references and common implementation practice, we set \(a=b=c=d=0.0001\), since the choice of these constants has little effect on the estimation performance.
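The EM iteration described in this subsection can be sketched as below. It is a minimal real-valued illustration under the non-informative limit \(a,b \rightarrow 0\): the precision update follows (25), and the noise step is written in terms of the noise variance \(1/\lambda\) with \({\gamma _n}\) treated as precisions, which is the convention of (12) and (13). The function name `em_sbl`, the initialization, the cap `gamma_max`, and the toy problem are assumptions made for the example.

```python
import numpy as np

def em_sbl(Phi, y, n_iter=100, gamma_max=1e10):
    """EM-SBL sketch: E-step via (20), M-step via (25) and a noise update."""
    M, N = Phi.shape
    gamma = np.ones(N)            # prior precisions (assumed initial value)
    lam = 1.0 / np.var(y)         # noise precision (assumed initial value)
    for _ in range(n_iter):
        # E-step: posterior covariance, mean and marginal variances.
        Sigma = np.linalg.inv(lam * Phi.T @ Phi + np.diag(gamma))
        mu = lam * Sigma @ Phi.T @ y
        tau = np.diag(Sigma)
        # M-step for the noise: residual energy plus posterior uncertainty.
        resid = np.sum((y - Phi @ mu) ** 2)
        noise_var = (resid + np.sum(1.0 - gamma * tau) / lam) / M
        # M-step for the precisions, capped for numerical stability.
        gamma = np.minimum(1.0 / (mu ** 2 + tau), gamma_max)
        lam = 1.0 / noise_var
    return mu, gamma, lam

# Hypothetical toy problem with two active coefficients.
rng = np.random.default_rng(1)
M, N = 40, 60
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
c_true = np.zeros(N)
c_true[[3, 17]] = [2.0, -1.8]
y = Phi @ c_true + 0.05 * rng.standard_normal(M)

mu, gamma, lam = em_sbl(Phi, y)
support = sorted(np.argsort(np.abs(mu))[-2:].tolist())
```

Entries whose precision \({\gamma _n}\) grows large are effectively pruned, which is how the hierarchical prior yields a sparse estimate.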

### 4.2 Update by DGAMP

As mentioned earlier, computing the posterior mean and covariance in (20) requires the inversion of a high-dimensional matrix, which is computationally expensive. This high complexity makes the EM-SBL algorithm impractical for massive MIMO channel estimation. To simplify the calculation, we replace the exact posterior computation with the GAMP algorithm, a very-low-complexity Bayesian iterative technique. Note that the hyper-parameters \(\{ { {\gamma },\lambda }\}\) are treated as known constants during the GAMP iterations.

GAMP is a fast heuristic algorithm that can be used to avoid the matrix inversion within the SBL framework [38, 39]. It approximates the posterior estimate of \({\mathbf {c}}\) by means of Taylor approximations. Specifically, the marginal posteriors \(p\left( {{c_n}\mid {{\mathbf {y}}},{\gamma },\lambda } \right)\) are computed iteratively by message passing on the GAMP factor graph. Because all posteriors are Gaussian, the messages can be summarized by the means and variances of the sparse variables \(\left\{ {{c_n}} \right\}\) and the mixture variables \(\left\{ {{z_m}} \right\}\), where \({{\mathbf {z}}} = {{{\varvec{\Phi }}} {\mathbf {c}}}\). Since GAMP may diverge when the measurement matrix does not have independent Gaussian entries, [40] proposes the DGAMP algorithm, which improves the robustness to general measurement matrices by introducing damping factors \({\rho _s}\), \({\rho _c} \in \left( {0,1} \right]\), at the cost of slower convergence. Nevertheless, the procedure remains computationally efficient because it only involves scalar operations. The key steps of DGAMP are summarized in Algorithm 1; readers can refer to [22] for the details of its derivation.

The damped GAMP algorithm comes in two versions, max-sum and sum-product. The output function \({g_s}\left( {{{\mathbf {p}}},{{{\tau }}_p}} \right)\) and the input function \({g_c}\left( {{\mathbf {r}},\tau _\mathbf {r}} \right)\) in Algorithm 1 depend on which version is used. For the model considered here, both versions reduce to the same expressions, so we only introduce the output and input functions of the sum-product version; readers can refer to [41] for further details. The intermediate variables \({{\mathbf {r}}}\) and \({{\mathbf {p}}}\) can be interpreted as Gaussian-noise-corrupted approximations of \({{\mathbf {c}}}\) and \({{\mathbf {z}}} = {{{\varvec{\Phi }}} {\mathbf {c}}}\) with noise levels \({{{\tau }}_r}\) and \({{{\tau }}_p}\), respectively. The two versions differ in their estimation strategy: the sum-product version uses vector minimum mean-squared error (MMSE) estimation, which reduces to a sequence of scalar MMSE estimations. The output and input functions are given by

$$\begin{aligned}&{\left[ {{g_s}\left( {{{\mathbf {p}}},{{{\tau }}_p}} \right) } \right] _m} = \frac{{\int {{z_m}p({y_m}|{z_m})\mathcal {N}\left( {{z_m};\frac{{{p_m}}}{{{\tau _{{p_m}}}}},\frac{1}{{{\tau _{{p_m}}}}}} \right) } d{z_m}}}{{\int {p({y_m}|{z_m})\mathcal {N}\left( {{z_m};\frac{{{p_m}}}{{{\tau _{{p_m}}}}},\frac{1}{{{\tau _{{p_m}}}}}} \right) d{z_m}} }}, \end{aligned}$$

(27)

$$\begin{aligned}&{\left[ {{g_c}\left( {{\mathbf {r}},{\tau _{\mathbf {r}}}} \right) } \right] _n} = \frac{{\int {{c_n}p({c_n})\mathcal {N}\left( {{c_n};{r_n},{\tau _{{r_n}}}} \right) } d{c_n}}}{{\int {p({c_n})\mathcal {N}\left( {{c_n};{r_n},{\tau _{{r_n}}}} \right) d{c_n}} }}. \end{aligned}$$

(28)

Using the likelihood (12) and the prior (13)–(15) imposed on \({{\mathbf {c}}}\), the input function and its derivative can be written in closed form. Since the prior variance of \({c_n}\) is \(\gamma _n^{ - 1}\) by (13), they read

$$\begin{aligned}&{g_c}({{\mathbf {r}}},{{{\tau }}_r}) = \frac{{{\gamma ^{ - 1}}}}{{{\gamma ^{ - 1}} + {{{\tau }}_r}}}{{\mathbf {r}}}, \end{aligned}$$

(29)

$$\begin{aligned}&g{'_c}({{\mathbf {r}}},{{{\tau }}_r}) = \frac{{{\gamma ^{ - 1}}}}{{{\gamma ^{ - 1}} + {{{\tau }}_r}}}. \end{aligned}$$

(30)
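As a sanity check on the input function, the scalar MMSE integral in (28) can be evaluated numerically and compared with the Gaussian closed form. The sketch below assumes a real-valued scalar and takes \({\gamma _n}\) as the prior precision of (13), so the prior variance is \(1/{\gamma _n}\); the function names, the integration grid, and the numerical values are illustrative choices.

```python
import numpy as np

def gauss(x, mean, var):
    """Real Gaussian density N(x; mean, var)."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def g_c_numeric(r, tau_r, gamma):
    """Ratio of integrals in (28), evaluated on a dense uniform grid."""
    c = np.linspace(-20.0, 20.0, 200001)
    w = gauss(c, 0.0, 1.0 / gamma) * gauss(c, r, tau_r)  # prior times N(c; r, tau_r)
    return np.sum(c * w) / np.sum(w)                     # grid spacing cancels

def g_c_closed(r, tau_r, gamma):
    """Posterior mean for a zero-mean Gaussian prior with variance 1/gamma."""
    return r / (1.0 + gamma * tau_r)

r, tau_r, gamma = 0.8, 0.3, 2.0      # hypothetical values
numeric = g_c_numeric(r, tau_r, gamma)
closed = g_c_closed(r, tau_r, gamma)
```

Both evaluations shrink \(r\) toward zero, and the shrinkage becomes stronger as the precision \({\gamma _n}\) grows, which is the mechanism that suppresses inactive coefficients.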

Similarly, with the noise variance \({\lambda ^{ - 1}}\) from (12), the output function and its derivative can be written as

$$\begin{aligned}&{g_s}({{\mathbf {p}}},{{{\tau }}_p}) = \frac{{\left( {{{{\mathbf {p}}} \big / {{{{\tau }}_p}}} - {{\mathbf {y}}}} \right) }}{{\left( {{\lambda ^{ - 1}} + {1 \big / {{{{\tau }}_p}}}} \right) }}, \end{aligned}$$

(31)

$$\begin{aligned}&g{'_s}({{\mathbf {p}}},{{{\tau }}_p}) = \frac{{\lambda }}{{\lambda + {{{\tau }}_p}}}. \end{aligned}$$

(32)

Upon convergence, DGAMP-SBL yields a sparse estimate of \({{\mathbf {c}}}\), given by the posterior mean. This also provides a coarse estimate of the channel matrix: each non-zero entry corresponds to a pair of AoA and AoD grid frequencies, and the channel follows by multiplying with the steering matrices. The initial channel estimate is obtained as

$$\begin{aligned} \begin{aligned} {{{\hat{\mathbf {H}}}}^{(0)}} = {{{\mathbf {A}}}_{{\varvec{r}}}}{\hat{{\mathbf {C}}}}{{\mathbf {A}}}_t^H, \end{aligned} \end{aligned}$$

(33)

where \({{\hat{{\mathbf {C}}}}}\) denotes the matrix obtained by rearranging the estimate \({\hat{\mathbf {c}}}\) column by column.

It is worth noting that GAMP is a low-complexity algorithm that transforms the vector estimation problem into a sequence of scalar estimation problems; therefore, in (27), (28) and Algorithm 1, all vector squares, divisions and multiplications are taken element-wise. Figure 2 shows the sparse channel matrix in the discrete Fourier basis. For comparison, Figure 3 shows the estimate of the SBL-GAMP algorithm without considering the off-grid problem when the number of grid points is \({N_1} = {N_2} = 180\).
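A minimal sketch of a damped sum-product GAMP iteration for a Gaussian prior and Gaussian likelihood is given below. It is written in the common variance-based GAMP parametrization for a real-valued model and is not a line-by-line transcription of Algorithm 1; the function name `damped_gamp_gaussian`, the matrix sizes, and the damping values 0.7 are assumptions made for illustration. With fixed hyper-parameters, the estimate can be compared against the exact posterior mean of (20).

```python
import numpy as np

def damped_gamp_gaussian(A, y, prior_var, noise_var,
                         rho_s=0.7, rho_c=0.7, n_iter=300):
    """Damped GAMP for y = A c + n with c_n ~ N(0, prior_var_n)."""
    S = A ** 2                                  # element-wise squared matrix
    c_hat = np.zeros(A.shape[1])
    tau_c = prior_var.copy()
    s = np.zeros(A.shape[0])
    for _ in range(n_iter):
        tau_p = S @ tau_c                       # variance of z = A c
        p = A @ c_hat - tau_p * s               # mean of z with Onsager term
        s_new = (y - p) / (tau_p + noise_var)   # output function (AWGN)
        tau_s = 1.0 / (tau_p + noise_var)
        s = (1.0 - rho_s) * s + rho_s * s_new   # damping on s
        tau_r = 1.0 / (S.T @ tau_s)
        r = c_hat + tau_r * (A.T @ s)           # noisy pseudo-observation of c
        c_new = prior_var / (prior_var + tau_r) * r      # input function
        tau_c = prior_var * tau_r / (prior_var + tau_r)
        c_hat = (1.0 - rho_c) * c_hat + rho_c * c_new    # damping on c
    return c_hat, tau_c

# Hypothetical i.i.d. Gaussian measurement matrix and fixed hyper-parameters.
rng = np.random.default_rng(2)
M, N = 200, 40
A = rng.standard_normal((M, N)) / np.sqrt(M)
gamma = np.ones(N)                              # prior precisions
lam = 100.0                                     # noise precision
c_true = rng.standard_normal(N)
y = A @ c_true + rng.standard_normal(M) / np.sqrt(lam)

c_gamp, tau_gamp = damped_gamp_gaussian(A, y, 1.0 / gamma, 1.0 / lam)
c_exact = np.linalg.solve(lam * A.T @ A + np.diag(gamma), lam * A.T @ y)
```

Every step uses only matrix-vector products and element-wise operations, so no matrix inverse is formed; the call to `np.linalg.solve` appears only to provide a reference solution.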

### 4.3 Refined estimation

In the following, we propose an accurate and fast 2D frequency estimation method based on the interpolation of three 2D-DFT spectral lines [42]. The frequency estimate produced by the DGAMP-SBL algorithm, \({\hat{f}} = \left\{ {{{{\hat{f}}}_1},{{{\hat{f}}}_2}, \ldots {{{\hat{f}}}_{{\tilde{L}}}}} \right\}\) with \({{\hat{f}}_i} = \{ {f_{i1}},{f_{i2}}\} \quad (i = 1,2, \ldots ,{\tilde{L}})\), is only a coarse estimate due to grid mismatch.

Let \((k,j) \in [({k_1},{j_1}),({k_2},{j_2}) \ldots ,({k_{{\tilde{L}}}},{j_{{\tilde{L}}}})]\) denote the index pairs corresponding to the coarse frequency estimates. We then interpolate on fractional Fourier coefficients, which are defined as follows

$$\begin{aligned}&D(k,j) = \sum \limits _{m = 0}^{{N_r}} {\sum \limits _{n = 0}^{{N_t}} {[{\tilde{H}}} } {]_{n,m}}{\mathrm{{e}}^{ - \mathrm{j}2\pi \left( {jm + kn} \right) /{N_{DFT}}}}, \end{aligned}$$

(34)

$$\begin{aligned}&D(k \pm \delta ,j) = \sum \limits _{m = 0}^{{N_r}} {\sum \limits _{n = 0}^{{N_t}} {[{\tilde{H}}} } {]_{n,m}}{\mathrm{{e}}^{ - \mathrm{j}2\pi \left( {jm + (k \pm \delta )n} \right) /{N_{DFT}}}}, \end{aligned}$$

(35)

$$\begin{aligned}&D(k , j \pm \delta ) = \sum \limits _{m = 0}^{{N_r}} {\sum \limits _{n = 0}^{{N_t}} {[{\tilde{H}}} } {]_{n,m}}{\mathrm{{e}}^{ - \mathrm{j}2\pi \left( {(j \pm \delta )m +kn} \right) /{N_{DFT}}}}, \end{aligned}$$

(36)

where \(\mathrm{j}\) is the imaginary unit, \(\delta\) is a real number, and we abbreviate \(D(k,j),D(k\pm \delta ,j)\), \(D(k,j\pm \delta )\) as \(D,{D_{K+\delta }}\), \({D_{K-\delta }}\), \({D_{J+\delta }}\), \({D_{J-\delta }}\), respectively. Then, we propose the interpolation algorithm as follows

$$\begin{aligned}&{{\hat{\lambda }} _x} = \frac{{{N_r}}}{\pi }{\tan ^{ - 1}} \\&\quad \left( {{\text {Re}} \left\{ {\frac{{\left( {{D_{K + \delta }}E_r^ + - {D_{K - \delta }}E_r^ - } \right) \cdot \sin (\pi i/{N_r})}}{{\left( {{D_{K + \delta }}E_r^ + + {D_{K - \delta }}E_r^ - } \right) \cdot \cos (\pi i/{N_r}) - 2D\cos (\pi i)}}} \right\} } \right) , \end{aligned}$$

(37)

$$\begin{aligned}&{{\hat{\lambda }} _y} = \frac{{{N_t}}}{\pi }{\tan ^{ - 1}} \\&\quad \left( {{\text {Re}} \left\{ {\frac{{\left( {{D_{J + \delta }}E_t^ + - {D_{J - \delta }}E_t^ - } \right) \cdot \sin (\pi i/{N_t})}}{{\left( {{D_{J + \delta }}E_t^ + + {D_{J - \delta }}E_t^ - } \right) \cdot \cos (\pi i/{N_t}) - 2D\cos (\pi i)}}} \right\} } \right) , \end{aligned}$$

(38)

where \({{\hat{\lambda }} _x},{{\hat{\lambda }} _y}\) are the frequency deviations normalized by the grid spacing \(\Delta f\), which is defined as \(\Delta f = 1/\left( {{N_{DFT}}/2} \right)\).
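The fractional Fourier coefficients in (34)–(36) can be evaluated by direct summation at non-integer indices, as sketched below. The sums here run over the entries of the input array, the interpolators (37) and (38) are not implemented, and the function name `frac_dft2`, the array sizes, the test tone, and the choice \(\delta = 0.5\) are assumptions made for the example.

```python
import numpy as np

def frac_dft2(H, k_idx, j_idx, n_dft):
    """2D Fourier coefficient at possibly fractional indices (k_idx, j_idx)."""
    n = np.arange(H.shape[0])[:, None]   # index paired with k
    m = np.arange(H.shape[1])[None, :]   # index paired with j
    phase = -2j * np.pi * (k_idx * n + j_idx * m) / n_dft
    return np.sum(H * np.exp(phase))

# Hypothetical single 2D tone lying between DFT grid points.
n_dft = 64
n = np.arange(8)[:, None]
m = np.arange(8)[None, :]
H = np.exp(2j * np.pi * (10.3 * n + 19.8 * m) / n_dft)

# Coarse peak on the integer grid from a zero-padded 2D FFT.
F = np.fft.fft2(H, s=(n_dft, n_dft))
k0, j0 = (int(v) for v in np.unravel_index(np.argmax(np.abs(F)), F.shape))

delta = 0.5
D = frac_dft2(H, k0, j0, n_dft)
D_k_plus = frac_dft2(H, k0 + delta, j0, n_dft)
D_k_minus = frac_dft2(H, k0 - delta, j0, n_dft)
D_j_plus = frac_dft2(H, k0, j0 + delta, n_dft)
D_j_minus = frac_dft2(H, k0, j0 - delta, n_dft)
```

In this toy case the tone lies above the grid point in the first dimension and below it in the second, so the larger of the two side coefficients already indicates the direction of the off-grid deviation in each dimension.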

The frequency estimates are then obtained as

$$\begin{aligned} \begin{aligned} {{\hat{f}}_x} = \left\{ {\begin{array}{*{20}{c}} { - ({\hat{\lambda }} _x + k)/({N_{DFT}}/2)\quad \quad \quad k < {N_{DFT}}/2} \\ {2 - ({{{\hat{\lambda }} }_x} + k)/({N_{DFT}}/2)\quad \quad k \geqslant {N_{DFT}}/2} \end{array}} \right. , \end{aligned} \end{aligned}$$

(39)

and

$$\begin{aligned} \begin{aligned} {{\hat{f}}_y} = \left\{ {\begin{array}{*{20}{c}} {({{{\hat{\lambda }} }_y} + j)/({N_{DFT}}/2)\quad \quad \quad \quad j < {N_{DFT}}/2} \\ {-(2-({{{\hat{\lambda }} }_y} + j)/({N_{DFT}}/2))\quad j \geqslant {N_{DFT}}/2} \end{array}} \right. . \end{aligned} \end{aligned}$$

(40)
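The piecewise mappings (39) and (40) from a grid index and a normalized deviation to a frequency estimate can be transcribed directly. The helper name `freq_from_index` and the numerical inputs are hypothetical; the branch structure follows the two equations as printed.

```python
def freq_from_index(idx, offset, n_dft, dim):
    """Map grid index plus normalized deviation to a frequency, per (39)/(40)."""
    half = n_dft / 2
    u = (offset + idx) / half
    if dim == "x":                       # equation (39)
        return -u if idx < half else 2.0 - u
    if dim == "y":                       # equation (40)
        return u if idx < half else -(2.0 - u)
    raise ValueError("dim must be 'x' or 'y'")

# Hypothetical coarse indices and deviations on a 64-point grid.
f_x = freq_from_index(10, 0.3, 64, "x")
f_y = freq_from_index(40, -0.2, 64, "y")
```

With these inputs, `f_x` equals \(-10.3/32\) and `f_y` equals \(-(2 - 39.8/32)\), which shows how indices in the upper half of the grid are wrapped.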

According to [43], a parabolic interpolation step is adopted to obtain a more accurate frequency estimate. For each frequency dimension \(d \in \left\{ {x,y} \right\}\), we calculate three periodogram samples \({D_{d1}}\), \({D_{d2}}\) and \({D_{d3}}\) at the frequencies \({\theta _{_{d1}}} = {{\hat{f}}_d} - {\Delta _d}\), \({\theta _{_{d2}}} = {{\hat{f}}_d}\) and \({\theta _{_{d3}}} = {{\hat{f}}_d} + {\Delta _d}\). The middle frequency \({{\hat{f}}_d}\) is given by (39) and (40), and the side frequencies \({\theta _{d1}}\) and \({\theta _{d3}}\) are offset from it by \({\Delta _d}\), which is chosen in \({\Delta _d} \in (0,\frac{1}{{2{N_{DFT}}}})\) to ensure high estimation accuracy. The final frequency estimate along the *d*th dimension is the vertex of the parabola fitted through the points \(\left( {{\theta _{d1}},{D_{d1}}} \right) ,\left( {{\theta _{d2}},{D_{d2}}} \right) {\text { and }}\left( {{\theta _{d3}},{D_{d3}}} \right)\), i.e.

$$\begin{aligned} \begin{aligned} \theta _d^{fin} = \frac{1}{2}\frac{{\theta _{d3}^2\left( {{\Delta _{12}}} \right) + \theta _{d2}^2\left( {{\Delta _{31}}} \right) + \theta _{d1}^2\left( {{\Delta _{23}}} \right) }}{{{\theta _{d3}}\left( {{\Delta _{12}}} \right) + {\theta _{d2}}\left( {{\Delta _{31}}} \right) + {\theta _{d1}}\left( {{\Delta _{23}}} \right) }}, \end{aligned} \end{aligned}$$

(41)

where \(d = x,y\), \({\Delta _{12}} = {D_{d1}} - {D_{d2}}\), \({\Delta _{31}} = {D_{d3}} - {D_{d1}}\), \({\Delta _{23}} = {D_{d2}} - {D_{d3}}\). Finally, the accurate AoAs and AoDs, i.e., \(\left\{ {{{{\hat{\theta }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\) and \(\left\{ {{{{\hat{\phi }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\), are obtained from the final frequency estimates in (41).
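The vertex formula (41) is straightforward to check numerically: for samples taken from an exact parabola, it returns the location of the extremum regardless of where the three points lie. The function name `parabola_vertex` and the test values below are illustrative assumptions.

```python
def parabola_vertex(theta, D):
    """Vertex of the parabola through three points, as in (41)."""
    t1, t2, t3 = theta
    d1, d2, d3 = D
    d12, d31, d23 = d1 - d2, d3 - d1, d2 - d3
    num = t3 ** 2 * d12 + t2 ** 2 * d31 + t1 ** 2 * d23
    den = t3 * d12 + t2 * d31 + t1 * d23
    return 0.5 * num / den

# Hypothetical periodogram shaped like a parabola with its peak at 0.3.
def peak(t):
    return 2.0 - 5.0 * (t - 0.3) ** 2

thetas = (0.25, 0.28, 0.31)
vertex = parabola_vertex(thetas, tuple(peak(t) for t in thetas))
```

For a real periodogram the main lobe is only approximately parabolic, which is why the side offsets \({\Delta _d}\) are kept small.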

### 4.4 Reconstruct mmWave MIMO channel

In this subsection, the mmWave MIMO channel is reconstructed from the obtained AoAs \(\left\{ {{{{\hat{\theta }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\) and AoDs \(\left\{ {{{{\hat{\phi }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\) as follows. First, we recover the steering vectors \({{{\mathbf {a}}}_{\mathrm{r}}}({{\hat{\theta }} _l})\) and \({{{\mathbf {a}}}_{\mathrm{t}}}({{\hat{\phi }} _l})\) in (5) and (6) using the estimated AoAs and AoDs. Second, we rewrite expression (6) in vectorized form as

$$\begin{aligned} \begin{aligned} \mathrm{vec}({{\mathbf {H}}}) = ({{{\varvec{\Phi }}}}_{\mathrm{t}}^* \odot {{{{\varvec{\Phi }} }}_{\mathrm{r}}})vecd({{{\mathbf {H}}}_{\mathrm{v}}}). \end{aligned} \end{aligned}$$

(42)

Thus, the receive signal also can be rewritten using vector form as follows

$$\begin{aligned} \begin{aligned} {{\mathbf {y}}}&= \sqrt{P} {{\mathbf {W}}}_{\mathrm{r}}^H(({{{\varvec{\Phi }} }}_{\mathrm{t}}^* \odot {{{{\varvec{\Phi }} }}_{\mathrm{r}}})vecd({{{\mathbf {H}}}_{\mathrm{v}}})){{{\mathbf {X}}}} + {\begin{array}{c} \frown \\ {{\mathbf {n}}} \end{array}} \\&= \sqrt{P} ({{{\mathbf {X}}}}^T{{{\varvec{\Phi }} }}_{\mathrm{t}}^* \odot {{\mathbf {W}}}_{\mathrm{r}}^H{{{{\varvec{\Phi }} }}_{\mathrm{r}}}){{{\mathbf {h}}}_{\mathrm{v}}} + {{\begin{array}{c} \frown \\ {{\mathbf {n}}} \end{array}}} \\&= \sqrt{P} {{{\mathbf {Q}}}^o}{{{\mathbf {h}}}_{\mathrm{v}}} + {\begin{array}{c} \frown \\ {{\mathbf {n}}} \end{array}} \\ \end{aligned} \end{aligned}$$

(43)

where \({{{\mathbf {h}}}_{\mathrm{v}}} = vecd({{{\mathbf {H}}}_{\mathrm{v}}})\) and \({{{\mathbf {Q}}}^{\mathrm{o}}} = {{{\mathbf {X}}}}^T{{{\varvec{\Phi }} }}_{\mathrm{t}}^* \odot {{\mathbf {W}}}_{\mathrm{r}}^H{{{{\varvec{\Phi }} }}_{\mathrm{r}}}\). The path gains \({{{\mathbf {h}}}_{\mathrm{v}}}\) are estimated in the LS sense. From (43), the LS estimate of \({{{\mathbf {h}}}_{\mathrm{v}}}\), denoted by \({{{\hat{\mathbf {h}}}}_{\mathrm{v}}}\), is given by

$$\begin{aligned} \begin{aligned} {{{\hat{\mathbf {h}}}}_{\mathrm{v}}} = \tfrac{1}{{\sqrt{P} }}{\left( {{{\left( {{{{\mathbf {Q}}}^{\mathrm{o}}}} \right) }^H}{{{\mathbf {Q}}}^{\mathrm{o}}}} \right) ^{ - 1}}{\left( {{{{\mathbf {Q}}}^{\mathrm{o}}}} \right) ^H}{{\mathbf {y}}}. \end{aligned} \end{aligned}$$

(44)

Finally, using the refined angle estimates and the path gains \({{{\hat{\mathbf {h}}}}_{\mathrm{v}}}\) obtained above, we recover the high-dimensional mmWave MIMO channel as \({{\hat{\mathbf {H}}}} = {{{{\varvec{\Phi }}}}_{\mathrm{r}}}diag({{{\hat{\mathbf {h}}}}_{\mathrm{v}}}){{{\varvec{\Phi }} }}_{\mathrm{t}}^H\).
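The reconstruction steps of this subsection can be sketched as follows. The example assumes half-wavelength uniform linear arrays for the steering vectors (the exact form of (5) and (6) is not reproduced here), random combining and pilot matrices, and a noiseless observation; the helper names `khatri_rao` and `ula` and all sizes are hypothetical. It builds \({{{\mathbf {Q}}}^{\mathrm{o}}}\) as a column-wise Kronecker (Khatri-Rao) product, solves the LS problem of (44), and rebuilds the channel.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x L) and B (J x L)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def ula(n_ant, angles):
    """Assumed half-wavelength ULA steering matrix, one column per angle."""
    idx = np.arange(n_ant)[:, None]
    return np.exp(1j * np.pi * idx * np.sin(angles)[None, :]) / np.sqrt(n_ant)

def crandn(rng, *shape):
    """Complex standard normal samples."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(3)
N_r, N_t, N_x, N_y, L, P = 16, 16, 6, 6, 3, 1.0
Phi_r = ula(N_r, np.array([-0.6, 0.1, 0.7]))     # estimated AoAs (radians)
Phi_t = ula(N_t, np.array([0.4, -0.3, 0.9]))     # estimated AoDs (radians)
h_v = crandn(rng, L)                             # true path gains
H = Phi_r @ np.diag(h_v) @ Phi_t.conj().T

W = crandn(rng, N_r, N_x) / np.sqrt(N_r)         # combining matrix
X = crandn(rng, N_t, N_y) / np.sqrt(N_t)         # pilot matrix
y = (np.sqrt(P) * W.conj().T @ H @ X).flatten(order="F")  # column-major vec

# Q^o = (X^T Phi_t^*) Khatri-Rao (W^H Phi_r), then LS as in (44).
Q = khatri_rao(X.T @ Phi_t.conj(), W.conj().T @ Phi_r)
h_hat, *_ = np.linalg.lstsq(np.sqrt(P) * Q, y, rcond=None)
H_hat = Phi_r @ np.diag(h_hat) @ Phi_t.conj().T
```

Using a least-squares solver instead of forming the inverse in (44) explicitly gives the same estimate when \({{{\mathbf {Q}}}^{\mathrm{o}}}\) has full column rank and is generally better conditioned numerically.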

### 4.5 Analysis of computational complexity

The complexity of the DGAMP-SBL algorithm is dominated by the E-step, in which the main cost comes from the matrix multiplications by \({{\mathbf {S}}}\), \({{{\mathbf {S}}}^T}\), \({{{\varvec{\Phi }} }}\), and \({{{{\varvec{\Phi }} }}^T}\) at each iteration. The complexity of each iteration is \(O(4 \cdot {N_1}{N_2}({N_x}{N_y}))\), where the factor 4 arises because the complex signal is converted to a real signal. Since these operations can be carried out separately, the coefficient 4 can be neglected. Moreover, as mentioned above, the remaining multiplications in Algorithm 1 are taken element-wise. When \({N_x}{N_y}\) is large, this complexity is much smaller than \(O\left( {{N_1}{N_2}{{({N_x}{N_y})}^2}} \right)\), the complexity of an SBL iteration. The refinement step is not iterative, so its complexity can be ignored. For the OMP scheme, the main computational complexity is \(O( {N_1}{N_2}({N_x}{N_y}))\), which lies in the correlation operation. In [16], a preprocessing step is proposed to greatly reduce the computational cost of each iteration. For the iterative reweight (IR) scheme, the main computational cost, contributed by the gradient operation, is \(O(L^2 \cdot ({N_x}{N_y}({N_r+N_t})))\). However, according to the authors' description in [16], the first *L* maximum-correlation angles extracted by preprocessing alone are likely to miss the true angles, especially when *L* is large. Therefore, the first 2*L* maximum-correlation angles are often taken, so that the computational complexity becomes \(O(4L^2 \cdot ({N_x}{N_y}({N_r+N_t})))\). Finally, the total complexity of the least squares (LS) algorithm is \(O( ({N_x}{N_y})^2({N_r}{N_t})+({N_x}{N_y})^3)\). Because of the large antenna dimension of the channel, LS is difficult to use for mmWave channel estimation.

From the above analysis, the computational complexity of DGAMP-SBL is proportional to \({{N_1}{N_2}{{({N_x}{N_y})}}}\), which is similar to that of the OMP algorithm. For large \(N_xN_y\), the computational cost of DGAMP-SBL is much smaller than that of SBL. When *L* is small, the computational complexity of IR is the smallest, but as *L* grows, the complexity of IR approaches that of DGAMP-SBL and OMP.
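To give a feel for these orders of magnitude, the leading terms can be evaluated for a concrete setting. The sketch below uses the grid size \({N_1} = {N_2} = 180\) mentioned in Sect. 4.2; the measurement dimension \({N_x}{N_y} = 256\), the antenna numbers \({N_r} = {N_t} = 64\) and the path counts are hypothetical values chosen only for illustration, and constant factors are ignored.

```python
def leading_costs(n_grid, n_meas, n_ant_sum, n_paths):
    """Leading-order per-iteration operation counts quoted in this subsection."""
    return {
        "DGAMP-SBL": n_grid * n_meas,
        "OMP": n_grid * n_meas,
        "SBL": n_grid * n_meas ** 2,
        "IR": n_paths ** 2 * n_meas * n_ant_sum,
    }

n_grid = 180 * 180            # N1 * N2
n_meas = 256                  # Nx * Ny (assumed)
n_ant_sum = 64 + 64           # Nr + Nt (assumed)

few_paths = leading_costs(n_grid, n_meas, n_ant_sum, n_paths=3)
many_paths = leading_costs(n_grid, n_meas, n_ant_sum, n_paths=16)
ratio_sbl = few_paths["SBL"] / few_paths["DGAMP-SBL"]
```

Under these assumed sizes, an SBL iteration is more expensive than a DGAMP-SBL iteration by a factor of \({N_x}{N_y}\), and the IR count starts well below the DGAMP-SBL count for few paths and overtakes it as *L* grows.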