This section develops the proposed method under assumptions (i) and (ii). Section 3.1 first derives the estimation method. Section 3.2 then discusses the precoding design to combat the effect of finite-sample estimation errors. Section 3.3 investigates the equalization performance of the precoding method using PEP analysis. Section 3.4 provides further discussion of the proposed method.
3.1 The estimation method
Under assumption (i), the autocorrelation matrix of y(n) in (2.4) is given by
\begin{array}{rl}\mathbf{R} &= E[\mathbf{y}(n)\mathbf{y}(n)^{\ast}]\\ &= \mathbf{D}(\mathbf{P}\otimes\mathbf{I}_{K})(\mathbf{P}^{\ast}\otimes\mathbf{I}_{K})\mathbf{D}^{\ast}+\sigma_{w}^{2}\mathbf{I}_{JM}\\ &= \mathbf{D}(\mathbf{P}\mathbf{P}^{\ast}\otimes\mathbf{I}_{K})\mathbf{D}^{\ast}+\sigma_{w}^{2}\mathbf{I}_{JM}\\ &= \mathbf{D}(\mathbf{G}\otimes\mathbf{I}_{K})\mathbf{D}^{\ast}+\sigma_{w}^{2}\mathbf{I}_{JM}.\end{array}
(3.1)
Since P is a circulant matrix, \mathbf{G}=\mathbf{P}\mathbf{P}^{\ast}\in\mathbb{R}^{M\times M} is also a circulant matrix [16], with \mathbf{g}=[g_{1}\; g_{2}\;\dots\; g_{M}]^{T} being its first column. Let \mathbf{J}\in\mathbb{R}^{M\times M} be the circulant matrix whose first column is [0\; 1\; 0\;\dots\; 0]^{T}\in\mathbb{R}^{M}. Thus, G can be expressed as
\mathbf{G}=[\mathbf{g}\;\;\mathbf{J}\mathbf{g}\;\;\mathbf{J}^{2}\mathbf{g}\;\;\dots\;\;\mathbf{J}^{M-1}\mathbf{g}].
(3.2)
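The circulant structure in (3.2) is easy to verify numerically. The following NumPy sketch (with an illustrative M = 5 and arbitrary real coefficients, not values from this paper) builds a circulant P, forms G = PP^∗, and checks that the columns of G are the shifted copies J^m g:

```python
import numpy as np

# Illustrative real precoding coefficients (M = 5); not values from this paper.
p = np.array([0.6, 0.4, 0.4, 0.4, 0.4])
M = len(p)

# Circulant P with first column p: column m is p cyclically shifted down by m.
P = np.column_stack([np.roll(p, m) for m in range(M)])
G = P @ P.T                      # G = P P* (real case), also circulant

# Down-shift matrix J: circulant with first column [0 1 0 ... 0]^T
Jmat = np.roll(np.eye(M), 1, axis=0)

# (3.2): the columns of G are g, Jg, J^2 g, ..., J^{M-1} g
g = G[:, 0]
cols = [np.linalg.matrix_power(Jmat, m) @ g for m in range(M)]
print(np.allclose(np.column_stack(cols), G))  # → True
```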
Using (3.2), (3.1) can be expressed as
\begin{array}{rl}
\mathbf{R} &= \mathbf{D}\left([\mathbf{g}\;\;\mathbf{J}\mathbf{g}\;\;\mathbf{J}^{2}\mathbf{g}\;\;\dots\;\;\mathbf{J}^{M-1}\mathbf{g}]\otimes\mathbf{I}_{K}\right)\mathbf{D}^{\ast}+\sigma_{w}^{2}\mathbf{I}_{JM}\\
&= \left[\mathbf{D}(\mathbf{g}\otimes\mathbf{I}_{K})\;\;\mathbf{D}(\mathbf{J}\mathbf{g}\otimes\mathbf{I}_{K})\;\;\mathbf{D}(\mathbf{J}^{2}\mathbf{g}\otimes\mathbf{I}_{K})\;\;\dots\;\;\mathbf{D}(\mathbf{J}^{M-1}\mathbf{g}\otimes\mathbf{I}_{K})\right]\mathbf{D}^{\ast}+\sigma_{w}^{2}\mathbf{I}_{JM}\\
&= \left[\begin{array}{cccc} g_{1}\mathbf{D}(1) & g_{M}\mathbf{D}(1) & \dots & g_{2}\mathbf{D}(1)\\ g_{2}\mathbf{D}(2) & g_{1}\mathbf{D}(2) & \dots & g_{3}\mathbf{D}(2)\\ \vdots & \vdots & \ddots & \vdots \\ g_{M}\mathbf{D}(M) & g_{M-1}\mathbf{D}(M) & \dots & g_{1}\mathbf{D}(M) \end{array}\right]\mathbf{D}^{\ast}+\sigma_{w}^{2}\mathbf{I}_{JM}\\
&= \left[\begin{array}{cccc} g_{1}\mathbf{D}(1)\mathbf{D}(1)^{\ast} & g_{M}\mathbf{D}(1)\mathbf{D}(2)^{\ast} & \dots & g_{2}\mathbf{D}(1)\mathbf{D}(M)^{\ast}\\ g_{2}\mathbf{D}(2)\mathbf{D}(1)^{\ast} & g_{1}\mathbf{D}(2)\mathbf{D}(2)^{\ast} & \dots & g_{3}\mathbf{D}(2)\mathbf{D}(M)^{\ast}\\ \vdots & \vdots & \ddots & \vdots \\ g_{M}\mathbf{D}(M)\mathbf{D}(1)^{\ast} & g_{M-1}\mathbf{D}(M)\mathbf{D}(2)^{\ast} & \dots & g_{1}\mathbf{D}(M)\mathbf{D}(M)^{\ast} \end{array}\right]+\sigma_{w}^{2}\mathbf{I}_{JM}.
\end{array}
(3.3)
Note that the autocorrelation matrix R in (3.3) is a noise-perturbed matrix involving the coefficients g_{1}, g_{2}, …, g_{M} and the outer-product matrices D(m)D(n)^{∗}, ∀ m, n = 1, 2, …, M. Dividing each submatrix in R by the corresponding coefficient yields the matrices \mathbf{D}(m)\mathbf{D}(m)^{\ast}+\frac{\sigma_{w}^{2}}{g_{1}}\mathbf{I}_{J} and D(m)D(n)^{∗} for m, n = 1, 2, …, M, m ≠ n. These matrices are then used to form the following matrix
{\mathbf{Q}}_{F}={\mathbf{D}}_{F}{\mathbf{D}}_{F}^{\ast}+\frac{{\sigma}_{w}^{2}}{{g}_{1}}{\mathbf{I}}_{\mathit{\text{JM}}},
(3.4)
where \mathbf{D}_{F}=[\mathbf{D}(1)^{T}\;\;\mathbf{D}(2)^{T}\;\;\dots\;\;\mathbf{D}(M)^{T}]^{T}\in\mathbb{C}^{JM\times K} is the channel frequency response matrix. Note that Q_{F} is the outer product of D_{F} plus the diagonal matrix \frac{\sigma_{w}^{2}}{g_{1}}\mathbf{I}_{JM} due to noise. If the noise components imposed on Q_{F} can be eliminated, then we obtain the outer-product matrix \mathbf{D}_{F}\mathbf{D}_{F}^{\ast}. Next, we could take the eigen-decomposition of this outer-product matrix to obtain an estimate {\hat{\mathbf{D}}}_{F} of D_{F}. However, taking the eigen-decomposition of such a large (JM × JM) matrix \mathbf{D}_{F}\mathbf{D}_{F}^{\ast} involves more computation and usually renders a less accurate result, especially when M, the number of subcarriers, is large. To avoid this drawback, we use (3.4) to obtain another matrix HH^{∗}, which is the outer product of the channel impulse response matrix \mathbf{H}=[\mathbf{H}(0)^{T}\;\;\mathbf{H}(1)^{T}\;\;\dots\;\;\mathbf{H}(L)^{T}]^{T}\in\mathbb{C}^{J(L+1)\times K}. The size of HH^{∗} is J(L+1) × J(L+1), which is smaller than the size of \mathbf{D}_{F}\mathbf{D}_{F}^{\ast}.^{a} Hence, taking the eigen-decomposition of HH^{∗} to obtain an estimate \hat{\mathbf{H}} of H requires less computational load.
To obtain HH^{∗} from (3.4), we first define the M × (L+1) matrix \mathbf{F}_{1}=\mathbf{F}(:,1:L+1), which contains the first (L+1) columns of F. The relationship between the channel frequency response matrix D_{F} and the channel impulse response matrix H can be described as follows:
{\mathbf{D}}_{F}=\sqrt{M}({\mathbf{F}}_{1}\otimes {\mathbf{I}}_{J})\mathbf{H}.
(3.5)
With the aid of (3.5) and (3.4), we obtain the following matrix \mathbf{Q}_{H}:
\begin{array}{rl}
\mathbf{Q}_{H} &= \frac{1}{M}(\mathbf{F}_{1}^{\ast}\otimes\mathbf{I}_{J})\mathbf{Q}_{F}(\mathbf{F}_{1}\otimes\mathbf{I}_{J})\\
&= \frac{1}{M}(\mathbf{F}_{1}^{\ast}\otimes\mathbf{I}_{J})\left(\mathbf{D}_{F}\mathbf{D}_{F}^{\ast}+\frac{\sigma_{w}^{2}}{g_{1}}\mathbf{I}_{JM}\right)(\mathbf{F}_{1}\otimes\mathbf{I}_{J})\\
&= \frac{1}{M}(\mathbf{F}_{1}^{\ast}\otimes\mathbf{I}_{J})\mathbf{D}_{F}\mathbf{D}_{F}^{\ast}(\mathbf{F}_{1}\otimes\mathbf{I}_{J})+\frac{1}{M}(\mathbf{F}_{1}^{\ast}\otimes\mathbf{I}_{J})\frac{\sigma_{w}^{2}}{g_{1}}\mathbf{I}_{JM}(\mathbf{F}_{1}\otimes\mathbf{I}_{J})\\
&= \mathbf{H}\mathbf{H}^{\ast}+\frac{\sigma_{w}^{2}}{Mg_{1}}\mathbf{I}_{J(L+1)}.
\end{array}
(3.6)
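The chain of equalities in (3.6) can be checked numerically. The sketch below (NumPy; the dimensions M, J, K, L, the noise level, and g₁ = 1 are illustrative assumptions, not values from this paper) synthesizes D_F from a random H via (3.5), forms Q_F per (3.4), and verifies that the transform in (3.6) returns HH^∗ plus the scaled noise floor:

```python
import numpy as np
rng = np.random.default_rng(0)

# Illustrative dimensions: M subcarriers, J receivers, K transmitters, order L
M, J, K, L = 8, 3, 2, 2
sigma2, g1 = 0.1, 1.0

# Unitary DFT matrix F and its first L+1 columns F1
idx = np.arange(M)
F = np.exp(-2j*np.pi*np.outer(idx, idx)/M) / np.sqrt(M)
F1 = F[:, :L+1]

# Random channel impulse response matrix H of size J(L+1) x K
H = (rng.standard_normal((J*(L+1), K)) + 1j*rng.standard_normal((J*(L+1), K))) / np.sqrt(2)

# (3.5): D_F = sqrt(M) (F1 kron I_J) H, then Q_F per (3.4)
DF = np.sqrt(M) * np.kron(F1, np.eye(J)) @ H
QF = DF @ DF.conj().T + (sigma2/g1) * np.eye(J*M)

# (3.6): Q_H = (1/M) (F1* kron I_J) Q_F (F1 kron I_J) = H H* + sigma2/(M g1) I
A = np.kron(F1, np.eye(J))
QH = A.conj().T @ QF @ A / M
expected = H @ H.conj().T + (sigma2/(M*g1)) * np.eye(J*(L+1))
print(np.allclose(QH, expected))  # → True
```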
Since the matrix H is of full column rank by assumption (ii), the rank of HH^{∗} is K. This implies that the smallest J(L+1)−K eigenvalues of Q_{H} in (3.6) are equal to the scaled noise variance \frac{\sigma_{w}^{2}}{Mg_{1}}. Hence, in practice, we can estimate the scaled noise variance as the average of the smallest J(L+1)−K eigenvalues of Q_{H}. The outer-product matrix HH^{∗} can then be obtained by subtracting \frac{\sigma_{w}^{2}}{Mg_{1}}\mathbf{I}_{J(L+1)} from Q_{H} in (3.6). Finally, taking the eigen-decomposition of the Hermitian, positive semidefinite, rank-K matrix HH^{∗} yields K positive eigenvalues and the associated unit-norm eigenvectors, say λ_{1},…,λ_{K} and d_{1},…,d_{K}, respectively. We can thus choose the estimated channel impulse response matrix to be
\hat{\mathbf{H}}=[\sqrt{{\lambda}_{1}}{\mathbf{d}}_{1}\phantom{\rule{2.77695pt}{0ex}}\phantom{\rule{2.77695pt}{0ex}}\sqrt{{\lambda}_{2}}{\mathbf{d}}_{2}\phantom{\rule{5.69046pt}{0ex}}\dots \phantom{\rule{2.77695pt}{0ex}}\phantom{\rule{2.77695pt}{0ex}}\sqrt{{\lambda}_{K}}{\mathbf{d}}_{K}]\in {\mathbb{C}}^{J(L+1)\times K},
(3.7)
up to a unitary matrix ambiguity \mathbf{U}\in\mathbb{C}^{K\times K}, i.e., \mathbf{H}=\hat{\mathbf{H}}\mathbf{U}, since \hat{\mathbf{H}}{\hat{\mathbf{H}}}^{\ast}=\mathbf{H}\mathbf{H}^{\ast}. The ambiguity matrix U is intrinsic to semi-blind estimation of multiple-input systems using only second-order statistics [17]. This ambiguity can be resolved using a short pilot sequence [18].
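The noise-floor removal and rank-K eigen-decomposition just described can be sketched as follows (NumPy; the dimensions and the scalar noise_floor, which stands in for σ_w²/(Mg₁), are illustrative assumptions):

```python
import numpy as np
rng = np.random.default_rng(1)

# Illustrative dimensions; noise_floor stands in for sigma_w^2/(M g1).
J, K, L = 3, 2, 2
N = J*(L+1)
noise_floor = 0.05

H = (rng.standard_normal((N, K)) + 1j*rng.standard_normal((N, K))) / np.sqrt(2)
QH = H @ H.conj().T + noise_floor*np.eye(N)

# The N-K smallest eigenvalues of Q_H equal the scaled noise variance
evals, evecs = np.linalg.eigh(QH)        # ascending order
est_floor = evals[:N-K].mean()

# Subtract the noise floor and form H_hat from the K largest eigenpairs, as in (3.7)
lam = evals[N-K:] - est_floor
V = evecs[:, N-K:]
H_hat = V * np.sqrt(lam)

# H_hat equals H only up to a K x K unitary ambiguity, but the outer products match
print(np.allclose(H_hat @ H_hat.conj().T, H @ H.conj().T))  # → True
```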
3.2 Precoding design
In Section 3.1, we obtained Q_{F} from the autocorrelation matrix R. In practice, however, we have \hat{\mathbf{R}}=\mathbf{R}+\triangle\mathbf{R} instead of R, where △R is the error matrix due to finite-sample estimation error. As a result, dividing each submatrix of \hat{\mathbf{R}} by the corresponding coefficient g_{m} to obtain {\hat{\mathbf{Q}}}_{F} involves an error term, i.e., {\hat{\mathbf{Q}}}_{F}={\mathbf{Q}}_{F}+\triangle{\mathbf{Q}}_{F}, where \triangle{\mathbf{Q}}_{F}((k-1)J+1:kJ,\,(l-1)J+1:lJ)=\frac{1}{g_{m}}\triangle\mathbf{R}((k-1)J+1:kJ,\,(l-1)J+1:lJ) with m=(k-l)_{M}+1, ∀ 1 ≤ k, l ≤ M. Here (·)_{M} denotes the modulo-M operation. Clearly, a large value of the corresponding g_{m} attenuates the error term △Q_{F}, which in turn increases the accuracy of the estimate {\hat{\mathbf{Q}}}_{F}.
As a result, we would like to design the precoding coefficients p_{1}, p_{2}, …, p_{M} to maximize g_{1}, g_{2}, …, g_{M} and thereby reduce the error term. However, this is a multi-objective optimization problem that does not admit a tractable design. Hence, we present another feasible design approach in the following.
Since no prior information about the distortion △R is available in advance, we combine all M objective functions into a single cost with equal weights, i.e., g = g_{1} + g_{2} + ⋯ + g_{M}, and design the precoding to maximize g. In addition, it is easy to verify that g=(p_{1}+p_{2}+\cdots+p_{M})^{2}. The optimization problem can then be formulated as follows:
\begin{array}{rl}{\max}_{p_{1},p_{2},\dots,p_{M}} & (p_{1}+p_{2}+\cdots+p_{M})^{2}\\ \text{subject to} & \displaystyle\sum_{n=1}^{M}p_{n}^{2}=1.\end{array}
(3.8)
The constraint in (3.8) normalizes the power gain of each precoded symbol in the precoded vector \mathbf{P}\mathbf{x}^{(k)}(n) to 1. The Appendix shows that the optimal solution to (3.8) is
{p}_{1}={p}_{2}=\cdots ={p}_{M}=\frac{1}{\sqrt{M}}.
(3.9)
Although (3.9) is the optimal solution for channel estimation, it makes symbol detection impossible: (3.9) produces a singular matrix P, so the precoded vector \mathbf{P}\mathbf{x}^{(k)}(n) cannot be decoded at the receiver. To make symbol detection possible after channel estimation, we modify the optimal solution (3.9) into the following precoding scheme
\left\{\begin{array}{l}p_{1}=\sqrt{\frac{1}{M}+\tau}\\ p_{n}=\sqrt{\frac{1}{M}-\frac{\tau}{M-1}},\quad n=2,3,\dots,M,\end{array}\right.
(3.10)
to make P nonsingular, where 0<\tau<\frac{M-1}{M} is small. The solution in (3.10) is a small perturbation of the optimal solution in (3.9). In addition, as τ increases from 0 toward \frac{M-1}{M}, p_{1} becomes larger than p_{n}, n = 2, 3, …, M, which improves the channel equalization performance. In the following subsection, we prove this fact by evaluating the equalization performance under the precoding scheme (3.10).
3.3 Analysis of PEP
One approach to evaluating the equalization performance is BER analysis, but it is generally quite complex. Hence, we use PEP analysis, a technique widely used in space-time communications and OFDM systems, to examine the equalization performance [19–25]. In addition, to better understand the intrinsic impact of the precoding (3.10) on equalization, we assume that the channel state information is known at the receivers. This assumption also appears in [26, 27] for evaluating equalization performance. Now, let us consider the system model (2.4) with zero-forcing (ZF) equalization and drop the time index n for notational convenience.
The PEP analysis measures the probability that a symbol vector x is sent but another \tilde{\mathbf{x}}\ne\mathbf{x} is detected. Let ∥·∥ denote the two-norm of a vector. Then, by definition, the PEP conditioned on the channel impulse response matrix H is given by
\text{Pr}[\mathbf{x}\to\tilde{\mathbf{x}}\mid\mathbf{H}]=\text{Pr}[\,\|\hat{\mathbf{x}}-\tilde{\mathbf{x}}\|<\|\hat{\mathbf{x}}-\mathbf{x}\|\mid\mathbf{H}],
(3.11)
where \hat{\mathbf{x}}=\mathbf{x}+(\mathbf{P}^{-1}\otimes\mathbf{I}_{K})\mathbf{D}^{\dagger}\mathbf{v} is the estimate of x after ZF equalization, and D^{†} is the pseudo-inverse of D. Let d=\|\mathbf{x}-\tilde{\mathbf{x}}\| be the distance between x and \tilde{\mathbf{x}}, and let \mathbf{e}=\frac{\tilde{\mathbf{x}}-\mathbf{x}}{d} be the normalized error vector. Then (3.11) can be directly simplified to
\text{Pr}[\mathbf{x}\to\tilde{\mathbf{x}}\mid\mathbf{H}]=\text{Pr}\left[u>\frac{d}{2}\mid\mathbf{H}\right],
(3.12)
where u=\text{Re}[\mathbf{v}^{\ast}(\mathbf{D}^{\dagger})^{\ast}(\mathbf{P}^{-T}\otimes\mathbf{I}_{K})\mathbf{e}] and Re[·] denotes the real part. Since each element of v is a zero-mean circular Gaussian random variable with variance \sigma_{w}^{2}, the random variable u is also zero-mean Gaussian with variance
\begin{array}{rl}E[u^{2}] &= E\left[\left|\text{Re}[\mathbf{v}^{\ast}(\mathbf{D}^{\dagger})^{\ast}(\mathbf{P}^{-T}\otimes\mathbf{I}_{K})\mathbf{e}]\right|^{2}\right]\\ &= \frac{\sigma_{w}^{2}}{2}\|(\mathbf{D}^{\dagger})^{\ast}(\mathbf{P}^{-T}\otimes\mathbf{I}_{K})\mathbf{e}\|^{2}.\end{array}
(3.13)
Hence, the conditional PEP in (3.12) becomes
\text{Pr}[\mathbf{x}\to\tilde{\mathbf{x}}\mid\mathbf{H}]=Q\left(\frac{d}{\sqrt{2\sigma_{w}^{2}}\,\|(\mathbf{D}^{\dagger})^{\ast}(\mathbf{P}^{-T}\otimes\mathbf{I}_{K})\mathbf{e}\|}\right),
(3.14)
where Q(·) is the Q-function [28]. Let ∥·∥_{F} denote the Frobenius norm of a matrix. Then, by the submultiplicative property of matrix norms [29], we have
\begin{array}{rl}\|(\mathbf{D}^{\dagger})^{\ast}(\mathbf{P}^{-T}\otimes\mathbf{I}_{K})\mathbf{e}\| &\le \|(\mathbf{D}^{\dagger})^{\ast}\|_{F}\cdot\|\mathbf{P}^{-T}\otimes\mathbf{I}_{K}\|_{F}\cdot\|\mathbf{e}\|\\ &= \sqrt{K}\,\|(\mathbf{D}^{\dagger})^{\ast}\|_{F}\cdot\|\mathbf{P}^{-T}\|_{F}\\ &= \sqrt{K}\,\|(\mathbf{D}^{\dagger})^{\ast}\|_{F}\cdot\|\mathbf{P}^{-1}\|_{F}.\end{array}
(3.15)
The first equality in (3.15) holds since the two-norm of the unit vector e is 1, and the second equality holds since the Frobenius norm of a matrix A equals the Frobenius norm of A^{T}.
Let us now focus on \|\mathbf{P}^{-1}\|_{F} in (3.15). Since P is a circulant matrix, it can be decomposed as \mathbf{P}=\mathbf{F}^{-1}\mathbf{D}_{P}\mathbf{F}, and the inverse of P can be expressed as \mathbf{P}^{-1}=\mathbf{F}^{-1}\mathbf{D}_{P}^{-1}\mathbf{F}, where D_{P} is a diagonal matrix with the eigenvalues of P on its diagonal [16]. For P with the coefficients {p_{1}, p_{2}, …, p_{M}} given in (3.10), the first row of P is [a\; b\; b\;\dots\; b], where a=\sqrt{\frac{1}{M}+\tau} and b=\sqrt{\frac{1}{M}-\frac{\tau}{M-1}}. The eigenvalues of P are then given by the DFT of the first row of P [16], forming the diagonal matrix \mathbf{D}_{P}=\text{diag}[a+(M-1)b,\; a-b,\; a-b,\;\dots,\; a-b]\in\mathbb{R}^{M\times M}. This leads to
\begin{array}{rl}
\|\mathbf{P}^{-1}\|_{F} &= \|\mathbf{F}^{-1}\mathbf{D}_{P}^{-1}\mathbf{F}\|_{F}\\
&\le \|\mathbf{F}^{-1}\|_{F}\cdot\|\mathbf{D}_{P}^{-1}\|_{F}\cdot\|\mathbf{F}\|_{F}\\
&= \sqrt{M}\cdot\sqrt{[a+(M-1)b]^{-2}+(M-1)(a-b)^{-2}}\cdot\sqrt{M}\\
&\le M\sqrt{(a-b)^{-2}+(M-1)(a-b)^{-2}}\\
&= M\sqrt{M}\,(a-b)^{-1}.
\end{array}
(3.16)
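The eigenvalue formula for the circulant precoder and the bound (3.16) can be verified numerically (NumPy; M and τ are illustrative values, not from this paper):

```python
import numpy as np

# Illustrative M and tau; eigenvalues of circulant P are the DFT of its first row.
M, tau = 8, 0.05
a = np.sqrt(1/M + tau)
b = np.sqrt(1/M - tau/(M-1))

row = np.array([a] + [b]*(M-1))                      # first row [a b ... b]
P = np.vstack([np.roll(row, m) for m in range(M)])   # circulant matrix
eigs = np.fft.fft(row).real                          # [a+(M-1)b, a-b, ..., a-b]

# Since P = F^{-1} D_P F with unitary F, ||P^{-1}||_F = sqrt(sum of 1/eig^2),
# which respects the looser bound (3.16).
frob_inv = np.linalg.norm(np.linalg.inv(P), 'fro')
print(np.isclose(frob_inv, np.sqrt((1/eigs**2).sum())))  # → True
print(frob_inv <= M*np.sqrt(M)/(a - b))                  # → True
```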
From (3.15) and (3.16), we obtain
\|(\mathbf{D}^{\dagger})^{\ast}(\mathbf{P}^{-T}\otimes\mathbf{I}_{K})\mathbf{e}\|\le\sqrt{KM^{3}}\,\|(\mathbf{D}^{\dagger})^{\ast}\|_{F}\,(a-b)^{-1}.
(3.17)
Using (3.17), the conditional PEP in (3.14) is upper bounded by
\text{Pr}[\mathbf{x}\to\tilde{\mathbf{x}}\mid\mathbf{H}]\le Q\left(\frac{d(a-b)}{\sqrt{2\sigma_{w}^{2}KM^{3}}\,\|(\mathbf{D}^{\dagger})^{\ast}\|_{F}}\right).
(3.18)
From (3.18), it is clear that we can increase a (i.e., p_{1} or τ) to decrease the upper bound on the PEP, which in turn reduces the symbol/bit detection error. However, it is easy to check that increasing τ from 0 toward \frac{M-1}{M} decreases the value of the objective function g = (p_{1} + p_{2} + ⋯ + p_{M})^{2} in (3.8), which means the estimation performance deteriorates. Hence, there is a tradeoff in the selection of τ between channel estimation and equalization. This tradeoff is also observed in [4, 11]. We give a simulation example demonstrating this tradeoff in Section 4.
3.4 Discussion
We now give some further comments about the proposed method.

(1)
Channel identifiability and the case of more transmitters: The channel identifiability condition rank(H) = K (assumption (ii)) for the proposed method is the same as that of the methods in [11, 12, 30], but is more relaxed than the identifiability conditions of the methods in [9, 10]. If assumption (ii) does not hold, i.e., the matrix H is rank deficient with rank(H) = W < K, then rank(HH^{∗}) = W < K. In this case, we could only choose W positive eigenvalues and the associated eigenvectors from HH^{∗}, which cannot form the matrix \hat{\mathbf{H}} in (3.7) in theory.
In addition, since the size of the channel impulse response matrix H is J(L + 1) × K, rank(H) = K implies
J(L+1)\ge K,
(3.19)
i.e., the product of the number of receivers (J) and the channel length (L + 1) should be no less than the number of transmitters (K). Hence, the proposed method is capable of identifying not only the more-receivers case (J ≥ K) but also the more-transmitters case (K > J), as long as (3.19) is fulfilled.

(2)
Channel order overestimation: So far we have assumed that the channel order L is known. If L is unknown, we can set P, the length of the CP, as an upper bound on L, since P ≥ L is required to avoid inter-block interference. With this upper bound \widehat{L}=P and following the process given in Section 3.1, the corresponding matrix Q_{H} in (3.6) can be similarly constructed as {\mathbf{Q}}_{H}={\mathbf{H}}_{\text{ov}}{\mathbf{H}}_{\text{ov}}^{\ast}+\frac{\sigma_{w}^{2}}{Mg_{1}}\mathbf{I}_{J(\widehat{L}+1)}, where {\mathbf{H}}_{\text{ov}}=[{\mathbf{H}}^{T}\;\underbrace{\mathbf{0}\;\dots\;\mathbf{0}}_{(\widehat{L}-L)\;\text{blocks}}]^{T}\in\mathbb{C}^{J(\widehat{L}+1)\times K}. Then, similar to the method given in Section 3.1, we eliminate the noise contribution imposed on Q_{H} to obtain {\mathbf{H}}_{\text{ov}}{\mathbf{H}}_{\text{ov}}^{\ast}. Note that the last (\widehat{L}-L) block columns and block rows of {\mathbf{H}}_{\text{ov}}{\mathbf{H}}_{\text{ov}}^{\ast} are zero. Hence, rank({\mathbf{H}}_{\text{ov}}{\mathbf{H}}_{\text{ov}}^{\ast})=K and {\mathbf{H}}_{\text{ov}}{\mathbf{H}}_{\text{ov}}^{\ast} has K positive eigenvalues. Each of the associated eigenvectors has the form \widehat{\mathbf{d}}=[{\mathbf{d}}^{T}\;0\;\dots\;0]^{T}\in\mathbb{C}^{J(\widehat{L}+1)}, where \mathbf{d}\in\mathbb{C}^{J(L+1)}. From these K positive eigenvalues and the associated K eigenvectors, we can estimate {\hat{\mathbf{H}}}_{\text{ov}}, and then obtain the channel impulse response matrix \hat{\mathbf{H}} from {\hat{\mathbf{H}}}_{\text{ov}} up to a matrix ambiguity.

(3)
Algorithm: We now summarize the proposed approach as the following algorithm:

(1)
Collect the received data y(n), and then estimate the autocorrelation matrix R via the following time average
\hat{\mathbf{R}}=\frac{1}{S}\sum _{n=1}^{S}\mathbf{y}\left(n\right)\mathbf{y}{\left(n\right)}^{\ast},
(3.20)
where S is the number of data blocks.

(2)
Use the precoding coefficients in (3.10) to compute {\hat{\mathbf{Q}}}_{F} from the estimated autocorrelation matrix \hat{\mathbf{R}}.

(3)
Form the matrix \mathbf{Q}_{H} using (3.6) and the \mathbf{Q}_{F} obtained from the previous step.

(4)
Use the method given in Section 3.1 to remove the noise components imposed on \mathbf{Q}_{H} to obtain \mathbf{H}\mathbf{H}^{\ast}.

(5)
Finally, obtain the estimated channel impulse response matrix \hat{\mathbf{H}} by computing the K largest eigenvalues and the associated eigenvectors of \mathbf{H}\mathbf{H}^{\ast}.
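Under illustrative assumptions (small dimensions, the precoder (3.10), and an ideal R synthesized from a random channel via (3.3) rather than time-averaged from received data), steps (2)–(5) of the algorithm can be sketched end to end as:

```python
import numpy as np
rng = np.random.default_rng(2)

# Illustrative dimensions and precoder; not values from this paper.
M, J, K, L, sigma2, tau = 8, 3, 2, 2, 0.05, 0.05
a = np.sqrt(1/M + tau); b = np.sqrt(1/M - tau/(M-1))
p = np.array([a] + [b]*(M-1))
P = np.column_stack([np.roll(p, m) for m in range(M)])
g = (P @ P.T)[:, 0]                                   # first column of G = P P*

idx = np.arange(M)
F1 = (np.exp(-2j*np.pi*np.outer(idx, idx)/M)/np.sqrt(M))[:, :L+1]
H = (rng.standard_normal((J*(L+1), K)) + 1j*rng.standard_normal((J*(L+1), K)))/np.sqrt(2)
DF = np.sqrt(M)*np.kron(F1, np.eye(J)) @ H            # (3.5)

# Ideal R per (3.3): block (k,l) is g_m D(k) D(l)* with m = (k-l) mod M (0-indexed)
R = np.zeros((J*M, J*M), dtype=complex)
for k in range(M):
    for l in range(M):
        R[k*J:(k+1)*J, l*J:(l+1)*J] = g[(k-l) % M] * DF[k*J:(k+1)*J] @ DF[l*J:(l+1)*J].conj().T
R += sigma2*np.eye(J*M)

# Step (2): divide each block by its g_m to form Q_F (3.4)
QF = np.zeros_like(R)
for k in range(M):
    for l in range(M):
        QF[k*J:(k+1)*J, l*J:(l+1)*J] = R[k*J:(k+1)*J, l*J:(l+1)*J] / g[(k-l) % M]

# Step (3): Q_H per (3.6); steps (4)-(5): noise removal and eigen-decomposition
A = np.kron(F1, np.eye(J))
QH = A.conj().T @ QF @ A / M
N = J*(L+1)
w, V = np.linalg.eigh(QH)
H_hat = V[:, N-K:] * np.sqrt(w[N-K:] - w[:N-K].mean())
print(np.allclose(H_hat @ H_hat.conj().T, H @ H.conj().T))  # → True
```

With a finite number of data blocks S in (3.20), R̂ would carry the error △R of Section 3.2, and the recovered outer product would only approximate HH^∗.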