Considering the data model in (10), the joint ML channel estimation and data detection problem reduces to minimizing the following objective function:

J_{\text{ML}} = \min_{\mathbf{h},\,\mathcal{X}\in\Omega^{2N}} \left\{ \left\| \mathcal{Y} - \sqrt{\rho}\,\mathbf{X}_{a}\mathbf{h} \right\|^{2} \right\},

(12)

where *Ω*^{2N} denotes the set of all possible 2*N*-dimensional signal vectors. As seen from (12), the joint ML problem is combinatorial, involving |*Ω*|^{2N} hypothesis tests, so solving it exactly is practically infeasible for even moderate |*Ω*| and *N*.
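To get a feel for the scale, the hypothesis count can be computed directly for illustrative (hypothetical) values, QPSK (|*Ω*| = 4) and *N* = 64:

```python
# Hypothesis count |Omega|^(2N) of the exhaustive joint ML search for
# illustrative values: QPSK (|Omega| = 4) and N = 64.
num_hypotheses = 4 ** (2 * 64)
print(f"{num_hypotheses:.2e}")  # about 1.16e+77 hypothesis tests
```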

To solve it efficiently, we propose the following strategy. We start by decomposing the original cost function as

J_{\text{ML}} = \min_{\mathbf{h},\,\mathcal{X}\in\Omega^{2N}} \Bigg\{ \underbrace{\left\| \mathcal{Y}_{(i)} - \sqrt{\rho}\,\mathbf{X}_{a(i)}\mathbf{h} \right\|^{2}}_{M_{\mathcal{X}_{(i)}}} + \sum_{j=i+1}^{N} \left\| \mathcal{Y}(j) - \sqrt{\rho}\,\mathbf{X}_{a}(j)\mathbf{h} \right\|^{2} \Bigg\}

(13)

and define

M_{\mathcal{X}_{(i)}} = \left\| \mathcal{Y}_{(i)} - \sqrt{\rho}\,\mathbf{X}_{a(i)}\mathbf{h} \right\|^{2}

(14)

as the partial joint ML metric up to index *i* for \mathcal{X}, where

\mathbf{X}_{a(i)} = \left[\begin{array}{ll} \text{diag}\left(\mathcal{X}_{1(i)}^{(k)}\right)\mathbf{A}_{(i)}^{H} & \text{diag}\left(\mathcal{X}_{2(i)}^{(k)}\right)\mathbf{A}_{(i)}^{H} \\ -\text{diag}\left(\mathcal{X}_{2(i)}^{\ast(k)}\right)\mathbf{A}_{(i)}^{H} & \text{diag}\left(\mathcal{X}_{1(i)}^{\ast(k)}\right)\mathbf{A}_{(i)}^{H} \end{array}\right]

is a partial Alamouti matrix of dimension 2*i*×2*L*; the 2×2*L* matrix **X**_{a}(*j*) is the same as **X**_{a(j)} with the partial sequence {\mathcal{X}}_{(j)} replaced by the single point \mathcal{X}(j); {\mathcal{Y}}_{(i)}={\left[{\left[{\mathcal{Y}}_{(i)}^{(k)}\right]}^{T}\ {\left[{\mathcal{Y}}_{(i)}^{(k+1)}\right]}^{T}\right]}^{T} is the partial data vector of dimension 2*i*×1; and the partial matrix {\mathbf{A}}_{(i)}^{H} consists of the first *i* rows of **A**^{H}. It should be noted that the partial Alamouti matrix **X**_{a(i)} is a function of the first *i* data points, while **X**_{a}(*i*) is a function of the *i*-th data point only. Obviously, the solution that minimizes this partial joint ML metric is not globally optimal, but we have the following lemma^{a}:

**Lemma 1**.

Let *R* represent the optimal value of the objective function in (12). If {M}_{{\mathcal{X}}_{(i)}}>R, then {\mathcal{X}}_{(i)} cannot be the ML solution {\widehat{\mathcal{X}}}_{(i)}^{\text{ML}} of (12). In other words, for any estimate {\widehat{\mathcal{X}}}_{(i)} to correspond to the ML solution, we should have {M}_{{\mathcal{X}}_{(i)}}<R.

From Lemma 1, if the optimal value *R* of the objective function in (12) can be estimated, then we can adopt the following tree search procedure for joint estimation and detection: at each subcarrier *i*, guess a new value of \mathcal{X}(i)={\left[\begin{array}{ll}{\mathcal{X}}_{1}(i)& {\mathcal{X}}_{2}(i)\end{array}\right]}^{T} and use it, along with the previous estimates, to construct {\widehat{\mathcal{X}}}_{(i)} and {\widehat{\mathbf{X}}}_{a(i)}. Then, estimate **h** to minimize the associated cost function:

M_{\widehat{\mathcal{X}}_{(i)}} = \min_{\mathbf{h}} \left\{ \left\| \mathcal{Y}_{(i)} - \sqrt{\rho}\,\widehat{\mathbf{X}}_{a(i)}\mathbf{h} \right\|^{2} \right\}

(15)

and calculate the resulting metric {M}_{{\widehat{\mathcal{X}}}_{(i)}}. If {M}_{{\widehat{\mathcal{X}}}_{(i)}}<R, proceed to the next subcarrier *i*+1; otherwise, backtrack and change the guess of \mathcal{X}(j) for some *j*≤*i*. We call this approach the branch-estimate-and-bound strategy; it reduces the search space of the exhaustive ML search to those (partial) sequences that satisfy the constraint {M}_{{\widehat{\mathcal{X}}}_{(i)}}<R. This approach, however, does not work for *i*≤*L*, since **X**_{a(i)} will be full row rank for any choice of {\mathcal{X}}_{(i)}, and therefore **h**, with its 2*L* degrees of freedom, can always be chosen by least squares (LS) to yield the trivial (zero) value of {M}_{{\widehat{\mathcal{X}}}_{(i)}}. To obtain a non-trivial value of {M}_{{\widehat{\mathcal{X}}}_{(i)}}, we would have to use *L* pilots, but that would defeat our original motive of *blind* estimation. To overcome this problem, we adopt weighted regularized LS: instead of minimizing the ML objective function, *J*_{ML}, we minimize the maximum *a posteriori* (MAP) objective function:
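The pruning idea behind the branch-estimate-and-bound strategy can be illustrated with a toy depth-first search. The snippet below is a sketch only: a scalar BPSK alphabet stands in for *Ω*, and a simple monotone squared-error metric stands in for the partial metric of (15); the names `OMEGA`, `partial_metric`, and `dfs_bnb` are hypothetical.

```python
# Toy depth-first branch-and-bound over the signal tree.
OMEGA = [-1.0, 1.0]  # BPSK stand-in for the constellation Omega

def partial_metric(prefix, y):
    # Monotone in the prefix length, like M_{X_(i)}: appending a symbol
    # can only increase the accumulated squared error.
    return sum((yi - xi) ** 2 for yi, xi in zip(y, prefix))

def dfs_bnb(y, R):
    """Depth-first search that prunes any subtree whose partial metric
    already reaches the radius (Lemma 1), tightening the radius whenever
    a full-length sequence is found."""
    best, best_cost = None, R
    stack = [()]
    while stack:
        prefix = stack.pop()
        m = partial_metric(prefix, y)
        if m >= best_cost:
            continue                      # bound: discard the whole subtree
        if len(prefix) == len(y):
            best, best_cost = prefix, m   # keep it; shrink the radius
            continue
        for s in OMEGA:                   # branch on the next symbol
            stack.append(prefix + (s,))
    return best, best_cost

seq, cost = dfs_bnb([0.9, -1.2, 1.1, -0.8], R=10.0)
print(seq)  # (1.0, -1.0, 1.0, -1.0)
```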

J_{\text{MAP}} = \min_{\mathbf{h},\,\mathcal{X}\in\Omega^{2N}} \left\{ \|\mathbf{h}\|_{\mathbf{R}_h^{-1}}^{2} + \left\| \mathcal{Y} - \sqrt{\rho}\,\mathbf{X}_{a}\mathbf{h} \right\|^{2} \right\},

(16)

where **R**_{h} is the block diagonal autocorrelation matrix of the composite channel vector **h**, i.e., **R**_{h}=*E*{**h** **h**^{H}}. The objective function in (16) can also be decomposed as

J_{\text{MAP}} = \min_{\mathbf{h},\,\mathcal{X}\in\Omega^{2N}} \Bigg\{ \underbrace{\|\mathbf{h}\|_{\mathbf{R}_h^{-1}}^{2} + \left\| \mathcal{Y}_{(i)} - \sqrt{\rho}\,\mathbf{X}_{a(i)}\mathbf{h} \right\|^{2}}_{M_{\mathcal{X}_{(i)}}} + \sum_{j=i+1}^{N} \left\| \mathcal{Y}(j) - \sqrt{\rho}\,\mathbf{X}_{a}(j)\mathbf{h} \right\|^{2} \Bigg\}.

(17)

So, if we have a guess {\widehat{\mathcal{X}}}_{(i-1)}, the partial metric for \mathcal{X} up to index *i*−1 can be written as

M_{\widehat{\mathcal{X}}_{(i-1)}} = \min_{\mathbf{h}} \left\{ \|\mathbf{h}\|_{\mathbf{R}_h^{-1}}^{2} + \left\| \mathcal{Y}_{(i-1)} - \sqrt{\rho}\,\widehat{\mathbf{X}}_{a(i-1)}\mathbf{h} \right\|^{2} \right\}

(18)

whose minimizer \widehat{\mathbf{h}} and minimum cost can be computed in closed form [23].
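For completeness, the closed-form minimizer and minimum cost of (18) follow from standard weighted regularized LS theory (cf. [23]); a sketch in the notation used here:

```latex
\widehat{\mathbf{h}} = \sqrt{\rho}\left(\mathbf{R}_h^{-1} + \rho\,\widehat{\mathbf{X}}_{a(i-1)}^{H}\widehat{\mathbf{X}}_{a(i-1)}\right)^{-1}\widehat{\mathbf{X}}_{a(i-1)}^{H}\mathcal{Y}_{(i-1)},
\qquad
M_{\widehat{\mathcal{X}}_{(i-1)}} = \mathcal{Y}_{(i-1)}^{H}\left(\mathbf{I} + \rho\,\widehat{\mathbf{X}}_{a(i-1)}\mathbf{R}_h\,\widehat{\mathbf{X}}_{a(i-1)}^{H}\right)^{-1}\mathcal{Y}_{(i-1)},
```

where the second identity follows from the matrix inversion lemma.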

### 3.1 Recursive derivation of bound

For our blind search strategy, the metric (or bound) {M}_{{\widehat{\mathcal{X}}}_{(i)}} must be computed at each tree node for comparison with the optimal value *R* of the objective function. This bound can be derived recursively by expressing {M}_{{\widehat{\mathcal{X}}}_{(i)}} in terms of the new observation and an additional regressor {\widehat{\mathbf{X}}}_{a}(i) as follows:

\begin{array}{rl} M_{\widehat{\mathcal{X}}_{(i)}} &= \min_{\mathbf{h}} \left\{ \|\mathbf{h}\|_{\mathbf{R}_h^{-1}}^{2} + \left\| \mathcal{Y}_{(i)} - \sqrt{\rho}\,\widehat{\mathbf{X}}_{a(i)}\mathbf{h} \right\|^{2} \right\} \\ &= \min_{\mathbf{h}} \left\{ \|\mathbf{h}\|_{\mathbf{R}_h^{-1}}^{2} + \left\| \left[\begin{array}{l} \mathcal{Y}_{(i-1)} \\ \mathcal{Y}(i) \end{array}\right] - \sqrt{\rho}\left[\begin{array}{l} \widehat{\mathbf{X}}_{a(i-1)} \\ \widehat{\mathbf{X}}_{a}(i) \end{array}\right]\mathbf{h} \right\|^{2} \right\} \end{array}

(19)

By applying the block version of the recursive least squares (RLS) algorithm to the cost function in (19), with a data vector of size 2×1 and a regressor matrix of dimension 2×2*L*, we get [23]

M_{\widehat{\mathcal{X}}_{(i)}} = M_{\widehat{\mathcal{X}}_{(i-1)}} + \mathbf{e}_{i}^{H}\boldsymbol{\Gamma}_{i}\mathbf{e}_{i}

(20)

\widehat{\mathbf{h}}_{i} = \widehat{\mathbf{h}}_{i-1} + \mathbf{G}_{i}\mathbf{e}_{i},

(21)

where

\mathbf{e}_{i} = \mathcal{Y}(i) - \sqrt{\rho}\,\widehat{\mathbf{X}}_{a}(i)\,\widehat{\mathbf{h}}_{i-1}

(22)

\boldsymbol{\Gamma}_{i} = \left[\mathbf{I}_{2} + \rho\,\widehat{\mathbf{X}}_{a}(i)\mathbf{P}_{i-1}\widehat{\mathbf{X}}_{a}(i)^{H}\right]^{-1}

(23)

\mathbf{G}_{i} = \sqrt{\rho}\,\mathbf{P}_{i-1}\widehat{\mathbf{X}}_{a}(i)^{H}\boldsymbol{\Gamma}_{i}

(24)

\mathbf{P}_{i} = \mathbf{P}_{i-1} - \mathbf{G}_{i}\boldsymbol{\Gamma}_{i}^{-1}\mathbf{G}_{i}^{H}

(25)

The RLS recursions are initialized by

M_{\widehat{\mathcal{X}}_{(-1)}} = 0, \quad \widehat{\mathbf{h}}_{-1} = \mathbf{0} \quad \text{and} \quad \mathbf{P}_{-1} = \mathbf{R}_{h}.
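The recursions (20)–(25) can be verified numerically against the batch solution of (18). The sketch below uses randomly generated complex regressors and observations in place of the true Alamouti-structured {\widehat{\mathbf{X}}}_{a}(i) (all dimensions and values are illustrative); the recursive metric and channel estimate coincide with the batch regularized LS cost and minimizer.

```python
import numpy as np

# Toy dimensions -- illustrative only; h has 2L entries as in the model.
rng = np.random.default_rng(0)
L, N, rho = 4, 6, 2.0

# Random complex stand-ins for the regressors X_a(i) (2 x 2L) and the
# observations Y(i) (2 x 1); the true X_a(i) would be Alamouti-structured.
X = [rng.standard_normal((2, 2 * L)) + 1j * rng.standard_normal((2, 2 * L)) for _ in range(N)]
Y = [rng.standard_normal((2, 1)) + 1j * rng.standard_normal((2, 1)) for _ in range(N)]
Rh = np.eye(2 * L, dtype=complex)   # channel autocorrelation (identity, as in the text)

# Block-RLS recursions (20)-(25) with the stated initialization.
M = 0.0
h = np.zeros((2 * L, 1), dtype=complex)
P = Rh.copy()
for i in range(N):
    e = Y[i] - np.sqrt(rho) * X[i] @ h                                # (22)
    Gam = np.linalg.inv(np.eye(2) + rho * X[i] @ P @ X[i].conj().T)   # (23)
    G = np.sqrt(rho) * P @ X[i].conj().T @ Gam                        # (24)
    M += (e.conj().T @ Gam @ e).real.item()                           # (20)
    h = h + G @ e                                                     # (21)
    P = P - G @ np.linalg.inv(Gam) @ G.conj().T                       # (25)

# Batch check: minimizer and cost of the regularized LS problem (18).
Xs, Ys = np.vstack(X), np.vstack(Y)
h_batch = np.linalg.solve(np.linalg.inv(Rh) + rho * Xs.conj().T @ Xs,
                          np.sqrt(rho) * Xs.conj().T @ Ys)
M_batch = ((h_batch.conj().T @ np.linalg.inv(Rh) @ h_batch).real
           + np.linalg.norm(Ys - np.sqrt(rho) * Xs @ h_batch) ** 2).item()
```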

Before introducing our algorithm, we first number the |*Ω*|^{2} combinations of the constellation points from the two antennas by 1,2,…,|*Ω*|^{2} and treat them as one big constellation set *Ψ*, whose *k*-th (1≤*k*≤|*Ω*|^{2}) vector constellation point is denoted by *Ψ*(*k*). We then perform a depth-first search of the signal tree for the joint ML solution.
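This numbering can be sketched directly, assuming (hypothetically) a QPSK alphabet for both antennas:

```python
# Enumerating the composite constellation Psi = Omega x Omega: one vector
# point per pair of symbols from the two transmit antennas (QPSK assumed
# here for illustration).
from itertools import product

OMEGA = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]         # QPSK alphabet (unnormalized)
PSI = [pair for pair in product(OMEGA, repeat=2)]  # Psi(1), ..., Psi(|Omega|^2)
print(len(PSI))  # 16 = |Omega|^2
```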

The algorithm essentially reduces the search space of the exhaustive ML search by performing a trimmed search over a signal tree of *N* layers, where each node at the *i*-th layer corresponds to a specific partial sequence {\mathcal{X}}_{(i)} and each node at an intermediate layer has |*Ω*|^{2} offspring at the next layer.

The parameter *ρ* can be easily determined by estimating the noise variance, whereas for **R**_{h}, our simulation results indicate that it can be replaced with an identity matrix with almost no effect on performance, thanks to carrier reordering (see the next section). To obtain an initial guess of the search radius, we can use the strategy described in [20] to determine an *r* that guarantees a MAP solution with very high probability. Nevertheless, the algorithm itself adapts the value of *r*: if *r* is too small for the search to proceed (every branch is pruned and no further backtracking is possible), it doubles *r*; if *r* is so large that the algorithm reaches the last subcarrier too quickly, it reduces *r* to the most recent value of the objective function (see steps 4 and 6). Therefore, any choice of *r* will eventually yield the MAP solution.

The complexity of the algorithm is mainly attributed to the calculation of the bound {M}_{{\widehat{\mathcal{X}}}_{(i)}} (step 2) and to backtracking (step 3). The remaining steps are simple additions and subtractions. From the RLS recursions, it can be seen that the calculation of the bound depends heavily on the computation of the 2*L*×2*L* matrix **P**_{i} in (25). In Section 4, we show how the computation of **P**_{i} can be avoided by exploiting the structure of the FFT matrix, while in Section 5 we deal with the issue of backtracking.