Optimal MSE solution for a decision feedback equalizer

Kavitha, Veeraruna; Sharma, Vinod

doi:10.1186/1687-6180-2012-172

Research
Open access
Published: 16 August 2012

EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 172 (2012)

Abstract

Due to the inherent feedback in a decision feedback equalizer (DFE), the minimum mean square error (MMSE) or Wiener solution is not known exactly. The main difficulty in such analysis is the propagation of decision errors, which occurs because of the feedback. Thus, in the literature, these errors are neglected while designing and/or analyzing DFEs. A closed form expression is then obtained for the Wiener solution, and we refer to this as the ideal DFE (IDFE). DFEs have also been designed using an iterative and computationally efficient alternative, the least mean square (LMS) algorithm. However, again due to the feedback involved, the analysis of an LMS-DFE is not known so far. In this paper we theoretically analyze a DFE taking into account the decision errors. We study its performance at steady state. We then study an LMS-DFE and show the proximity of the LMS-DFE attractors to the optimal DFE Wiener filter (obtained after considering the decision errors) at high signal to noise ratios (SNR). Further, via simulations we demonstrate that, even at moderate SNRs, an LMS-DFE is close to the MSE optimal DFE. Finally, we compare the LMS-DFE attractors with the IDFE via simulations. We show that an LMS equalizer outperforms the IDFE. In fact, the performance improvement is very significant even at high SNRs (up to 33%), where an IDFE is believed to be closer to the optimal one. Towards the end, we briefly discuss the tracking properties of the LMS-DFE.

Introduction

A channel equalizer is an important component of a communication system and is used to mitigate the inter symbol interference (ISI) introduced by the channel. The equalizer depends upon the channel characteristics. A variety of equalizers have been proposed and utilized in communication systems [1–3]. Usually simple linear equalizers (LE) suffice (see, e.g., [1–3]), but a channel with deep spectral nulls requires a more complex, nonlinear equalizer such as a decision feedback equalizer (DFE).

An LE is a linear filter that is used to mitigate ISI, while a Wiener filter (WF) equalizer is an optimal filter that minimizes the mean square error (MSE) between the input symbols and the decoded symbols (decoded after the equalizer). A closed form expression for the WF LE is available (see, e.g., [4, 5]). This closed form expression involves a matrix inverse, which can be computationally intensive if the filter has a large dimension. Alternatively, the least mean square linear equalizer (LMS-LE), a computationally efficient iterative algorithm, is used extensively (see [4–6]) to obtain the WF equalizer. It can also track the time variations in the WF, if required, as in the case of wireless channels. For a fixed channel its convergence to the WF has been studied in [6, 7] (see also the references therein). Its performance on a wireless (time varying) channel has been studied theoretically in [8, 9] (see also [4, 5, 10] and the references therein, where the performance has been studied via simulations, approximations and upper bounds on the probability of error).

Decision feedback equalizers are nonlinear equalizers (a pair of linear filters, one in the forward path and another in the feedback path), which can provide significantly better performance than an LE [3, 11, 12], especially for ‘bad’ channels. A DFE feeds back the previous decisions of the transmitted symbols to nullify the ISI due to them (which can now happen without amplifying the thermal noise) and makes a better decision about the current symbol. Although these equalizers have also been used for quite some time, due to feedback their behavior is much more complex than that of the LEs. Hence their performance is not well understood. The existence of a hard decoder inside the feedback loop, due to its nonlinearity, makes the study all the more difficult. A DFE mainly exploits the finite alphabet structure of the hard decoder output [2, 13] and hence the hard decoder cannot be ignored (i.e., its performance is better than that of a system with a soft decoder).

Since the statistics of the previous decisions in a DFE are not known, there is no known technique that provides a minimum MSE (MMSE) DFE (we will call it DFE-WF in the rest of the article) even for a fixed channel [2, 3, 14]. Thus an MMSE DFE is commonly designed by assuming perfect decisions (see, e.g., [2, 15]). For convenience, for the rest of the article, we will call such a DFE the ideal DFE (IDFE). In this article the IDFE is also computed using perfect channel estimates. The IDFE often outperforms the linear WF significantly [3, 11, 12]. But it is generally believed that the DFE-WF, the true MSE optimal DFE (designed considering the decision errors), can outperform even this.

Another way to obtain an optimal filter is to replace the feedback filter at the receiver by a precoder at the transmitter [3, 14]. This way one can indeed obtain the optimal filter but this requires the knowledge of the channel at the transmitter. For wireless channels, which are time varying, this is often not an attractive solution [2, 3].

Some research has been done to deal with the decision errors in a DFE. Sternad et al. [16] approximated the errors in decisions with an additive white Gaussian noise (AWGN) independent of the input sequence and obtained a DFE WF. But as is stated in the article this approximation is not realistic. Erdogan et al. [13] obtain an H^∞ optimal DFE taking into account the decision errors. However no comparison to DFE-WF was provided.

The ideal DFE also contains a matrix inverse, for which LMS is again used as a computationally efficient alternative in practical communication systems. However, the convergence of the LMS-DFE is not well understood even for a fixed channel, again due to the complexity introduced by the feedback. The trajectory of the LMS-DFE algorithm, on a fixed channel, with a soft decoder in the feedback loop has been approximated by an ODE in [17]. But this ODE does not approximate the LMS-DFE with a hard decoder. Benveniste et al. [6] have shown the ODE approximation of an LMS-DFE with a hard decoder. But the ODE obtained by them is not explicit enough. Furthermore, they do not relate the attractors of this ODE to the DFE-WF.

Our conjecture is that LMS can actually converge to the true DFE-WF (obtained considering the decision errors), and one of the main goals of this article is to prove this. In this article, we study an LMS-DFE on a fixed channel using an ODE approximation. Towards this, we first obtain the stationary performance of the system with a DFE and prove the existence of the DFE-WF (the minimum MSE solution) for every channel state (whenever the domain of optimization is compact). We then show that the DFE-WF and an LMS-DFE attractor are close to each other at high signal to noise ratios (SNRs). We show via simulations that the same is true for nominal values of SNR.

Further we demonstrate via simulations, that the LMS-DFE can outperform the commonly used IDFE, at all practical SNRs. An interesting observation is that, the improvement is significant even at high SNRs where an IDFE does not suffer from error propagation and is believed to be close to the true DFE-WF.

The article is organized as follows. Our system model, notations and assumptions are discussed in Section “The model and notation”. In Section “The issues and our approach” we discuss our approach. Section “Analysis of LMS-DFE and DFE-WF” obtains an ODE approximation and then analyzes the attractors of the LMS-DFE. Section “Numerical examples” provides some examples. Section “Tracking analysis” briefly discusses the tracking behavior, while Section “Conclusions” concludes the article. Appendices 1 to 5 provide proofs for our theorems.

The model and notation

We consider a communication system with a DFE (see Figure 1). Inputs {s_k} enter a finite impulse response channel $\{z_{l}\}_{l = 0}^{L - 1}$, and are corrupted by an additive zero mean white Gaussian noise {n_k} with variance σ². The channel output, u_k, at time k, is given by,

u_{k} = \sum_{l = 0}^{L - 1} s_{k - l} z_{l} + n_{k} .

The channel output passes through a DFE given by a linear forward filter θ_f and a linear feedback filter θ_b. In addition, there is a hard decoder Q(.). The output of the decoder at time k is,

ŝ_{k} = Q (\sum_{l = 0}^{N_{f} - 1} {θ_{f}}_{l} u_{k - l} + \sum_{l = 1}^{N_{b}} {θ_{b}}_{l} ŝ_{k - l}) .

(1)

We provide below the assumptions made and the notations used in this article. Most of these assumptions can be generalized as discussed at the end of this section.

Sequences {s_k} and {n_k} are independent and identically distributed (i.i.d.) sequences and are independent of each other. The inputs {s_k} are uniformly distributed over {+1, −1} (BPSK modulation).
$f_{N} (y)$ is the N dimensional standard i.i.d. Gaussian density, where N is the dimension of the vector y, i.e., $f_{N} (y) = (2 π)^{- N / 2} \exp (- |y|^{2} / 2)$. Whenever not mentioned, integrability is with respect to $f_{N} (y) dy$.
The equalizer forward and feedback filters are given by $\{{θ_{f}}_{l}\}_{l = 0}^{N_{f} - 1}$ and $\{{θ_{b}}_{l}\}_{l = 1}^{N_{b}}$, respectively. Also, let $N_{L} ≜ N_{f} + L - 1$.
We assume that the symbols are modulated using BPSK and so the hard decoder in (1) equals Q(x) := 1_{x≥0} − 1_{x<0}.
For any vector x, x_l represents its l^th component and $x_{l}^{k}$, l ≤ k, represents the vector [x_k x_{k−1} … x_l]^T.

The following vector notations are used:
$\begin{aligned} S_{k} & ≜ s_{k - N_{L} + 1}^{k}, & N_{k} & ≜ n_{k - N_{f} + 1}^{k}, \\ U_{k} & ≜ u_{k - N_{f} + 1}^{k}, & Ŝ_{k} & ≜ ŝ_{k - N_{b} + 1}^{k}, \\ X_{k} & ≜ [U_{k}^{T} \; Ŝ_{k - 1}^{T}]^{T}, & G_{k} & ≜ [S_{k}^{T} \; X_{k}^{T}]^{T}, \\ θ_{f} & ≜ {θ_{f}}_{0}^{N_{f} - 1}, & θ_{b} & ≜ {θ_{b}}_{1}^{N_{b}}, \\ J_{k} & ≜ [S_{k}^{T} \; Ŝ_{k - 1}^{T} \; N_{k}^{T}]^{T}, & Θ & ≜ [{θ_{f}}^{T} \; {θ_{b}}^{T}]^{T}, \\ Z & ≜ [z_{0}, z_{1}, \dots, z_{L - 1}] . \end{aligned}$

In the above, S_k, U_k, N_k and $Ŝ_{k - 1}$, respectively, represent the vectors of input symbols, channel outputs, noise samples and decoder decisions that influence the equalizer output at time k. Vector X_k forms the input to the equalizer at time k, while G_k and J_k are two alternate representations of the system state at time k. Vector Z is the vector form of the channel, while θ_f and θ_b are those of the equalizer feed-forward and feedback filters.
Θ_k represents the time varying equalizer at time k.
Let $S := \{+ 1, - 1\}$. Under the above assumptions, {G_k} and {J_k} are Markov chains for a fixed channel, equalizer pair (Z,Θ). These two Markov chains take values in $S^{N_{L}} \times S^{N_{b}} \times R^{N_{f}}$, where $R$ is the set of real numbers. The current and the previous states of both these Markov chains are represented by the ordered pairs (i,y) and (j,y^′), respectively. Here i, j take values from the discrete part of the state space, $S^{N_{L}} \times S^{N_{b}}$, while y, y^′ take values in $R^{N_{f}}$.
$Ψ = \{ψ_{l}\}_{l = 0}^{N_{L} - 1}$ represents the convolution of the channel {z_l} and the forward filter θ_f.
The input to the hard decoder for a given state of the Markov chain is represented by,
$e_{Θ} (i, y) := \sum_{l = 0}^{N_{L} - 1} ψ_{l} s_{k - l} + \sum_{l = 0}^{N_{f} - 1} {θ_{f}}_{l} n_{k - l} + \sum_{l = 1}^{N_{b}} {θ_{b}}_{l} ŝ_{k - l} .$

Note that $ŝ_{k - 1} = Q (e_{Θ} (j, y^{'}))$ .
B(Θ,δ), $\bar{B} (Θ, δ)$ are the open and closed balls respectively with center Θ and radius δ.
The equalizer output without noise satisfies e_Θ(i,0) ≠ 0 for all values of i at the LMS attractor. Without this assumption the LMS algorithm makes more errors than correct decisions.
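As a small illustration of the notation above, the following sketch computes Ψ as the convolution of the channel with the forward filter and then evaluates the noiseless decoder input e_Θ(i,0) for every discrete state i = (S_k, Ŝ_{k−1}). The channel and equalizer taps are hypothetical values chosen only for illustration; they are not taken from the article.

```python
import itertools
import numpy as np

def noiseless_decoder_inputs(z, theta_f, theta_b):
    """Return Psi and e_Theta(i, 0) for every discrete state i = (S_k, S_hat_{k-1})."""
    z = np.asarray(z, dtype=float)
    theta_f = np.asarray(theta_f, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    psi = np.convolve(z, theta_f)  # length N_f + L - 1 = N_L
    n_l, n_b = len(psi), len(theta_b)
    values = {}
    # s[l] plays the role of s_{k-l}; s_hat[l-1] plays the role of s_hat_{k-l}
    for s in itertools.product((1, -1), repeat=n_l):
        for s_hat in itertools.product((1, -1), repeat=n_b):
            values[(s, s_hat)] = float(psi @ np.array(s) + theta_b @ np.array(s_hat))
    return psi, values

# Hypothetical taps, for illustration only
z = [1.0, 0.4]
theta_f = [1.0, 0.0]
theta_b = [-0.4]

psi, values = noiseless_decoder_inputs(z, theta_f, theta_b)
# The assumption e_Theta(i, 0) != 0 holds when no state gives a zero input
assumption_holds = all(abs(v) > 1e-12 for v in values.values())
smallest_margin = min(abs(v) for v in values.values())
```

The enumeration grows as 2^(N_L + N_b), so this direct check is only practical for short filters.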

Thus, the channel outputs {u_k} pass through a DFE Θ with a hard decoder. The performance of this system depends upon the DFE filters Θ. We are interested in a filter Θ that minimizes the commonly used criterion, the mean of the squared error (MSE) between the input symbol s_k and the corresponding equalizer output Θ^t X_k:

MSE = E [{|s_{k} - Θ^{t} X_{k}|}^{2}] .

(2)

The LMS algorithm,

Θ_{k + 1} = Θ_{k} - μ_{k} H_{Θ_{k}} (G_{k}); H_{Θ} (G) ≜ X (X^{t} Θ - s),

(3)

a computationally efficient iterative algorithm, is expected to provide the MMSE solution. However, with a feedback structure inserted, the convergence behavior of LMS is not understood properly. In fact, it is not even clear whether the minimum mean square problem is well posed, nor is it clear whether an MMSE solution exists. Even prior to these questions, one first needs to define the expectation in (2) appropriately. One is often interested in optimizing a stationary performance, i.e., the expectation in (2) is with respect to the stationary distribution of the system. However, the stationary distribution depends upon the parameter Θ, and the existence of the stationary distribution for any given Θ is not known. We take up these issues one by one, and our final goal is to show that the above iterative algorithm (3) indeed converges close to the MMSE solution.

One can easily extend the theory of this article to any finite alphabet (complex) input source with any arbitrary distribution and to a complex channel. However we stick to BPSK modulation and to a real channel to keep the explanations simple. Also, the theory to follow, considers an optimal equalizer for delay 0. The entire theory will go through for any arbitrary delay. Indeed in Section “Numerical examples”, an example with an optimal equalizer for delay 1, is presented. This is once again done to simplify the explanations.
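The model of this section can be simulated directly. The sketch below is a minimal illustration under stated assumptions, not the authors' simulation code: it generates BPSK inputs, passes them through the FIR channel with additive Gaussian noise, forms X_k from channel outputs and past hard decisions as in (1), and applies the LMS update (3). The constant step size, the initialization and the channel taps are hypothetical choices.

```python
import numpy as np

def simulate_lms_dfe(z, n_f, n_b, sigma, mu, n_steps, seed=0):
    """Run LMS update (3) on a DFE with a hard decoder in the feedback loop."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    s = rng.choice([-1.0, 1.0], size=n_steps)   # BPSK inputs
    noise = sigma * rng.standard_normal(n_steps)
    u = np.convolve(s, z)[:n_steps] + noise     # u_k = sum_l s_{k-l} z_l + n_k
    theta = np.zeros(n_f + n_b)
    theta[0] = 1.0                              # hypothetical initialization
    s_hat = np.zeros(n_steps)
    sq_err = np.zeros(n_steps)
    for k in range(max(n_f - 1, n_b), n_steps):
        U = u[k - n_f + 1:k + 1][::-1]          # [u_k, ..., u_{k-N_f+1}]
        S_hat_prev = s_hat[k - n_b:k][::-1]     # [s_hat_{k-1}, ..., s_hat_{k-N_b}]
        X = np.concatenate([U, S_hat_prev])
        out = theta @ X                         # decoder input Theta^t X_k
        s_hat[k] = 1.0 if out >= 0 else -1.0    # hard decoder Q
        sq_err[k] = (s[k] - out) ** 2
        theta = theta - mu * X * (out - s[k])   # Theta_{k+1} = Theta_k - mu H(G_k)
    return theta, s, s_hat, sq_err

# Hypothetical parameters, for illustration only
theta, s, s_hat, sq_err = simulate_lms_dfe(
    z=[1.0, 0.4], n_f=3, n_b=1, sigma=0.1, mu=0.01, n_steps=20000)
tail = slice(-5000, None)
steady_mse = float(np.mean(sq_err[tail]))
steady_ber = float(np.mean(s[tail] != s_hat[tail]))
```

The update uses the true symbol s_k, as in (3), so this corresponds to a training mode; a constant step size is used here only to keep the sketch short.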

The issues and our approach

A DFE-WF on a fixed channel (if it exists) is given by,

Θ^{*} = arg min_{Θ} E {[Θ^{t} X_{k} - s_{k}]}^{2},

(4)

where the expectation on the right hand side is defined under stationarity for a given Θ. Vector $X_{k} = {[U_{k}^{T}, Ŝ_{k - 1}^{T}]}^{T}$ , includes previous decisions $Ŝ_{k - 1}$ and hence its stationary distribution depends upon the parameter Θ. Thus this is a complex case of optimization in which, the stationary distribution defining the average cost also depends upon the parameter to be optimized. There is no known technique to compute a WF, Θ^∗ of (4), even for a fixed channel.

In practical systems, a DFE WF is commonly designed assuming perfect decisions (i.e., $Ŝ_{k} = S_{k - N_{b} + 1}^{k}$ ), which we have called IDFE. It is easy to see that the IDFE for a fixed channel is given by,

\begin{align} Θ_{IDFE} & = (E [X_{k} X_{k}^{t}])^{- 1} E [X_{k} s_{k}], \\ \text{where } X_{k}^{t} & := [U_{k}^{t} \;\; {S_{k - N_{b} + 1}^{k - 1}}^{t}] . \end{align}

This computation may be expensive because of the matrix inversion, and LMS (3) is actually used as an alternative [4, 5]. Our claim is that in the case of a DFE, apart from being computationally efficient, the LMS algorithm also outperforms the IDFE, Θ_IDFE. This is because, as we will see shortly, the LMS attractors are close to the DFE-WF while the IDFE is away from the DFE-WF. We achieve this goal by showing that the LMS-DFE attractors are close to the DFE-WF at high SNRs (later in Section “Numerical examples” we show that this covers the practically used SNRs). Further, LMS can also be used to track the channel variations. We first study an LMS-DFE on a fixed channel and later briefly discuss its tracking behavior.
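For i.i.d. BPSK inputs and perfect decisions, the moments in the IDFE expression can be written in closed form, which gives a quick way to evaluate Θ_IDFE numerically. The sketch below assumes that the feedback part of X_k stacks the N_b past symbols s_{k−1}, …, s_{k−N_b}, matching the feedback sum in (1), and that N_b ≤ N_L − 1; the function name and the example taps are hypothetical.

```python
import numpy as np

def ideal_dfe(z, n_f, n_b, sigma2):
    """Closed-form IDFE (E[X X^t])^{-1} E[X s_k] for i.i.d. BPSK, perfect decisions."""
    z = np.asarray(z, dtype=float)
    L = len(z)
    n_l = n_f + L - 1
    if n_b > n_l - 1:
        raise ValueError("this sketch assumes N_b <= N_L - 1")
    # U_k = H S_k + N_k with S_k = [s_k, ..., s_{k-N_L+1}]^T
    H = np.zeros((n_f, n_l))
    for r in range(n_f):
        H[r, r:r + L] = z
    # P selects the past symbols s_{k-1}, ..., s_{k-N_b} from S_k
    P = np.zeros((n_b, n_l))
    P[np.arange(n_b), np.arange(1, n_b + 1)] = 1.0
    # E[S_k S_k^T] = I for i.i.d. BPSK, noise is white with variance sigma2
    R = np.block([[H @ H.T + sigma2 * np.eye(n_f), H @ P.T],
                  [P @ H.T, np.eye(n_b)]])
    p = np.concatenate([H[:, 0], np.zeros(n_b)])  # E[X_k s_k]
    return np.linalg.solve(R, p)                  # solve instead of explicit inverse

# Hypothetical two-tap channel, one forward and one feedback tap
theta_idfe = ideal_dfe(z=[1.0, 0.4], n_f=1, n_b=1, sigma2=0.01)
```

For this small example the result is [1/1.01, −0.4/1.01]: the feedback tap cancels the ISI from s_{k−1} and the forward tap is scaled slightly below one by the noise.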

Another issue related to (4) is that we should take the expectation in the right hand side under stationarity. However, it appears that the existence of stationary distribution of {X_k} for a given Θ is not known. Thus, first, in Theorem 2, we show the existence of a unique stationary distribution (and stationary density w.r.t. $f_{N} (y) dy$ ) for {X_k} for any Θ.

As is usually done in adaptive algorithm analysis, we study the LMS-DFE using an ODE approximating it. Using the stationary distribution of {X_k} we convert the ODE in [6] to the following more tractable ODE,

\dot{Θ} (t) = - \frac{1}{2} E_{Θ} [\nabla_{Θ} [Θ^{t} X - s]^{2}] = - E_{Θ} [X (Θ^{t} X - s)] .

(5)

The attractors of the LMS-DFE will be the zeros of the RHS of the above ODE, while the DFE-WF will be a zero of the gradient (if it exists) of the cost in the RHS of (4). Under certain conditions (with ∇representing the gradient),

\begin{align} \nabla_{Θ} E_{Θ} [[Θ^{t} X - s]^{2}] = {} & E_{Θ} [\nabla_{Θ} [Θ^{t} X - s]^{2}] \\ & + E [[Θ^{t} X - s]^{2} \nabla_{Θ} Π_{Θ}], \end{align}

(6)

where Π_Θ is the stationary density of the Markov chain, {J_k}, w.r.t. the Lebesgue measure, when the DFE Θ is used. One can expect the LMS-DFE attractors to be close to the DFE-WF, if the second term in the RHS of (6) is close to zero. However, we could not even get differentiability of Π_Θ. Nevertheless, we achieve the required differentiability (Theorem 3) by considering a hard decoder that is a slightly perturbed version of the original hard decoder. We also show that the DFE-WF and an LMS-DFE attractor of this perturbed decoder converge to that of the original hard decoder as the level of perturbation tends to zero (Theorem 4). We then analyze this perturbed decoder and show that the LMS-DFE attractors of this decoder are close to its DFE-WFs at high SNR (Theorem 5). This suggests that at high SNR an LMS attractor for the original decoder is close to its DFE-WF.

Analysis of LMS-DFE and DFE-WF

We provide a step by step analysis of LMS-DFE and its connection to DFE-WF in this section, while addressing the issues raised in Section “The issues and our approach” one after the other.

Previous ODE approximation result

We start with an ODE approximation for LMS-DFE, which will be used in the subsequent sections for performance analysis. DFE with a hard decoder has been approximated by an ODE in [6]. We start our LMS-DFE analysis with this ODE. We reproduce the ODE approximation result of [6] here in our notations. Towards this goal, as a first step, we write down the ODE approximating LMS-DFE (3): let Θ(t,a) denote the solution of the ODE,

\dot{Θ} (t) = - h (Θ) \quad \text{with} \quad h (Θ) := \lim_{n \to \infty} P_{Θ}^{n} H_{Θ} (j, y^{'}),

(7)

with initial condition Θ(0)=a, where $P_{Θ}^{n}$ is the n-step transition function of the Markov chain J_k with DFE Θ, and $P_{Θ}^{n} H_{Θ} (j, y^{'})$ is the expectation of the function H_Θ(G) (defined in (3)) using the conditional measure $P_{Θ}^{n} (. | j, y^{'})$ (note that G_k is a fixed function of J_k). The limit in (7) will be independent of the initial condition (j,y^′) ([6], p. 252).

It is easy to see that the LMS algorithm satisfies all the required hypotheses of ([6], Theorem 13, p. 278), and hence one can approximate its trajectory on any finite time scale with the solution of the ODE (7). The precise result is:

Theorem 1

For any initial condition Θ₀, finite time T, with $t (r) : = \sum_{k = 0}^{r} μ_{k}$ ,

\begin{align} \sup_{\{r : t (r) \leq T\}} |Θ_{r} - Θ (t (r), Θ_{0})| \overset{p}{\to} 0 \\ \text{as } \sum_{k} μ_{k}^{1 + δ} \to 0 \text{ for some } δ < 0.5, \end{align}

whenever μ_k ≤ 1 for all k and $\liminf_{k} \frac{μ_{k + r}}{μ_{k}} > 0$ for every integer r. ▀

Stationary distribution and a simplified ODE

We will show below that the RHS of the ODE (7) is the same as that of the ODE (5), and hence replace the ODE (7) with the more tractable ODE (5). As a first step, we prove that the Markov chain {J_k} has a stationary distribution for any given DFE, channel pair (Z,Θ). In the following, at many places we omit the channel value Z from the notation, as this article mainly works with fixed channel behavior. However, the proofs are applicable for any pair (Z,Θ), and the notation includes Z when required to be specific.

Theorem 2

The following results hold:

(i) For every fixed (Z,Θ), the Markov chain {J_k} has a unique stationary distribution Π_{Z,Θ}.
(ii) Starting from any initial condition (i,y), the n-step transition measure ($P_{Θ}^{n} (. | i, y)$) of the Markov chain converges geometrically to the stationary distribution, Π_Θ, in total variation norm.
(iii) The continuous part of the stationary distribution has a density, Π_Θ, that is continuous with respect to (Z,Θ) in L₁ norm.
(iv) The MSE under stationarity is continuous in (Z,Θ).

Proof: Please see Appendix 1. ▀

For each Θ, {J_k} is a Markov chain taking values in $S^{N_{L}} \times S^{N_{b}} \times R^{N_{f}}$ . Its transition function

\begin{align} P_{Θ}^{1} (i, y \in B | j, y^{'}) = & \tilde{δ} (i, j) \bar{δ} (y, y^{'}) P (i_{1}) P (y_{1} \in B_{y^{'}}) \\ \times P_{Θ} (i_{N_{L} + 1} | j, y^{'}), \end{align}

(8)

where $\bar{δ} (y, y^{'})$ equals 1 when the vector formed from all but the last component of the vector y^′equals the vector formed from all but the first component of the vector y and otherwise zero and $\tilde{δ} (i, j) = \bar{δ} (i_{1}^{N_{L}}, j_{1}^{N_{L}}) \bar{δ} (i_{N_{L} + 1}^{N_{b} + N_{L}}, j_{N_{L} + 1}^{N_{b} + N_{L}})$ (note that the first component, $i_{1}^{N_{L}}$ , represents the sample value of S_k, while the second one, $i_{N_{L} + 1}^{N_{b} + N_{L}}$ , represents the sample value of $Ŝ_{k - 1}$ ). The only component of the transition function (8) that depends upon Θ is $P_{Θ} (i_{N_{L} + 1} = 1 | j, y^{'}) = 1_{\{e_{Θ} (j, y^{'}) > 0\}}$ .

By the i.i.d. nature of the input s_k and noise n_k, one can choose n₀ large enough such that the continuous part of the n-step transition function $P_{Θ}^{n} (i, y \in B | j, y^{'})$ is absolutely continuous with respect to $f_{N} (y) dy$ for all n≥n₀. Further, n₀ is chosen larger than N_L to ensure that s_k and $S_{k - n_{0}}$ are independent. Fix one such n. The corresponding density (Radon–Nikodym derivative) is

\begin{align} p_{Θ}^{n} (i, y | j, y^{'}) = {} & \sum_{l} \int_{v} P (S_{k + 1}^{k + n}) \prod_{q = 1}^{n} P_{Θ} (ŝ_{k + n - q} | x (q)) \\ & \times f_{N} (v) dv, \end{align}

(9)

where

\begin{align} l & = (S_{k + 1}^{k + n - N_{L}}, Ŝ_{k}^{k + n - 1 - N_{b}}), v : = N_{k + 1}^{k + n - N_{f}} / σ and \\ x (q) & : = (S_{k + n - q - N_{L} + 1}^{k + n - q}, Ŝ_{k + n - q - N_{b}}^{k + n - q - 1}, N_{k + n - q - N_{f} + 1}^{k + n - q}) . \end{align}

From (9) it is easy to see that the density of the n-step transition function satisfies $p_{Θ}^{n} (i, y | j, y^{'}) \leq 1$ for all values of i, y, j, y^′ and n≥n₀. Also, by Theorem 2.ii, $|p_{Θ}^{n} (. | j, y^{'}) - Π_{Θ}| \to 0$ in L₁ norm for every value of j, y^′ as n→∞. Further, the function H_Θ(·) (given in (3)) can be bounded uniformly by |H_Θ(G_k)|≤C₁|X_k|² + C₂|X_k| for all Θ in a small neighborhood, for some appropriate constants C₁, C₂. The above bound is square integrable and depends only on {J_k}. Hence Lemma 1 of Appendix 5 is applicable and we have,

\begin{align} \lim_{n \to \infty} (P_{Θ}^{n} H_{Θ}) (j, y^{'}) & = Π_{Θ} H_{Θ} (G), \text{ which implies} \\ h (Θ) & = \frac{1}{2} E_{Θ} [\nabla_{Θ} [Θ^{t} X - s]^{2}] . \end{align}

Thus ODE (7) simplifies to ODE (5).

By Theorem 2, MSE is a continuous function of Θ and so by confining our search in (4) to a compact region, we obtain the existence of the WF, DFE-WF. Next we consider the LMS attractors which are now the attractors of ODE (5). The ODE attractors will be zeros of the RHS of (5), while the DFE-WF will be a zero of the gradient (if it exists) of the MSE (the cost in the RHS of (4)). As discussed in Section “The issues and our approach”, these two can be related as in (6) and for comparison of the two zeros, one needs to study, ∇_ΘΠ_Θ, the gradient of the stationary density. That is, to get the connection between an LMS-DFE attractor and the DFE-WF one needs to consider the differentiability of the stationary density.

Differentiability of the stationary density

One can see from Equation (9) that it is difficult to comment on differentiability of the n-step transition density itself. Thus, it is even more difficult to discuss the differentiability of the stationary density. To proceed further with the analysis, we perturb the hard decoder Q such that the n-step transition density and the stationary density become differentiable. Next we show that the LMS attractors and the DFE-WF of this perturbed decoder converge to that of the original decoder as the level of perturbation tends to zero. Finally we study the DFE using these perturbed decoders in Section “LMS attractors versus WF at high SNRs”.

We alter the decoder function Q(x) (of Equation (1)) to,

\begin{align} Q_{ε_{0}} (x) = \left\{ \begin{array}{lll} 1 & \text{with prob. } 1, & \text{if } x > ε_{0}, \\ - 1 & \text{with prob. } 1, & \text{if } x < - ε_{0}, \\ 1 & \text{with prob. } \frac{1}{2} \left[ \cos \left( \frac{(x - ε_{0}) π}{2 ε_{0}} \right) + 1 \right], & \text{if } |x| \leq ε_{0}, \end{array} \right. \end{align}

(10)

where ε₀ is a small constant. Also, in (10) when |x|≤ε₀, $Q_{ε_{0}} (x)$ will be taken as −1 when it is not 1. Observe that the perturbed decoder is also a hard decoder. With the perturbed decoder $Q_{ε_{0}} (x)$ , the Θ dependent component of the transition function is,

\begin{align} P_{Θ}^{(ε_{0})} (i_{N_{L} + 1} = 1 | j, y^{'}) = {} & \frac{1_{\{|e_{Θ} (j, y^{'})| \leq ε_{0}\}}}{2} \left[ \cos \left( \frac{(e_{Θ} (j, y^{'}) - ε_{0}) π}{2 ε_{0}} \right) + 1 \right] \\ & + 1_{\{e_{Θ} (j, y^{'}) > ε_{0}\}} . \end{align}

The partial derivative, $\frac{\partial P_{Θ}^{(ε_{0})} (i_{N_{L} + 1} = 1 | j, y^{'})}{∂Θ}$ exists everywhere and equals,

\frac{- 1_{\{|e_{Θ} (j, y^{'})| \leq ε_{0}\}} \, π}{4 ε_{0}} \sin \left( \frac{(e_{Θ} (j, y^{'}) - ε_{0}) π}{2 ε_{0}} \right) \frac{\partial e_{Θ} (j, y^{'})}{∂Θ} .

(11)
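The perturbed decoder (10) and the derivative factor in (11) are simple to evaluate numerically. The following sketch, with hypothetical function names, implements the probability that the perturbed decoder outputs +1 as a function of the decoder input x, its derivative with respect to x (the chain-rule factor ∂e_Θ/∂Θ in (11) is left out), and a randomized decoder that samples from this probability.

```python
import math
import random

def prob_one(x, eps0):
    """P(Q_eps0(x) = +1) for the perturbed decoder (10)."""
    if x > eps0:
        return 1.0
    if x < -eps0:
        return 0.0
    return 0.5 * (math.cos((x - eps0) * math.pi / (2.0 * eps0)) + 1.0)

def dprob_one_dx(x, eps0):
    """Derivative of prob_one with respect to the decoder input x."""
    if abs(x) > eps0:
        return 0.0
    return -(math.pi / (4.0 * eps0)) * math.sin((x - eps0) * math.pi / (2.0 * eps0))

def perturbed_decoder(x, eps0, rng=random):
    """Sample the perturbed hard decision: +1 with probability prob_one, else -1."""
    return 1 if rng.random() < prob_one(x, eps0) else -1

# Compare the analytic derivative with a central finite difference
eps0, x0, h = 0.1, 0.03, 1e-6
numeric = (prob_one(x0 + h, eps0) - prob_one(x0 - h, eps0)) / (2.0 * h)
analytic = dprob_one_dx(x0, eps0)
```

The derivative vanishes at x = ±ε₀, so the probability joins the constant values 0 and 1 smoothly, which is what makes the transition function differentiable in Θ.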

By the uniform upper bound on the derivative (11) and by the bounded convergence theorem one can see that the n-step transition density (9) (with n≥n₀) becomes differentiable (details are in Appendix 2, Lemma 2) and equals (using the notations of Equation (9)),

\begin{align} \frac{\partial p_{Θ}^{(ε_{0}), n}}{∂Θ} (i, y | j, y^{'}) = {} & \sum_{l} \int_{v} \sum_{m = 1}^{n} \prod_{q = 1, \, q \neq m}^{n} P_{Θ}^{(ε_{0})} (ŝ_{k + n - q} | x (q)) \\ & \times \frac{\partial P_{Θ}^{(ε_{0})} (ŝ_{k + n - m} | x (m))}{∂Θ} \\ & \times P (S_{k + 1}^{k + n}) f_{N} (v) dv. \end{align}

(12)

For these perturbed decoders, we show that the stationary density (with respect to $f_{N} (y) dy$ ) also becomes differentiable. Furthermore, using an Implicit function theorem, we get a bound on the norm of this gradient.

Theorem 3

For every ε₀>0, for every Θ₀, the Markov chain {J_k} has a unique stationary distribution, $Π_{Θ}^{(ε_{0})}$. Its continuous part has a density, $Π_{Θ}^{(ε_{0})}$, that is continuously differentiable with respect to Θ in L₂ norm. Further, for every δ>0 and $σ_{0}^{2} > 0$ there exists a constant C<∞ such that for all Θ∈B(Θ₀,δ), $σ^{2} \leq σ_{0}^{2}$,

\begin{align} |\nabla_{Θ} Π_{Θ}^{(ε_{0})}|^{2} \leq C \left( \sum_{i} P (|S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1} + θ_{f}^{t} N_{k}| \leq ε_{0}) + σ^{2} \right) . \end{align}

(13)

Proof: The proof is provided in Appendix 2. ▀

We conclude this section by showing that the DFE-WFs and the LMS-DFE attractors of the perturbed decoder converge to that of the original decoder. In the following, let $Θ_{n}^{*}$ and $Θ_{n}^{LMS}$ denote the DFE-WF and an LMS-DFE attractor (whose existence at high SNRs with small ε₀ is established at the end of Appendix 4 and hence is assumed in the proof of the following theorem) for perturbation ${ε_{0}}_{n}$ .

Theorem 4

For any σ², for any sequence ${ε_{0}}_{n} \to 0$, there exists a subsequence ${ε_{0}}_{n_{k}} \to 0$, a DFE-WF Θ^∗ of the original decoder and an LMS-DFE attractor Θ^LMS of the original decoder, such that,

Θ_{n_{k}}^{*} \to Θ^{*} \text{ and } Θ_{n_{k}}^{LMS} \to Θ^{LMS} .

Proof: Please see Appendix 3. ▀

Thus we can always take the perturbation ε₀ in (10) small enough such that the LMS attractors and the DFE-WFs for the perturbed decoder are close enough to the corresponding equalizers for the original decoder. Henceforth, we analyze these perturbed decoders to draw important conclusions.

LMS attractors versus WF at high SNRs

In this section we would like to understand the connection between an LMS attractor and a DFE-WF for a perturbed decoder. Since the former is a zero of the RHS of Equation (5) and the latter is a zero of the gradient of the MSE (the cost in the RHS of (4)), we study the connection between the two.

Fix an ε₀>0. With the error defined by err_Θ(J_k) := (s_k − e_Θ(J_k)) (note that i, defined in the notations in Section “The model and notation”, represents $(S_{k}, Ŝ_{k - 1})$, the discrete part of the Markov chain {J_k}),

\begin{align} \nabla_{Θ} E_{J_{k} (Θ)} [{err}_{Θ} (J_{k})^{2}] & \overset{a}{=} \sum_{i} \nabla_{Θ} E_{f_{N}} [{err}_{Θ} (J_{k})^{2} Π_{Θ}^{(ε_{0})} (J_{k})] \\ & \overset{b}{=} \sum_{i} E_{f_{N}} \nabla_{Θ} [{err}_{Θ} (J_{k})^{2} Π_{Θ}^{(ε_{0})} (J_{k})] \\ & = \sum_{i} E_{f_{N}} [\nabla_{Θ} ({err}_{Θ} (J_{k})^{2}) Π_{Θ}^{(ε_{0})} (J_{k})] \\ & \quad + \sum_{i} E_{f_{N}} [{err}_{Θ} (J_{k})^{2} \nabla_{Θ} Π_{Θ}^{(ε_{0})} (J_{k})] \\ & = E_{J_{k} (Θ)} [\nabla_{Θ} ({err}_{Θ} (J_{k})^{2})] \\ & \quad + \sum_{i} E_{f_{N}} [{err}_{Θ} (J_{k})^{2} \nabla_{Θ} Π_{Θ}^{(ε_{0})} (J_{k})] . \end{align}

(14)

Here equality a follows by the existence of the stationary density $Π_{Θ}^{(ε_{0})}$ with respect to the Gaussian measure $f_{N} (y) dy$. Equality b is given by Lemma 2 of Appendix 5. The above equality (14) is true for any ε₀>0 and for any σ². We will show below that the DFE-WF will be close to the limiting LMS-DFE if the second term on the right hand side of (14) is small.

We have assumed that $S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1} \neq 0$ at an LMS attractor. By continuity, $S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1} \neq 0$ for all Θ in a small neighborhood of the LMS attractor. We can further choose an ε₁ small enough such that

0 \notin [S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1} - ε_{1}, S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1} + ε_{1}],

for all $(S_{k}, Ŝ_{k - 1})$ and for all Θ in a small neighborhood of an LMS attractor. Choose ε₀≤ε₁. By Chebyshev’s inequality, if 0∉[c−ε₀, c + ε₀] (for some c) and if n is a Gaussian random variable with mean zero and variance σ², then

\begin{align} P (|c + n| \leq ε_{0}) \leq P (|n| \geq \min \{|c - ε_{0}|, |c + ε_{0}|\}) \to 0 \\ \text{as } σ^{2} \to 0 . \end{align}

Thus, from the upper bound (13) of Theorem 3, for any fixed ε₀≤ε₁,

|\nabla_{Θ} Π_{Θ}^{(ε_{0})}| \to 0 as σ^{2} \to 0 .
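The vanishing probability used above can be checked numerically for a scalar Gaussian. The sketch below, with hypothetical values c = 0.2 and ε₀ = 0.05 (so that 0 ∉ [c−ε₀, c+ε₀]), evaluates P(|c + n| ≤ ε₀) exactly through the Gaussian distribution function and compares it with the tail probability on the right hand side of the inequality.

```python
import math

def std_normal_cdf(t):
    """Standard Gaussian distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def prob_in_band(c, eps0, sigma):
    """P(|c + n| <= eps0) for n ~ N(0, sigma^2)."""
    return std_normal_cdf((eps0 - c) / sigma) - std_normal_cdf((-eps0 - c) / sigma)

def tail_bound(c, eps0, sigma):
    """P(|n| >= min(|c - eps0|, |c + eps0|)) for n ~ N(0, sigma^2)."""
    d = min(abs(c - eps0), abs(c + eps0))
    return 2.0 * (1.0 - std_normal_cdf(d / sigma))

# Hypothetical values, for illustration only
c, eps0 = 0.2, 0.05
sigmas = [1.0, 0.3, 0.1, 0.03]
probs = [prob_in_band(c, eps0, sg) for sg in sigmas]
bounds = [tail_bound(c, eps0, sg) for sg in sigmas]
```

The probability need not decrease monotonically in σ, but both it and the tail bound tend to zero as σ² → 0, which is all that the argument requires.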

Thus by the Cauchy-Schwarz inequality, as σ²→0 (note that err_Θ has all moments),

\begin{align} |\nabla_{Θ} E_{Θ} [{err}_{Θ} {(J_{k})}^{2}] - E_{Θ} [\nabla_{Θ} ({err}_{Θ} {(J_{k})}^{2})]| \\ = |\sum_{i} E_{f_{N}} [{err}_{Θ} {(J_{k})}^{2} \nabla_{Θ} Π_{Θ}^{(ε_{0})} (J_{k})]| \\ \leq \sum_{i} {(E_{f_{N}} [{err}_{Θ} {(J_{k})}^{4}])}^{1 / 2} | \nabla_{Θ} Π_{Θ}^{(ε_{0})} | \\ \to 0 . \end{align}

(15)

Next we show that (15) implies that the LMS-DFE attractors will be close to the DFE-WFs. In general two functions f₁, f₂ can be close to each other at every point while their zeros are far apart, i.e., if x₁ is a zero of f₁ then f₂(x₁) can be close to zero but the zero of f₂ closest to x₁ may still be far away. We need to rule out this possibility in our scenario, and we do so using the following theorem. Define,

\begin{align} s (Θ, σ^{2}) : = E_{J_{k} (Θ)} [\nabla_{Θ} ({err}_{Θ} {(J_{k})}^{2})] and \\ w (Θ, σ^{2}, η) : = s (Θ, σ^{2}) + η. \end{align}

Theorem 5

There exists an ε₂ with 0<ε₂≤ε₁ such that for any ε₀≤ε₂, there exists a continuous function $q : B (0, δ) \subset R \times R \mapsto R^{N_{f}}$, with

w (q (σ^{2}, η), σ^{2}, η) = 0 .

Proof: Please see Appendix 4. ▀

Using the above theorem, we obtain the proximity of LMS attractors and the WFs in the following.

For any fixed ε₀≤ε₂, $| \nabla_{Θ} Π_{Θ}^{(ε_{0})} |$ near an LMS attractor, tends to zero as σ²→0. Thus by (15), there exists a small enough $σ_{0}^{2}$ such that for all $σ^{2} \leq σ_{0}^{2}$ ,

\begin{align} |(σ^{2}, η_{w})| \leq δ, where η_{w} : = & \nabla_{Θ} E_{Θ} [{err}_{Θ} {(J_{k})}^{2}] \\ - E_{Θ} [\nabla_{Θ} {err}_{Θ} {(J_{k})}^{2}] . \end{align}

Note that q(σ²,0) is a zero of s(Θ,σ²) (since w(q(σ²,0),σ²,0) = s(q(σ²,0),σ²)) and hence is an LMS attractor at σ². Similarly, from (14), q(σ²,η_w) is a zero of the gradient of the MSE cost and hence is a DFE-WF. Thus, for all $σ^{2} \leq σ_{0}^{2}$, the LMS attractors, q(σ²,0), will by the continuity of q in Theorem 5 be close to the WFs, q(σ²,η_w).

It is clear from the above theorem that at high SNRs, for very small ε₀ (close to the practical decoder), the LMS attractor is close to the DFE-WF. Since the IDFE, Θ_IDFE, is designed under an improper assumption (perfect past decisions), it may well perform worse than the LMS attractors. We will see this in the examples provided in Section “Numerical examples”.

We conclude this section by pointing out another useful consequence of Theorem 5. This theorem also establishes the existence of the LMS attractors at high SNRs for perturbed decoders with a small perturbation level ε₀. A Remark at the end of Appendix 4 establishes this point.

One of the uses of the above ODE approximation is that, one can approximately obtain the performance (e.g., Bit error rate, MSE) of LMS-DFE at any time by using the trajectory of this ODE. Of course, obtaining bit error rate (BER) theoretically is still a problem because the BER of a system with a fixed known channel and a fixed DFE is still not available. But our ODE approximation is still useful because one can obtain the performance (transient as well as stationary) of the LMS-DFE with only one simulation, which would not be possible otherwise. This is because by Theorem 1, the ODE solution approximates the LMS-DFE trajectory in probability.

Numerical examples

In this section we reinforce the theory developed so far using some examples. We take a few examples of channels obtained from previous studies and show the proximity of the DFE-WF and the LMS attractor for practical values of SNRs. We also show that in many cases, the IDFE performs much worse than the DFE-WF but an LMS attractor performs close to the DFE-WF. BER and the MSE are used to compare the various equalizers. For every sample of the channel, we have used Monte-Carlo simulations to estimate the corresponding BER and MSE using one million samples of data.

DFE-WF, Θ^∗, for every sample of the channel is obtained by running a gradient descent type of algorithm on the cost function (4) itself, where the gradient was approximated at each point by the finite difference approximation (E_{Θ + Δ}[X(Θ + Δ)^t(Θ + Δ)−s]² − E_Θ[X(Θ)^tΘ−s]²)/|Δ|. Here the expectation E_Θ[X(Θ)^tΘ−s]² is estimated by the sample path averages,

\frac{1}{N} \sum_{i = 1}^{N} {(X_{i}^{t} (Θ) Θ - s_{i})}^{2}

using a large number of samples, N. Vector sequence ${\{X_{i} (Θ)\}}_{i = 1}^{N}$ is obtained by running the DFE with fixed coefficients Θ (and on a channel that is fixed at its current sample). Thus Θ^∗ is estimated as the limit of the steepest descent algorithm:

\begin{align} Θ_{k + 1} = Θ_{k} & - \frac{μ_{k}}{N |Δ_{k}|} [\sum_{i = 1}^{N} [{(X_{k, i}^{t} (Θ_{k}) Θ_{k} - s_{k, i})}^{2} \\ - {(X_{k, i}^{t} (Θ_{k} + Δ_{k}) (Θ_{k} + Δ_{k}) - s_{k, i})}^{2}]] . \end{align}

Here s_{k,i} are i.i.d. with the distribution of the inputs, s_k. The sequences {Δ_k} and {μ_k} are chosen to decrease to zero at appropriate rates. In our simulations we used $μ_{k} = \frac{0.07}{k^{0.6}}$ , Δ_k=5μ_k and N=4×10⁵.
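The finite-difference descent described above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the text: BPSK inputs, a hypothetical two-tap channel `[1.0, 0.5]`, noise standard deviation `0.3`, a delay-zero DFE, one coordinate perturbed at a time, the same random seed reused for every cost evaluation, and a far smaller sample size than the N=4×10⁵ used in the simulations. The names `dfe_mse` and `fd_descent` are hypothetical. Only the schedules μ_k=0.07/k^0.6 and Δ_k=5μ_k are taken from the text, and the update is written so that each step moves against the finite-difference slope.

```python
import numpy as np

def dfe_mse(theta, nf, channel, sigma, n, seed):
    """Sample-path MSE of a hard-decision DFE run with fixed taps theta."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=n)              # BPSK inputs (assumed)
    u = np.convolve(s, channel)[:n] + sigma * rng.standard_normal(n)
    theta_f, theta_b = theta[:nf], theta[nf:]
    u_pad = np.concatenate([np.zeros(nf - 1), u])
    dec = np.ones(len(theta_b))                      # arbitrary initial decisions
    total = 0.0
    for k in range(n):
        window = u_pad[k:k + nf][::-1]               # u_k, u_{k-1}, ...
        out = theta_f @ window + theta_b @ dec       # equalizer output
        total += (out - s[k]) ** 2
        # Feed the hard decision back, most recent first.
        dec = np.concatenate(([1.0 if out >= 0 else -1.0], dec[:-1]))
    return total / n

def fd_descent(theta0, cost, iters):
    """Descent on `cost` using forward differences, one coordinate at a time."""
    theta = np.array(theta0, dtype=float)
    for k in range(1, iters + 1):
        mu = 0.07 / k ** 0.6                         # step size from the text
        delta = 5.0 * mu                             # perturbation size from the text
        base = cost(theta)
        grad = np.empty_like(theta)
        for j in range(theta.size):
            pert = theta.copy()
            pert[j] += delta
            grad[j] = (cost(pert) - base) / delta
        theta -= mu * grad
    return theta

if __name__ == "__main__":
    cost = lambda th: dfe_mse(th, nf=2, channel=[1.0, 0.5], sigma=0.3, n=2000, seed=0)
    theta = fd_descent(np.zeros(3), cost, iters=30)
    print("taps:", theta)
    print("MSE before:", cost(np.zeros(3)), "after:", cost(theta))
```

Reusing one seed makes the estimated cost a deterministic function of the taps, which keeps the finite differences from being swamped by sampling noise at this small sample size; the text instead draws fresh i.i.d. inputs at each iteration with a much larger N.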

Least mean square attractors are obtained as the time limit of the LMS algorithm (3), with similar settings as with DFE-WF estimation.
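A time limit of this kind can be approximated by simply running the recursion for a long time. The sketch below assumes that (3) has the standard LMS form Θ_{k+1} = Θ_k − μ_k X_k (Θ_k^t X_k − s_k), with the regressor X_k built from received samples and past hard decisions; that form, the BPSK inputs, the hypothetical channel `[1.0, 0.5]`, the noise level and the function name `lms_dfe` are all assumptions made for illustration. The step size reuses the schedule μ_k=0.07/k^0.6 quoted for the DFE-WF estimation.

```python
import numpy as np

def lms_dfe(channel, sigma, nf, nb, n, seed=0):
    """Run an LMS-adapted DFE in training mode; return final taps and squared errors."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=n)              # BPSK training inputs (assumed)
    u = np.convolve(s, channel)[:n] + sigma * rng.standard_normal(n)
    u_pad = np.concatenate([np.zeros(nf - 1), u])
    theta = np.zeros(nf + nb)                        # [forward taps, feedback taps]
    dec = np.ones(nb)                                # arbitrary initial decisions
    sq_err = np.empty(n)
    for k in range(n):
        # Regressor: recent received samples followed by past hard decisions.
        x = np.concatenate((u_pad[k:k + nf][::-1], dec))
        out = theta @ x
        err = out - s[k]
        sq_err[k] = err ** 2
        mu = 0.07 / (k + 1) ** 0.6                   # decreasing step size
        theta = theta - mu * err * x                 # assumed LMS update
        dec = np.concatenate(([1.0 if out >= 0 else -1.0], dec[:-1]))
    return theta, sq_err

if __name__ == "__main__":
    theta, sq_err = lms_dfe([1.0, 0.5], sigma=0.3, nf=2, nb=1, n=5000)
    print("final taps:", theta)
    print("mean squared error, first 100 samples:", sq_err[:100].mean())
    print("mean squared error, last 1000 samples:", sq_err[-1000:].mean())
```

The final taps are only an approximation of an attractor: with a decreasing step size the iterate keeps moving, more and more slowly, so a longer run gives a closer estimate.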

We consider two examples in Tables 1 and 2. In Table 1, we have used an interesting example (the significant part of the raised cosine channel of ([1], p. 199)) to show that the LMS attractors are close to the WFs at practical SNRs. Its coefficients are provided in the table. We also provide the BER in this table. We further show the Euclidean distance between each equalizer and the corresponding DFE-WF in the first sub-columns. One can see that the distance between the LMS-DFE and DFE-WF is small while that between the IDFE and DFE-WF is large. One can also see an improvement of up to 18% in BER for the LMS-DFE in comparison to the IDFE. In fact this improvement is larger at high SNRs (where the IDFE is assumed to suffer less from error propagation). Further, the BER of the DFE-WF is close to that of the LMS attractor.

Table 1 Comparison of DFEs for raised cosine channel with N_f = 5, N_b = 10 and channel fixed at [0.45 0.59 0.43 0.11 −0.22 −0.32 −0.27 0 0.11 0.11]


Table 2 Comparison of DFEs with N_f = N_b = 2, and channel fixed at [0.41 0.82 0.41]


We have developed the theory for an equalizer with delay zero. One can easily extend these results to an equalizer with an arbitrary delay. In fact, the channel in Table 2 is one such example. Here the equalizer with delay 1 will be the best one. The channel of Table 2 is very widely used (see [1], p. 165 and [4], p. 414). We can see once again a huge improvement (up to 30%) in BER for the LMS-DFE with respect to Θ_IDFE. We also see that the LMS attractors are close to the DFE-WF, Θ^∗, for all practical SNRs.

In this section, we directly compare the time limits of the LMS algorithm (3) with those of the true DFE-WF iteration described at the beginning of this section. These two limits are further compared with the closed form IDFE expression. That the LMS trajectory approximates the solution of the ODE (5) is established theoretically (Theorem 1) in this article. In [9, 18], we have demonstrated the same via numerical simulations for time varying channels. In Figures 2, 3 and 4 of [9], it is shown numerically that the LMS trajectory approximates the appropriate ODE solution when the underlying channel is a time varying AR(2) process.

Tracking analysis

LMS, being an iterative algorithm, can track the channel variations if the update coefficients μ_k in (3) converge to a nonzero value. In [9, 18], we study the tracking behavior of an LMS-DFE while it is operating on a wireless channel characterized by an AR(2) process. We demonstrate that an LMS-DFE can also track the time varying DFE-WF, whose variations result from the variations in a wireless channel. We also show that the LMS-DFE can outperform the IDFE on a time varying channel.

Conclusions

Obtaining the MSE optimal filter for a DFE is a long-standing problem. Precoding provides one practical solution but may not be feasible with wireless channels. The difficulty in the design and/or analysis arises because the analysis of the past decisions (with feedback) is not known so far. To circumvent this, one commonly uses the optimal WF obtained assuming perfect past decisions. LMS, a computationally efficient alternative, is an iterative algorithm designed to converge to the WF. However, once again because of the feedback involved, a complete analysis of an LMS-DFE is not available.

We show via ODE analysis that LMS itself can provide/track the optimal WF. This article concentrates on fixed channel behavior and proves that the attractors of the LMS are close to that of the optimal DFE at high SNRs. Proofs become nontrivial partly because of the non-differentiability of the hard decoder. We circumvent this problem by studying another hard decoder which is a slightly perturbed version of the original one. We first show that the LMS attractors and the DFE-WFs of the perturbed decoder converge to those of the original decoder and then show that the two themselves are close to each other at high SNRs. Next, we show by examples that the SNRs need not be very high, i.e., in fact practically used SNRs (up to 1.5 dB) can be sufficient. We also show that the BER (probability of error) of the commonly used WF, designed assuming perfect past decisions (also using perfect channel estimates), can be up to 33% higher than that of the optimal WF even at high SNRs (where the former is believed to be closer to the latter).

In [18], we show that the LMS-DFE converges and then moves close to the instantaneous DFE-WF after the initial transience, while it is tracking a DFE-WF of a wireless channel modeled by an AR(2) process. We also show in [18] that the performance measures BER and MSE of the LMS-DFE are close to that of the DFE-WF after the transient period, while that of an IDFE are substantially inferior to that of the DFE-WF and the LMS-DFE.

Thus we conclude: (1) in case of a DFE, an LMS algorithm (originally designed for computational efficiency) converges and/or tracks a filter close to the Wiener solution; (2) the closed form expression for DFE WF (obtained after approximating the decision errors to zero) is far away from the Wiener solution and its performance can be significantly inferior.

Appendices

Appendix 1

Proof of Theorem 2: Using the results of [19], we prove the existence and continuity of the stationary distribution of the Markov chain, {J_k}. For any (Z₀,Θ₀) and for any ε>0, ε₀>0,

M_{1} ≜ min_{S_{k}, Ŝ_{k - 1}, Θ \in \bar{B} (Θ_{0}, ε), Z \in \bar{B} (Z_{0}, ε_{0})} (Ψ^{t} S_{k} + θ_{b}^{t} Ŝ_{k - 1}) .

(16)

Continuity of the map considered in (16) and compactness of the closed balls $\bar{B} (Θ_{0}, ε)$ , $\bar{B} (Z_{0}, ε_{0})$ ensure |M₁|<∞.

The map (Θ,N_k)↦θ_f^tN_k is continuous and hence the inverse image of the open set {x>−M₁} under this map is open. Thus it is possible to get an open set C and a δ≤ε such that,

\{(N_{k}, Θ) : θ_{f}^{t} N_{k} > - M_{1}\} \supset C \times \bar{B} (Θ_{0}, δ) .

(17)

Thus whenever $Θ \in \bar{B} (Θ_{0}, ε)$ and $Z \in \bar{B} (Z_{0}, ε_{0})$ , the decoder (1) outputs 1 (irrespective of the inputs/past decisions) when the noise vector is in C. Hence,

\begin{align} P (Ŝ_{k} = [1 . . 1]) & = P (\cap_{l = k - N_{b} - 1}^{k} \{ŝ_{l} = 1\}) \\ & \geq P (\cap_{l = k - N_{b} - 1}^{k} \{N_{l} \in C\}) \\ & \geq P (N_{k - N_{b} - N_{f} + 1}^{k} \in C_{1} \times C_{2}), \end{align}

where the sets $C_{1} \subset R^{N_{b}}$ , $C_{2} \subset R^{N_{f}}$ are selected such that their respective Lebesgue measures are not equal to zero and $\cap_{l = k - N_{b} - 1}^{k} \{N_{l} \in C\} \supset C_{1} \times C_{2}$ .

Define $G : = [1 . . 1] \times [1 . . 1] \times R^{N_{f}}$ . For any n₀>max{N_b + N_f + 1,N_L}, for any initial condition $J_{k - n_{0}}$ , for any measurable set B_N, and for any $Θ \in \bar{B} (Θ_{0}, δ)$ , $Z \in \bar{B} (Z_{0}, ε_{0})$

\begin{align} P (J_{k} \in {[1 . . 1] \times [1 . . 1] \times B_{N}} | J_{k - n_{0}}) \\ = P (S_{k} = [1 . . 1], Ŝ_{k - 1} = [1 . . 1], \\ N_{k} \in B_{N} | J_{k - n_{0}}) \\ \geq P (S_{k} = [1 . . 1], N_{k - N_{b} - N_{f} + 1}^{k} \in C_{1} \\ \times C_{2}, N_{k} \in B_{N}) \\ \geq αP (N_{k} \in B_{N} \cap C_{2}) \end{align}

where $α : = P (S_{k} = [1 . . 1]) P (N_{k - N_{b} + N_{f}}^{k - N_{f}} \in C_{1})$ . Thus for any $Θ \in \bar{B} (Θ_{0}, δ)$ , $Z \in \bar{B} (Z_{0}, ε_{0})$ and for any initial condition $J_{k - n_{0}}$ , the n₀-step conditional measure is majorized:

P_{Z, Θ} (J_{k} \in E | J_{k - n_{0}}) \geq ν_{n_{0}} (E \cap G),

where the measure $ν_{n_{0}} ()$ is defined by, $ν_{n_{0}} ([1 . . 1] \times [1 . . 1] \times B_{E}) : = αP (N_{k} \in B_{E} \cap C_{2})$ . Thus the entire state space $S^{N_{L}} \times S^{N_{b}} \times R^{N_{f}}$ is $ν_{n_{0}}$-small (hence also a petite set) for all the Markov chains {J_k}, parameterized by $Θ \in \bar{B} (Θ_{0}, δ)$ and $Z \in \bar{B} (Z_{0}, ε_{0})$ . Then using ([19], Proposition 9.1.7, p. 206 and Theorem 10.0.1, p. 230) one obtains the existence and uniqueness of the stationary distribution, π_Z,Θ for each Z,Θ.

Define $ρ = 1 - ν_{n_{0}} (G)$ . Then, by ([19], Theorem 16.2.4 in page 392), for all $Θ \in \bar{B} (Θ_{0}, δ)$ , $Z \in \bar{B} (Z_{0}, ε_{0})$ and for all initial conditions (j,y^′) we get:

|P_{Z, Θ}^{n} (. | j, y^{'}) - π_{Z, Θ}| \leq ρ^{\frac{n}{n_{0}}},

where |.| represents the total variation norm. This along with the continuity of the transition function, establishes the continuity of the stationary distribution π_Z,Θ under total variation norm at (Z₀,Θ₀). This is because, for any $Θ \in \bar{B} (Θ_{0}, δ)$ and $Z \in \bar{B} (Z_{0}, ε_{0})$

\begin{align} lim_{(Z, Θ) \to (Z_{0}, Θ_{0})} & |π_{Z, Θ} - π_{Z_{0}, Θ_{0}}| \\ \leq lim_{(Z, Θ) \to (Z_{0}, Θ_{0})} [|π_{Z, Θ} - P_{Z_{0}, Θ_{0}}^{n} (. | j, y^{'})| \\ + |π_{Z_{0}, Θ_{0}} - P_{Z_{0}, Θ_{0}}^{n} (. | j, y^{'})| \\ + |P_{Z_{0}, Θ_{0}}^{n} (. | j, y^{'}) - P_{Z, Θ}^{n} (. | j, y^{'})|] \\ \leq 2 ρ^{\frac{n}{n_{0}}} + lim_{Θ \to Θ_{0}} |P_{Z_{0}, Θ_{0}}^{n} (x, .) - P_{Z, Θ}^{n} (x, .)| \\ = 2 ρ^{\frac{n}{n_{0}}}, \end{align}

for all n≥1, where the last equality follows by continuity of the transition function with respect to (Z,Θ). By letting n→∞

lim_{(Z, Θ) \to (Z_{0}, Θ_{0})} |π_{Z, Θ} - π_{Z_{0}, Θ_{0}}| = 0 .
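The geometric convergence used in this argument can be checked numerically on a finite-state analogue. The sketch below uses a hypothetical 3-state transition matrix `P` with one-step minorization (n₀=1), takes the minorizing measure to be the column-wise minimum of `P`, and measures total variation as half the L₁ distance; these choices, and the function names, are illustrative assumptions and are not the Markov chain {J_k} of the article. Under this convention the classical Doeblin bound gives a distance to stationarity of at most ρⁿ, with ρ equal to one minus the total mass of the minorizing measure.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1), for illustration only.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

def minorization_mass(P):
    """Total mass of nu(j) = min_i P[i, j], so that P[i, :] >= nu for every i."""
    return P.min(axis=0).sum()

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def tv_to_stationary(P, n_max):
    """Worst-case (over initial states) total variation distance after 1..n_max steps."""
    pi = stationary(P)
    Pn = np.eye(P.shape[0])
    dist = []
    for _ in range(n_max):
        Pn = Pn @ P
        dist.append(0.5 * np.abs(Pn - pi).sum(axis=1).max())
    return np.array(dist)

rho = 1.0 - minorization_mass(P)
tv = tv_to_stationary(P, 8)
bound = rho ** np.arange(1, 9)

for n, (d, b) in enumerate(zip(tv, bound), start=1):
    print(f"n={n}: distance={d:.3e}  bound={b:.3e}")
```

Because the bound depends on the chain only through ρ, it holds uniformly over any family of chains sharing the same minorizing measure, which is how the same rate is used above for all (Z,Θ) in the chosen balls.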

The stationary distribution, π_Z,Θ, has discrete and continuous components. The continuous component of π_Z,Θ is absolutely continuous with respect to the measure $f_{N} (y) dy$ for every (Z,Θ). Hence the stationary density, π_Z,Θ, for {J_k} exists. Continuity in total variation norm of the stationary distribution implies the continuity of the stationary densities in L₁ norm ([20], Theorem 8.2, p. 110). It is also easy to see that the stationary density Π_Z,Θ(i,y)≤1 for all (i,y).

Now by fixing the channel at some value Z, the MSE, i.e., the cost in the RHS of Equation (4), can be rewritten as $E_{Θ} {[Θ^{t} X - s]}^{2} = \sum_{S, Ŝ} E_{f_{N}} [{(Θ^{t} X - s)}^{2} Π_{Θ}]$ . Lemma 3 in Appendix 5 now gives the continuity of the MSE with respect to Θ for any fixed Z.

One can show the same conclusions for the Markov chain, {G_k}, as G_k=Γ(J_k) for some fixed one-one, onto C^∞ function Γ, whenever the channel and equalizer values are fixed. ▀

Appendix 2

Proof of Theorem 3: The existence and continuity of the stationary density $Π_{Θ}^{(ε_{0})}$ for every ε₀ is obtained in a similar way as in the proof of Theorem 2. The only difference is that ε₀ must be added to −M₁ in the definition of the set (17). We drop the superscript ε₀ to simplify the notation in the rest of this proof.

We use Implicit function theorem to prove differentiability. For that, we will consider the Banach spaces:

$X = R^{N_{f} + N_{b}}$ with Euclidean norm.
$Y = \{g : S^{N_{L} + N_{b}} \times X \to R; |g| < \infty\}$ with L₂ norm, |.|, defined by,
$|g| : = \frac{1}{|S|} \sum_{i} {(\int_{y} {|g (i, y)|}^{2} f_{N} (y) dy)}^{1 / 2},$

where $|S|$ represents the cardinality of set $S^{N_{L} + N_{b}}$ .

Fix n₀>max{N_f + N_b,N_L}. We consider the following continuous map f:X×Y↦Y,

f (Θ, Π) = g (Θ, Π) - Π + (\sum_{j} \int_{y^{'}} Π (j, y^{'}) f_{N} (y^{'}) {dy}^{'} - 1),

where,

g (Θ, Π) (i, y) : = \sum_{j} \int_{y^{'}} p_{Θ}^{n_{0}} (i, y | j, y^{'}) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} .

Observe that (Θ,Π_Θ) is a zero of f.

By Lemmas 1 and 2 the function f is differentiable with respect to Π and Θ, respectively and further the derivative $\frac{∂f}{∂Π}$ is a homeomorphism. Also, $|{(\frac{∂f}{∂Π})}^{- 1}|$ and $|\frac{∂f}{∂Θ}|$ are upper bounded locally by the RHS of (18) and (24) respectively.

Using similar logic one can easily show that both the partial derivatives of f are continuous in (Θ,Π). Hence by the Implicit function theorem on Banach spaces ([21], Theorem 3.1.10 and Corollary 3.1.11, p. 115), the map Θ↦Π_Θ is continuously differentiable and the derivative is given by,

\nabla_{Θ} Π_{Θ} = - {[{\frac{∂f}{∂Π}|}_{(Θ, Π_{Θ})}]}^{- 1} {\frac{∂f}{∂Θ}|}_{(Θ, Π_{Θ})} .

Upper bound (13) is obtained by bounding the above gradient using the upper bounds (18) and (24). ▀

Lemma 1.

f is differentiable with respect to Π and the derivative is a homeomorphism. Also for any $δ > 0, σ_{0}^{2} > 0$ there exists a constant C₀<∞ such that,

|{[{\frac{∂f}{∂Π}|}_{(Θ, Π_{Θ})}]}^{- 1}| \leq C_{0}

(18)

for all Θ∈B(Θ₀,δ), $σ^{2} \leq σ_{0}^{2}$ .

Proof: The function f is affine linear in the second variable Π∈Y. Thus,

{\frac{∂f}{∂Π}|}_{(Θ, \hat{Π})} (Π) = g (Θ, Π) - Π + (\sum_{j} \int_{y^{'}} Π (j, y^{'}) f_{N} (y^{'}) d y^{'}) .

(19)

We will show below that this map is one-one through contradiction. It is easy to see that g(Θ,Π)−Π is in the set,

H : = \{Π : \sum_{j} \int_{y^{'}} Π (j, y^{'}) f_{N} (y^{'}) d y^{'} = 0\} \subset Y.

The operator Γ that maps $Π \mapsto (\sum_{j} \int_{y^{'}} Π (j, y^{'}) f_{N} (y^{'}) d y^{'})$ , i.e.,

\begin{align} Γ : Y & \to Y \\ Π & \mapsto Γ (Π) \text{ with} \\ Γ (Π) (i, y) : & = \sum_{j} \int_{y^{'}} Π (j, y^{'}) f_{N} (y^{'}) d y^{'} \text{ for all } (i, y), \end{align}

has one-dimensional range which lies inside $H^{c}$ . We can show that the partial derivative (19) is one-one, if we show that there is no common non-zero vector in the null space of both the operators. Say there exists a vector Π≠0 in the null space of both the operators. Let,

\begin{align} D & : = \{(i, y) : Π (i, y) \geq 0\}, \\ α & : = \sum_{j} \int_{\{y^{'} : (j, y^{'}) \in D\}} Π (j, y^{'}) f_{N} (y^{'}) d y^{'}, \\ {|Π|}_{1} & : = \sum_{j} \int_{y^{'}} |Π (j, y^{'})| f_{N} (y^{'}) d y^{'} . \end{align}

As Π is in the null space of the operator, Γ,

\sum_{j} \int_{\{y^{'} : (j, y^{'}) \in D^{c}\}} Π (j, y^{'}) f_{N} (y^{'}) d y^{'} = - α.

Hence |Π|₁=2α. Also, because g(Θ,Π)=Π,

\begin{align} g (Θ, Π) (i, y) & \geq 0 for all i, y \in D and \\ g (Θ, Π) (i, y) & < 0 for all i, y \in D^{c} . \end{align}

Then,

\begin{array}{l} {|g (Θ, Π)|}_{1} \\ = \sum_{i} \int_{y : (i, y) \in D} g (Θ, Π) (i, y) f_{N} (y) dy - \sum_{i} \int_{y : (i, y) \in D^{c}} g (Θ, Π) (i, y) f_{N} (y) dy \\ = \sum_{i} \int_{y : (i, y) \in D} \sum_{j} \int_{y^{'}} p_{Θ}^{n_{0}} (i, y | j, y^{'}) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} f_{N} (y) dy \\ - \sum_{i} \int_{y : (i, y) \in D^{c}} \sum_{j} \int_{y^{'}} p_{Θ}^{n_{0}} (i, y | j, y^{'}) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} f_{N} (y) dy \\ \overset{Fubini}{=} \sum_{j} \int_{y^{'}} (\sum_{i} \int_{y : (i, y) \in D} p_{Θ}^{n_{0}} (i, y | j, y^{'}) f_{N} (y) dy) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} \\ - \sum_{j} \int_{y^{'}} (\sum_{i} \int_{y : (i, y) \in D^{c}} p_{Θ}^{n_{0}} (i, y | j, y^{'}) f_{N} (y) dy) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} \\ = \sum_{j} \int_{y^{'}} P_{Θ}^{n_{0}} (i, y \in D | j, y^{'}) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} - \sum_{j} \int_{y^{'}} P_{Θ}^{n_{0}} (i, y \in D^{c} | j, y^{'}) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} \\ = \sum_{j} \int_{y^{'}} (P_{Θ}^{n_{0}} (i, y \in D | j, y^{'}) - P_{Θ}^{n_{0}} (i, y \in D^{c} | j, y^{'})) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} \\ = \sum_{j} \int_{y^{'} : j, y^{'} \in D} (1 - 2 P_{Θ}^{n_{0}} (i, y \in D^{c} | j, y^{'})) |Π (j, y^{'})| f_{N} (y^{'}) d y^{'} + \sum_{j} \int_{y^{'} : j, y^{'} \in D^{c}} (1 - 2 P_{Θ}^{n_{0}} (i, y \in D | j, y^{'})) |Π (j, y^{'})| f_{N} (y^{'}) d y^{'} \\ \leq \sum_{j} \int_{y^{'} : j, y^{'} \in D} (1 - 2 ν_{n_{0}} (D^{c})) |Π (j, y^{'})| f_{N} (y^{'}) d y^{'} + \sum_{j} \int_{y^{'} : j, y^{'} \in D^{c}} (1 - 2 ν_{n_{0}} (D)) |Π (j, y^{'})| f_{N} (y^{'}) d y^{'} \\ = \frac{{|Π|}_{1}}{2} (2 - 2 ν_{n_{0}} (D^{c}) - 2 ν_{n_{0}} (D)) \\ = {|Π|}_{1} (1 - ν_{n_{0}} (Y)) < {|Π|}_{1} . \end{array}

This provides a contradiction as $0 < ν_{n_{0}} (Y) < 1$ and hence |Π|₁=|g(Θ,Π)|₁<|Π|₁. This proves that the partial derivative (19) is one-one. The inequality is obtained by using the majorizing measure, $ν_{n_{0}} (.)$ , defined in the proof of continuity of stationary distribution.

The map g(Θ,Π) is a compact integral operator ([22], Example 2, p. 277). The last map of the partial derivative has one-dimensional range and hence is compact. Therefore, the partial derivative equals T−I, where T is a compact operator. Then by Riesz–Schauder Theory ([22], Theorem 1, p. 283), the fact that $\frac{∂f}{∂Π}$ is one-one implies that it is onto and also further that the inverse is bounded. Hence $\frac{∂f}{∂Π}$ is a linear homeomorphism.

Furthermore, the mapping $(σ^{2}, Θ) \mapsto |{[{\frac{∂f}{∂Π}|}_{(Θ, Π_{Θ})}]}^{- 1}|$ is continuous. This continuity follows by the joint continuity of the n₀-step transition function, $p_{Θ}^{n_{0}} (i, y | j, y^{'})$ with respect to (σ²,Θ) and then by bounded convergence theorem (as $p_{Θ}^{n_{0}} (i, y | j, y^{'}) + 1$ is uniformly bounded) and finally by the continuity of the map x↦x⁻¹ ([23], p. 135). Hence the lemma follows for some C₀<∞, $δ > 0, σ_{0}^{2} > 0$ . ▀

Lemma 2.

f is differentiable with respect to Θ. The partial derivative ${\frac{∂f}{∂Θ}|}_{(Θ, Π_{Θ})}$ is upper bounded by bound (24).

Proof: We reintroduce the notations that will be used here (notation of Equation (9)).

$i = [S_{k + n_{0} - (N_{f} + L - 2)}^{k + n_{0}}, Ŝ_{k + n_{0} - 1 - (N_{b} - 1)}^{k + n_{0} - 1}]$ , $y = N_{k + n_{0} - (N_{f} - 1)}^{k + n_{0}}$ , represent the current state of the Markov Chain, at k + n₀.
$j = [S_{k - (N_{f} + L - 2)}^{k}, Ŝ_{k - (N_{b} - 1)}^{k - 1}], y^{'} = N_{k - (N_{f} - 1)}^{k}$ represent the initial condition for the n₀-step transition function, which is the state of the Markov chain at k.
$l = [S_{k + 1}^{k + n_{0} - (N_{f} + L - 3)}, Ŝ_{k}^{k + n_{0} - 1 - (N_{b} - 2)}]$ , $v = N_{k + 1}^{k + n_{0} - (N_{f} - 2)}$ represent the intermediate input, decision and noise vectors.
$x (q) : = [S_{k + n_{0} - q - (N_{f} + L - 1)}^{k + n_{0} - q}, Ŝ_{k + n_{0} - q - N_{b}}^{k + n_{0} - q - 1}, N_{k + n_{0} - q - N_{f}}^{k + n_{0} - q}]$ represent the intermediate state of the Markov chain at k + n₀−q.

To begin with, we will show component wise differentiability of the function f, i.e., differentiability of f(Θ,Π)(i,y) for every (i,y). We will show the differentiability of the n₀-step transition function, $p_{Θ}^{n_{0}} (i, y | j, y^{'})$ along with that. Positive and finite constants (like c, c^′′, etc.) are introduced in the derivations as and when required. While obtaining upper bounds we have taken advantage of the finite alphabet nature of the set $S$ . By simple computations, one can see that the density with respect to the Gaussian measure is,

\begin{align} p_{Θ}^{n_{0}} (i, y | j, y^{'}) = \sum_{l} & \int_{v} \prod_{q = 1}^{n_{0}} P_{Θ} (ŝ_{k + n_{0} - q} | x (q)) \\ \times P (S_{k + 1}^{k + n_{0}}) f_{N} (v) dv. \end{align}

(20)

Hence,

\begin{align} f (Θ, Π) (i, y) = & \sum_{l, j} \int_{v, y^{'}} \prod_{q = 1}^{n_{0}} P_{Θ} (ŝ_{k + n_{0} - q} | x (q)) P (S_{k + 1}^{k + n_{0}}) \\ \times Π (j, y^{'}) f_{N} (v) f_{N} (y^{'}) dv d y^{'} \\ - Π (i, y) + (\sum_{j} \int_{y^{'}} Π (j, y^{'}) f_{N} (y^{'}) d y^{'} - 1) . \end{align}

The only component of the above functions, depending upon Θ is $P_{Θ} (ŝ_{k + n_{0} - q} | j, y^{'})$ . By (11),

\begin{align} |\frac{\partial P_{Θ}}{∂Θ} (ŝ_{k + n_{0} - q} | x (q))| \leq c |N_{k + n_{0} - q - N_{f}}^{k + n_{0} - q}|, \end{align}

uniformly in (i,y), for every Θ, (j,y^′) and for every q. Thus for any Θ_h in a small neighborhood of 0 and for any i,y, j,y^′, Θ and q (by the mean value theorem ([24], Theorem X.4.5, p.312)),

\begin{align} |P_{Θ} (ŝ_{k + n_{0} - q} | x (q)) - P_{Θ + Θ_{h}} (ŝ_{k + n_{0} - q} | x (q)) \\ - Θ_{h}^{t} \frac{\partial P_{Θ}}{∂Θ} (ŝ_{k + n_{0} - q} | x (q))| \\ \leq |P_{Θ} (ŝ_{k + n_{0} - q} | j, y^{'}) - P_{Θ + Θ_{h}} (ŝ_{k + n_{0} - q} | j, y^{'})| \\ + |Θ_{h}^{t} \frac{\partial P_{Θ}}{∂Θ} (ŝ_{k + n_{0} - q} | j, y^{'})| \\ \leq 2 c |Θ_{h}| |N_{k + n_{0} - q - N_{f}}^{k + n_{0} - q}| . \end{align}

(21)

For obtaining the above upper bound, the mean value theorem is used as explained below for a two dimensional function, which can easily be generalized to any n-dimensional function. Say f is any function of two variables. One can write, f(a + h,b + k)−f(a,b) as sum of terms f(a + h,b + k)−f(a + h,b)−f(a,b + k) + f(a,b), f(a + h,b)−f(a,b) and f(a,b + k)−f(a,b). The first term is bounded by mean value theorem for two variables, ([23], Theorem 9.40, p.235), while the remaining two terms can be bounded using mean value theorem for one variable, ([23], Theorem 5.10, p.108).
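The decomposition described in words above can be written out explicitly. The display below only restates that splitting for a generic function f of two variables (the same generic f, a, b, h, k as in the preceding paragraph); the cross terms cancel, so the three brackets sum to the left-hand side.

\begin{align} f (a + h, b + k) - f (a, b) & = [f (a + h, b + k) - f (a + h, b) - f (a, b + k) + f (a, b)] \\ & \quad + [f (a + h, b) - f (a, b)] + [f (a, b + k) - f (a, b)] . \end{align}

The first bracket is the mixed second difference handled by the two-variable mean value theorem, and the last two brackets are one-variable increments.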

Finally by dominated convergence theorem, we will obtain the existence of the following partial derivatives, in the paragraphs that follow:

\begin{align} \frac{\partial p_{Θ}^{n_{0}}}{∂Θ} (i, y | j, y^{'}) & = \sum_{l} \int_{v} \frac{\partial (\prod_{q = 1}^{n} P_{Θ} (ŝ_{k + n_{0} - q} | x (q)))}{∂Θ} \\ \times P (S_{k + 1}^{k + n_{0}}) f_{N} (v) dv, \end{align}

(22)

\begin{align} {\frac{∂f}{∂Θ}|}_{Θ, Π} (i, y) & = \sum_{j} \int_{y^{'}} \frac{\partial p_{Θ}^{n_{0}}}{∂Θ} (i, y | j, y^{'}) Π (j, y^{'}) f_{N} (y^{'}) d y^{'} . \end{align}

(23)

For obtaining the partial derivative (23), we will need to study the following set of functions one for each different value of i,j,l,r

\begin{align} \int_{v, y^{'}} (P_{Θ} (ŝ_{k + n_{0} - r} | x (r)) - P_{Θ + Θ_{h}} (ŝ_{k + n_{0} - r} | x (r)) \\ - Θ_{h}^{t} \frac{\partial P_{Θ}}{∂Θ} (ŝ_{k + n_{0} - r} | x (r))) \\ \prod_{q = 1, q \neq r}^{n_{0}} P_{Θ} (ŝ_{k + n_{0} - q} | x (q)) \\ \times Π (j, y^{'}) f_{N} (v) f_{N} (y^{'}) dv d y^{'} . \end{align}

One can easily see from (21) that the function inside each of the above integral is dominated by some constant multiple of the integrable function,

\begin{align} |N_{k + n_{0} - r - N_{f}}^{k + n_{0} - r}| |Π (j, y^{'})|, \end{align}

and that the above bound is integrable by the Cauchy-Schwarz inequality. So by the dominated convergence theorem, the limit $lim_{|Θ_{h}| \to 0}$ (which arises while defining the partial derivative) can be taken inside the integral for every i,j,l,r and this establishes the existence of the component wise partial derivative, ${\frac{∂f}{∂Θ}|}_{Θ, Π} (i, y)$ . Also, this component wise partial derivative is uniformly upper bounded by,

\begin{align} |{\frac{∂f}{∂Θ}|}_{Θ, Π} (i, y)| \leq \frac{c^{′′}}{ε_{0}} [|y| + 1] for all (i, y), all Θ, \end{align}

where the constant c^′′ depends on |Π|.

One can now prove the existence of the overall partial derivative $\frac{∂f}{∂Θ}$ at every (Θ,Π) using the above upper bound and the dominated convergence theorem (in L₂ norm). Consider the limit,

\begin{align} lim_{|Θ_{h}| \to 0} & \frac{1}{|Θ_{h}|} \sum_{i} \int_{y} |f (Θ, Π_{Θ}) (i, y) - f (Θ + Θ_{h}, Π_{Θ}) (i, y) \\ {- Θ_{h}^{t} {\frac{∂f}{∂Θ}|}_{Θ, Π_{Θ}} (i, y)|}^{2} f_{N} (y) dy \\ = \sum_{i} \int_{y} lim_{|Θ_{h}| \to 0} \frac{1}{|Θ_{h}|} |f (Θ, Π_{Θ}) (i, y) \\ - f (Θ + Θ_{h}, Π_{Θ}) (i, y) \\ {- Θ_{h}^{t} {\frac{∂f}{∂Θ}|}_{Θ, Π_{Θ}} (i, y)|}^{2} f_{N} (y) dy \\ = 0 . \end{align}

The first equality follows because the function inside the integral tends to zero at every point and is upper bounded by the following integrable function,

\begin{align} \frac{c^{'}}{ε_{0}} [|y| + 1] 1_{\{|{\frac{∂f}{∂Θ}|}_{Θ, Π} (i, y)| \leq 1\}} \\ + {(\frac{c^{′′}}{ε_{0}} [|y| + 1])}^{2} 1_{\{|{\frac{∂f}{∂Θ}|}_{Θ, Π} (i, y)| > 1\}} . \end{align}

We will now upper bound this partial derivative for all (Θ,Π_Θ). First observe that, because Π_Θ(i,y)≤1 for all (i,y), from (11),

\begin{align} |{\frac{∂f}{∂Θ}|}_{(Θ, Π_{Θ})} (i, y)| \\ = |\sum_{l, j} \int_{y^{'}, v} \frac{\partial (\prod_{q = 1}^{n} P_{Θ} (ŝ_{k + n_{0} - q} | x (q)))}{∂Θ} \\ \times P (S_{k + 1}^{k + n_{0}}) Π_{Θ} (j, y^{'}) f_{N} (v) f_{N} (y^{'}) dvd y^{'}| \\ \leq (c_{1} \sum_{r = 1}^{n} \sum_{l, j} \int_{v, y^{'}} 1_{\{|e_{Θ} (x (r))| \leq ε_{0}\}} f_{N} (v) f_{N} (y^{'}) dvd y^{'}) \\ + c_{2} |y| + c_{3} E (|N_{k}|), \end{align}

for some appropriate constants c₁,c₂,c₃. Then using ${(\sum_{k = 1}^{n} a_{k})}^{2} \leq n \sum_{k = 1}^{n} a_{k}^{2}$ , |x|²≤|x| (when |x|≤1), we get,

\begin{align} {|{\frac{∂f}{∂Θ}|}_{(Θ, Π_{Θ})}|}^{2} = {(\sum_{i} {(\int_{y} {|{\frac{∂f}{∂Θ}|}_{(Θ, Π_{Θ})} (i, y)|}^{2} \frac{f_{N}}{|S|} (y) dy)}^{1 / 2})}^{2} \\ \leq c_{1}^{'} \sum_{r = 1}^{n} \sum_{l, j, i} \int_{v, y^{'}, y} 1_{\{|e_{Θ} (x (r))| \leq ε_{0}\}} f_{N} (v) f_{N} (y^{'}) f_{N} (y) dvd y^{'} dy \\ + c_{2}^{'} \sum_{i} \int_{y} {|y|}^{2} f_{N} (y) dy + c_{3}^{'} {[E (|N_{k}|)]}^{2} \\ = c_{1}^{′′} \sum_{l} P (\{|b_{l} + θ_{f}^{t} N_{k}| \leq ε_{0}\}) + c_{2}^{′′} σ^{2}, \end{align}

(24)

where the constants b_l take values $S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1}$ . ▀

Appendix 3

Proof of Theorem 4: Let $f_{1} (Θ, ε_{0}) : = E_{J_{k} (Θ)}^{(ε_{0})} [{err}_{Θ} {(J_{k})}^{2}],$ and $f_{2} (Θ, ε_{0}) : = |E_{J_{k} (Θ)}^{(ε_{0})} \nabla_{Θ} [{err}_{Θ} {(J_{k})}^{2}]| .$ Note that for any fixed ε₀, LMS attractors will be the zeros, i.e., minima of f₂(.,ε₀) while the DFE-WFs are the minima of the MSE cost, f₁(.,ε₀). Also note that ε₀=0 corresponds to the original decoder.

Let $\{{ε_{0}}_{n}\}$ be any sequence converging to 0. Let $Ω = \{{ε_{0}}_{n}\}$ . Take a compact set C large enough such that the WF is inside it (as Θ is increased to infinity, eventually MSE will start increasing and will tend to infinity). One can follow steps as in Theorem 2 and show that the stationary density $Π_{Θ_{n}}^{{ε_{0}}_{n}}$ converges to $Π_{Θ}^{0}$ as $({ε_{0}}_{n}, Θ_{n}) \to (0, Θ)$ . Similarly, one can also show that both functions f₁,f₂ are jointly continuous in (Θ,ε₀)∈C×Ω.

The domain of the parameter Θ for every ε₀, say D(ε₀), is the same compact set C and hence the correspondence ε₀↦D(ε₀) is compact and continuous [25]. Then by the maximum theorem ([25], p. 235), ${D_{1}}_{n}^{*} : = arg {min}_{Θ \in C} f_{1} (Θ, {ε_{0}}_{n})$ and ${D_{2}}_{n}^{*} : = arg {min}_{Θ \in C} f_{2} (Θ, {ε_{0}}_{n})$ are compact valued upper semi-continuous correspondences on Ω. Thus by ([25], Proposition 9.8, p. 231) there exists a subsequence of LMS attractors $Θ_{n_{k}}^{LMS}$ converging to an LMS attractor of the original decoder, $Θ_{0}^{LMS}$ . Once again by the same proposition there exists a further subsequence such that the DFE-WFs $Θ_{{n_{k}}_{l}}^{*}$ converge to a DFE-WF of the original decoder, $Θ_{0}^{*}$ . Thus there exists a sequence (after renaming) ${ε_{0}}_{n} \to 0$ such that $Θ_{n}^{LMS} \to Θ_{0}^{LMS}$ and $Θ_{n}^{*} \to Θ_{0}^{*}$ . ▀

Appendix 4

Proof of Theorem 5: We have assumed in Section “The model and notation”, that $S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1} \neq 0$ for all values of $S_{k}, Ŝ_{k - 1}$ at an LMS attractor. By continuity, this implies the same (in fact the sign of the term, $S_{k}^{t} Ψ + θ_{b}^{t} Ŝ_{k - 1}$ , for each $S_{k}, Ŝ_{k - 1}$ remains same) in a small neighborhood of the LMS attractor. Thus, when σ² = 0 (the noiseless case), for the original decoder at an LMS attractor (call it $Θ_{0}^{*}$ ), we have,

\frac{\partial P_{Θ}}{∂Θ} (i, y | j, y^{'}) = 0 for all (i, y) (j, y^{'}) .

Thus following the steps as in the proof of Theorem 3 (whose proof is obtained in Appendix 2) one can show that the gradient of the stationary density, ∇_ΘΠ_Θ exists and equals zero at $Θ_{0}^{*}$ . Hence at $Θ_{0}^{*}$ (LMS attractor of the original decoder at σ² = 0),

\nabla_{Θ} E_{J_{k} (Θ)} [{err}_{Θ} {(J_{k})}^{2}] = E_{J_{k} (Θ)} [\nabla_{Θ} ({err}_{Θ} {(J_{k})}^{2})] .

Therefore in this case, the DFE-WF coincides with the LMS-DFE attractor, $Θ_{0}^{*}$ .

Choose ${ε_{0}}_{2} > 0$ such that, $ε_{0_{2}} < |S_{k}^{t} Ψ_{0}^{*} + Ŝ_{k - 1}^{t} Θ_{0_{b}}^{*}|$ for all values of S_k and $Ŝ_{k - 1}$ . The DFE-WF ( $Θ^{*, ε_{0}}$ ) and the LMS-DFE attractor ( $Θ^{LMS, ε_{0}}$ ) coincide and equal $Θ_{0}^{*}$ for a noiseless system having a perturbed decoder, with $ε_{0} \leq {ε_{0}}_{2}$ . This happens because when there is no noise the perturbed decoder coincides with the original decoder for $ε_{0} \leq ε_{0_{2}}$ .

Fix $ε_{0} \leq {ε_{0}}_{2}$ . Then, $w (Θ^{*, ε_{0}}, 0, 0) = 0$ and the partial derivative,

\begin{align} {\frac{∂w}{∂Θ}|}_{(Θ_{ε_{0}}^{*}, 0, 0)} & = \frac{ds}{dΘ} = E_{f_{N}} [\nabla_{Θ} (\nabla_{Θ} ({err}_{Θ} {(J_{k})}^{2}) Π_{Θ} (J_{k}))], \\ = E_{Θ} [\nabla_{Θ} \nabla_{Θ} ({err}_{Θ} {(J_{k})}^{2})] = 2 R_{xx} (Θ^{*, ε_{0}}), \end{align}

where $R_{xx} (Θ^{*, ε_{0}})$ is the autocorrelation matrix of the vector X_k(Θ), under stationarity, at $Θ^{*, ε_{0}}$ . As $Θ^{*, ε_{0}}$ is a WF, the above partial derivative will be invertible (all the eigenvalues of the derivative should be negative for the equilibrium point to be an attractor).

Continuity of the above partial derivative with respect to σ²,η,Θ can be seen as before. Applying the Implicit function theorem at $(Θ^{*, ε_{0}}, 0, 0)$ , one gets a δ>0, and a continuous function q(σ²,η) such that $q (0, 0) = Θ^{*, ε_{0}}$ and w(q(σ²,η),σ²,η)=0, for all (σ²,η) with |(σ²,η)|≤δ. ▀

Remark on existence of LMS attractors

The above theorem also provides the following useful conclusion. For all σ²≤δ, the zeros of w(.,σ²,0) exist and equal q(σ²,0). These zeros are continuous in σ². One can see that these zeros will indeed be LMS attractors as invertibility of the derivative of the function f() at σ²=0 guarantees its invertibility in a small neighborhood of σ²=0.

Appendix 5

In this appendix we state and prove the lemmas, which are used in this article.

Lemma 3.

Let $Π_{Θ_{n}} \overset{L_{1}}{\to} Π_{Θ}$ . Let |f(Θ_n,x)|≤g₁(x), |f(Θ_n,x)|²≤g₂(x) for all n, where g₁,g₂ are integrable functions (with respect to measure μ). Also let f be continuous and $|Π_{Θ_{n}} (x)| \leq C < \infty$ for all x. Then as n→∞,

\int f (Θ_{n}, x) Π_{Θ_{n}} (x) μ (dx) \to \int f (Θ, x) Π_{Θ} (x) μ (dx) .

Proof: We have

\begin{align} & \left| \int (f (Θ_{n}, x) Π_{Θ_{n}} (x) - f (Θ, x) Π_{Θ} (x)) μ (dx) \right| \\ & \quad \leq \int |f (Θ_{n}, x) Π_{Θ_{n}} (x) - f (Θ, x) Π_{Θ} (x)| μ (dx) \\ & \quad \leq \int |f (Θ_{n}, x)| |Π_{Θ_{n}} (x) - Π_{Θ} (x)| μ (dx) + \int |f (Θ_{n}, x) - f (Θ, x)| Π_{Θ} (x) μ (dx) \\ & \quad \leq {\left( \int {|f (Θ_{n}, x)|}^{2} μ (dx) \right)}^{1 / 2} {\left( \int {|Π_{Θ_{n}} (x) - Π_{Θ} (x)|}^{2} μ (dx) \right)}^{1 / 2} \\ & \qquad + \int |f (Θ_{n}, x) - f (Θ, x)| Π_{Θ} (x) μ (dx) . \end{align}

The first term on the right converges to zero because its first factor is bounded by ${(\int g_{2} (x) μ (dx))}^{1 / 2}$ and, since $|Π_{Θ_{n}} (x) - Π_{Θ} (x)| \leq 2 C$,

\int {|Π_{Θ_{n}} (x) - Π_{Θ} (x)|}^{2} μ (dx) \leq 2 C \int |Π_{Θ_{n}} (x) - Π_{Θ} (x)| μ (dx) .

The second term converges to zero by continuity of the function f(.,.) in Θ and by the bounded convergence theorem. ▀
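A numerical illustration of Lemma 3 may help fix ideas. The sketch below is not part of the original analysis and rests on assumptions made purely for illustration: μ is taken to be Lebesgue measure on the real line, Π_Θ is the N(Θ, 1) density (bounded by C = 1/√(2π)), f(Θ, x) = cos(Θx)/(1 + x²) (dominated by the integrable function 1/(1 + x²)), Θ_n = 1/n, and integrals are approximated with a trapezoid rule on [-12, 12].

```python
import math

def density(theta, x):
    # Assumed example: Gaussian N(theta, 1) density, bounded by 1/sqrt(2*pi).
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def f(theta, x):
    # Assumed example: continuous in theta, |f| <= 1/(1+x^2), integrable and square integrable.
    return math.cos(theta * x) / (1.0 + x * x)

def integrate(func, a=-12.0, b=12.0, n=4000):
    # Composite trapezoid rule on [a, b] with n sub-intervals.
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

def expectation(theta):
    # Approximates the integral of f(theta, x) * Pi_theta(x) with respect to mu.
    return integrate(lambda x: f(theta, x) * density(theta, x))

def l1_distance(theta_a, theta_b):
    # Approximates the L1 distance between the two densities.
    return integrate(lambda x: abs(density(theta_a, x) - density(theta_b, x)))

limit = expectation(0.0)
thetas = [1.0 / n for n in (1, 2, 4, 8, 16, 32)]
gaps = [abs(expectation(t) - limit) for t in thetas]
l1 = [l1_distance(t, 0.0) for t in thetas]
for t, d, gap in zip(thetas, l1, gaps):
    print(f"theta_n={t:.5f}  L1 distance={d:.5f}  |I(theta_n) - I(0)|={gap:.2e}")
```

As Θ_n approaches 0, the L1 distance between the densities shrinks and the gap between the integrals shrinks with it, as the lemma predicts for this example.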

Lemma 4.

Let Π_Θ represent the Radon-Nikodym derivative of measure Π_Θ with respect to the common measure μ for all $Θ \in R^{m}$, for some m. Assume Π_Θ ≤ 1 everywhere for all Θ. If ∇_ΘΠ_Θ exists in L₂ norm, then

\nabla_{Θ} E_{Θ} [g (Θ)] = E_{μ} [\nabla_{Θ} (g (Θ) Π_{Θ})],

where g(Θ,.) is square integrable, continuously differentiable in Θ and bounded by a square integrable function uniformly in a neighborhood of Θ.

Proof: Since,

\begin{align} & \frac{1}{|Θ_{h}|} \int |g (Θ + Θ_{h}) Π_{Θ + Θ_{h}} - g (Θ) Π_{Θ} - Θ_{h}^{t} (g (Θ) \nabla_{Θ} Π_{Θ} + \nabla_{Θ} g (Θ) Π_{Θ})| μ (dw) \\ & \quad \leq \frac{1}{|Θ_{h}|} \int |g (Θ) (Π_{Θ + Θ_{h}} - Π_{Θ} - Θ_{h}^{t} \nabla_{Θ} Π_{Θ})| μ (dw) \\ & \qquad + \frac{1}{|Θ_{h}|} \int |Π_{Θ + Θ_{h}} (g (Θ + Θ_{h}) - g (Θ) - Θ_{h}^{t} \nabla_{Θ} g (Θ))| μ (dw) \\ & \qquad + \frac{1}{|Θ_{h}|} \int |(Π_{Θ + Θ_{h}} - Π_{Θ}) Θ_{h}^{t} \nabla_{Θ} g (Θ)| μ (dw), \end{align}

(25)

we will have the result if we show that each of the terms on the right tends to zero as |Θ_h|→0. By the Cauchy-Schwarz inequality,

\begin{align} & \lim_{|Θ_{h}| \to 0} \frac{1}{|Θ_{h}|} \int |g (Θ) (Π_{Θ + Θ_{h}} - Π_{Θ} - Θ_{h}^{t} \nabla_{Θ} Π_{Θ})| μ (dw) \\ & \quad \leq \| g \|_{2} \lim_{|Θ_{h}| \to 0} \frac{1}{|Θ_{h}|} {\left( \int {|Π_{Θ + Θ_{h}} - Π_{Θ} - Θ_{h}^{t} \nabla_{Θ} Π_{Θ}|}^{2} μ (dw) \right)}^{1 / 2} . \end{align}

The right side tends to zero because the gradient ∇_ΘΠ_Θ exists in L₂ norm.

The second term on the right-hand side of (25) tends to zero by the bounded convergence theorem and the mean value theorem (as in Appendix 2), because Π_Θ ≤ 1 everywhere for all Θ and because ∇g is uniformly bounded in a neighborhood of Θ by an integrable function.

The third term of (25) tends to zero by the Cauchy-Schwarz inequality, by the continuity of the stationary density Π in L₂ norm, and by the uniform boundedness of the function ∇_Θg in a neighborhood of Θ by a square integrable function. ▀
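The identity of Lemma 4 can be checked numerically on a simple example. The sketch below is an illustration only, under assumptions that are not taken from the article: μ is Lebesgue measure, Π_Θ is the N(Θ, 1) density (which satisfies Π_Θ ≤ 1), and g(Θ, x) = sin(Θx) exp(-x²/8), which is square integrable and continuously differentiable in Θ. The left side is approximated by a central finite difference of the expectation, the right side by integrating the product-rule expansion g ∇_ΘΠ_Θ + Π_Θ ∇_Θg.

```python
import math

def density(theta, x):
    # Assumed example: N(theta, 1) density; its maximum 1/sqrt(2*pi) is below 1.
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def d_density(theta, x):
    # Derivative of the Gaussian density with respect to theta.
    return (x - theta) * density(theta, x)

def g(theta, x):
    # Assumed example integrand, square integrable with respect to Lebesgue measure.
    return math.sin(theta * x) * math.exp(-x * x / 8.0)

def d_g(theta, x):
    # Derivative of g with respect to theta.
    return x * math.cos(theta * x) * math.exp(-x * x / 8.0)

def integrate(func, a=-12.0, b=12.0, n=4000):
    # Composite trapezoid rule on [a, b] with n sub-intervals.
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

def expectation(theta):
    # E_theta[g(theta)]: integral of g(theta, x) * Pi_theta(x).
    return integrate(lambda x: g(theta, x) * density(theta, x))

theta0, h = 0.7, 1e-4
# Left side: derivative of the expectation, by central finite difference.
lhs = (expectation(theta0 + h) - expectation(theta0 - h)) / (2.0 * h)
# Right side: integral of the gradient of the product g * Pi_theta.
term_g = integrate(lambda x: d_g(theta0, x) * density(theta0, x))
term_pi = integrate(lambda x: g(theta0, x) * d_density(theta0, x))
rhs = term_g + term_pi
print(f"finite difference = {lhs:.8f}")
print(f"integral of grad  = {rhs:.8f}  (density-gradient part = {term_pi:.8f})")
```

In this example the term involving ∇_ΘΠ_Θ is not negligible, so dropping it (that is, exchanging the gradient and the expectation without accounting for the dependence of the density on Θ) would give a different value. This is the distinction between the DFE-WF and LMS attractor conditions that the lemma helps to quantify.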

References

Giannakis GB, Hua Y, Stoica P, Tong L: Signal Processing Advances in Wireless and Mobile Communications, Trends in Channel Estimation and Equalization, vol. 1. Upper Saddle River, NJ: Prentice Hall; 2000.
Lee EA, Messerschmitt DG: Digital Communications. Kluwer Academic Publishers; 1994.
Proakis JG: Digital Communications. New York: McGraw-Hill; 2000.
Haykin S: Adaptive Filter Theory. Prentice-Hall International Inc; 1996.
Sayed AH: Fundamentals of Adaptive Filtering. New York: Wiley; 2003.
Benveniste A, Metivier M, Priouret P: Adaptive Algorithms and Stochastic Approximation. Springer; 1990.
Macchi O, Eweda E: Convergence analysis of self-adaptive equalizers. IEEE Trans. Inf. Theory 1984, 30(2):161-176. 10.1109/TIT.1984.1056896
Kavitha V, Sharma V: Analysis of an LMS linear equalizer for fading channels in decision directed mode. In 13th European Wireless Conference (Paris, France, Apr 2007).
Kavitha V, Sharma V: Tracking performance of an LMS-linear equalizer for fading channels. In 44th Annual Allerton Conference on Communication, Control and Computing (USA, Sept 2006).
Komninakis C, Fragouli C, Sayed A, Wesel R: Multiple-input multiple-output fading channel tracking and equalization using Kalman estimation. IEEE Trans. Signal Process. 2002, 50(5):1064-1076.
Belfiore CA, Park Jr. JH: Decision feedback equalization. Proc. IEEE 1979, 67(8):1143-1156.
Taylor DP, Vitetta GM, Hart BD, Mammela A: Wireless channel equalization. Eur. Trans. Telecommun. 1998, 9:117-143. 10.1002/ett.4460090204
Erdogan AT, Hassibi B, Kailath T: MIMO decision feedback equalization from an H∞ perspective. IEEE Trans. Signal Process. 2004, 52(3):734-745. 10.1109/TSP.2003.822289
Cioffi JM, Dudevoir CP, Eyuboglu MV, Forney Jr. GD: MMSE decision-feedback equalizers and coding, part I: general results. IEEE Trans. Commun. 1995, 43:2582-2594. 10.1109/26.469441
Tidestav C, Ahlen A, Sternad M: Realizable MIMO decision feedback equalizers: structure and design. IEEE Trans. Signal Process. 2001, 49(1):121-133. 10.1109/78.890353
Sternad M, Ahlen A, Lindskog E: Robust decision feedback equalizers. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3 (Minneapolis, MN, Apr 1993), pp. 555-558.
Kushner HJ, Yin G: Stochastic Approximation Algorithms and Applications. Springer; 1997.
Kavitha V, Sharma V: Tracking Analysis of an LMS Decision Feedback Equalizer for a Wireless Channel. Technical report no: TR-PME-2006-19, DRDO-IISc program on mathematical engineering, ECE Dept., IISc, Bangalore, October 2006. (Downloadable from http://www.pal.ece.iisc.ernet.in/PAM/tech_rep06.html; short version available in Proc. EuroWireless 2007.)
Meyn SP, Tweedie RL: Markov Chains and Stochastic Stability, Communications and Control Engineering Series. New York: Springer; 1993.
Thorisson H: Coupling, Stationarity, and Regeneration, Probability and its Applications. New York: Springer; 2000.
Berger MS: Nonlinearity and Functional Analysis. New York: Academic Press; 1977.
Yosida K: Functional Analysis. Heidelberg: Springer; 1995.
Rudin W: Functional Analysis. New York: McGraw-Hill; 1973.
Bhatia R: Matrix Analysis. New York: Springer; 1997.
Sundaram RK: A First Course in Optimization Theory. Cambridge: Cambridge University Press; 1996.

Acknowledgements

This work was partially supported by DRDO-IISc program on Mathematical Engineering. Parts of this article were presented in Allerton 2006. This work was mainly done when the first author was a PhD student at IISc, Bangalore.

Author information

Authors and Affiliations

Indian Institute of Technology, Mumbai, India
Veeraruna Kavitha
ECE Department, Indian Institute of Science, Bangalore, India
Vinod Sharma

Corresponding author

Correspondence to Veeraruna Kavitha.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Kavitha, V., Sharma, V. Optimal MSE solution for a decision feedback equalizer. EURASIP J. Adv. Signal Process. 2012, 172 (2012). https://doi.org/10.1186/1687-6180-2012-172

Received: 19 February 2012
Accepted: 08 July 2012
Published: 16 August 2012
DOI: https://doi.org/10.1186/1687-6180-2012-172
