Reduced complexity turbo equalization using a dynamic Bayesian network
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 136 (2012)
Abstract
It is proposed that a dynamic Bayesian network (DBN) be used to perform turbo equalization in a system transmitting information over a Rayleigh fading multipath channel. The DBN turbo equalizer (DBNTE) is modeled on a single directed acyclic graph by relaxing the Markov assumption and allowing weak connections to past and future states. Its complexity is exponential in the encoder constraint length and approximately linear in the channel memory length. Results show that the performance of the DBNTE closely matches that of a traditional turbo equalizer that uses a maximum a posteriori equalizer and decoder pair. The DBNTE achieves full convergence and near-optimal performance after a small number of iterations.
Introduction
Turbo equalization has its origin in the Turbo Principle, first proposed in [1], where it was applied to the iterative decoding of concatenated convolutional codes. The concept of Turbo Coding was subsequently applied to equalization in [2, 3] by viewing the frequency selective channel as an inner code and replacing one of the convolutional maximum a posteriori (MAP) decoders with a MAP equalizer. The MAP equalizer and the MAP decoder iteratively exchange extrinsic information. By iterating the system a number of times, the bit error rate (BER) performance can be improved significantly, but at the cost of additional computational complexity, especially if the frequency selective channel has long memory.
Turbo equalization becomes exceedingly complex in terms of the number of computations, due to the high computational complexity of the MAP equalizer and MAP decoder most often used in a turbo equalizer. The complexity of the MAP equalizer is linear in the data block length N, but grows exponentially with an increase in channel memory. Its complexity is therefore O(N M^{L−1}), where L is the channel impulse response (IR) length, and M is the modulation alphabet size. Similarly, the complexity of the MAP decoder is linear in the data block length and exponential in the encoder constraint length K.
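To give a feel for these growth rates, the short sketch below evaluates the state count M^{L−1} and the order-of-magnitude cost N M^{L−1} for a few channel lengths. The function names and the example values of N, M, and L are illustrative assumptions, not taken from the article.

```python
def map_equalizer_states(M: int, L: int) -> int:
    """Number of trellis states of a MAP equalizer: M^(L-1)."""
    return M ** (L - 1)


def map_equalizer_cost(N: int, M: int, L: int) -> int:
    """Order-of-magnitude operation count N * M^(L-1), as quoted in the text."""
    return N * map_equalizer_states(M, L)


if __name__ == "__main__":
    # Binary alphabet (M = 2) and a block of N = 1000 symbols (assumed values).
    for L in (2, 5, 10):
        print(L, map_equalizer_states(2, L), map_equalizer_cost(1000, 2, L))
```

Each additional channel tap multiplies the state count by M, which is why long channels dominate the cost of a conventional turbo equalizer.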
It is, however, suggested in [4], and discussed at length in [5], that an MMSE equalizer can be modified for use in a turbo equalizer to take advantage of prior information on the symbols to be estimated. By replacing the optimal MAP equalizer with a suboptimal, low complexity MMSE equalizer, low complexity turbo equalization is achieved, while still achieving matched filter bound performance on static channels of length five [5]. It was also shown in [5] how a decision feedback equalizer can be used in a turbo equalizer configuration. In addition, a soft-feedback equalizer (SFE) was proposed in [6] with performance superior to that proposed in [5]. The author of [6] expanded upon ideas proposed in [5], where hard decisions on the equalizer output are fed back, by combining prior information with soft decisions [6]. The performance of the SFE turbo equalizer was evaluated for a magnetic recording channel (9 taps), a microwave channel (44 taps), and a power-line channel (58 taps), outperforming the low complexity turbo equalizers proposed in [5], while doing so at reduced complexity. Still, none of the low complexity turbo equalizers proposed in [4–6] can achieve optimal or even near-optimal results, due to the suboptimality of their constituent soft-input soft-output equalizers.
An interleaver is used as standard in a turbo equalizer not only to mitigate the effect of burst errors by randomizing the occurrence of bit errors in a transmitted data block, but also to aid in the dispersion of the positive feedback effect, which is due to the fact that the MAP algorithm used for equalization and decoding produces outputs that are locally highly correlated [4]. When a random interleaver is used, the Markov assumption, stating that the current state is only dependent on a finite history of previous states, is violated, since the interleaver randomizes the encoded data according to some predetermined random permutation. The Markov assumption therefore fails and the turbo equalizer can no longer be modeled as a directed acyclic graph (DAG) to form a cycle-free decision tree. As a result, much attention has been given to approximate inference using belief propagation on graphs with cycles. The junction tree algorithm is used to combine nodes into supernodes until the graph has no cycles [7], with an exponential growth in complexity as nodes are combined. Apart from the very high computational complexity of this approach, it has been shown in [8–10] that exact inference is not guaranteed on graphs with cycles.
In this article, a low complexity near-optimal dynamic Bayesian network turbo equalizer (DBNTE) is proposed. The DBNTE is modeled as a DAG, while relaxing the Markov assumption. The DBNTE model ensures that there is always one dominant connection between a given hidden state and its corresponding observation, while there may be many weak connections to past and future hidden states. The computational complexity of the DBNTE is exponential in the encoder constraint length and approximately linear in the channel memory length. Additional complexity is due to the channel memory, but is only approximately linear since it does not increase the size of the state space, but merely increases the number of summation terms in the sensor model. Results show that the performance of the DBNTE closely matches that of a conventional turbo equalizer in Rayleigh fading channels, achieving full convergence after only a small number of iterations.
This article is structured as follows. Section 2 provides a brief overview of conventional turbo equalization. Section 3 discusses the implications of modeling a turbo equalizer as a DAG and as a quasi-DAG, while a theoretical discussion on the iterative convergence of a quasi-DAG is presented in Section 4. The DBNTE formulation is discussed in Section 5 and a complexity comparison between the DBNTE and the conventional turbo equalizer is shown in Section 6. Section 7 presents simulation results and conclusions are drawn in Section 8.
Turbo equalization
A turbo decoder uses two MAP decoders to iteratively decode concatenated convolutional codes. Like the MAP equalizer, the MAP decoder produces posterior probabilistic information on the source symbols. The output of each decoder is therefore used to produce prior probabilistic information about the input symbols of the other decoder, thus allowing this scheme to exploit the inherent structure of the code to correct errors with each iteration [11], achieving near Shannon limit performance in AWGN channels [1].
Since the communication channel can be viewed as a nonbinary convolutional encoder, the channel can be treated as an inner code while a convolutional encoder is used as an outer code in much the same way as in turbo coding [3], so that the turbo principle can be applied to channel equalization. As such, one of the MAP decoders in the turbo decoder is substituted with a MAP equalizer to mitigate the effect of the channel on the transmitted symbols (to “decode” the ISI-corrupted received symbols) [3]. The output of the MAP equalizer is used to produce prior probabilistic information on the encoded symbols, which is exploited by the MAP decoder. In turn, the output of the MAP decoder is used to produce prior probabilistic information on the unequalized received symbols, which is again exploited by the MAP equalizer. By iterating this system a number of times, the performance of the system can be enhanced greatly [2–5].
Figure 1 shows the structure of the turbo equalizer. The MAP equalizer takes as input the ISI-corrupted received symbols r and the extrinsic information ${L}_{e}^{D}\left(\widehat{\mathbf{s}}\right)$ and produces a sequence of posterior transmitted symbol log-likelihood ratio (LLR) estimates ${L}^{E}\left(\widehat{\mathbf{s}}\right)$ (note that ${L}_{e}^{D}\left(\widehat{\mathbf{s}}\right)$ is zero during the first iteration). Extrinsic information ${L}_{e}^{E}\left(\widehat{\mathbf{s}}\right)$ is determined by
$${L}_{e}^{E}\left(\widehat{\mathbf{s}}\right)={L}^{E}\left(\widehat{\mathbf{s}}\right)-{L}_{e}^{D}\left(\widehat{\mathbf{s}}\right)\qquad (1)$$
which is deinterleaved to produce ${L}_{e}^{E}\left({\widehat{\mathbf{s}}}^{\prime}\right)$, which is used as input to the MAP decoder to produce a sequence of posterior coded symbol LLR estimates ${L}^{D}\left({\widehat{\mathbf{s}}}^{\prime}\right)$. ${L}^{D}\left({\widehat{\mathbf{s}}}^{\prime}\right)$ is used together with ${L}_{e}^{E}\left({\widehat{\mathbf{s}}}^{\prime}\right)$ to determine the extrinsic information
$${L}_{e}^{D}\left({\widehat{\mathbf{s}}}^{\prime}\right)={L}^{D}\left({\widehat{\mathbf{s}}}^{\prime}\right)-{L}_{e}^{E}\left({\widehat{\mathbf{s}}}^{\prime}\right)\qquad (2)$$
${L}_{e}^{D}\left({\widehat{\mathbf{s}}}^{\prime}\right)$ is interleaved to produce ${L}_{e}^{D}\left(\widehat{\mathbf{s}}\right)$. ${L}_{e}^{D}\left(\widehat{\mathbf{s}}\right)$ is used together with the received symbols r in the MAP equalizer, with ${L}_{e}^{D}\left(\widehat{\mathbf{s}}\right)$ serving to provide prior information on the received symbols. The equalizer again produces posterior information ${L}^{E}\left(\widehat{\mathbf{s}}\right)$ on the interleaved coded symbols. This process continues until the outputs of the decoder settle, or until a predefined stop criterion is met [3]. After termination, the output $L\left(\widehat{\mathbf{u}}\right)$ of the decoder gives an estimate of the source symbols.
The power of turbo equalization lies in the exchange of extrinsic information ${L}_{e}^{E}\left(\widehat{\mathbf{s}}\right)$ and ${L}_{e}^{D}\left({\widehat{\mathbf{s}}}^{\prime}\right)$ between the equalizer and the decoder. By feeding back the extrinsic information, without creating self-feedback loops, the correlation between prior information and output information is minimized, allowing the system to converge to an optimal state in the solution space [4, 5]. If information is exchanged directly between the equalizer and the decoder by ignoring interleaving and/or extrinsic information, self-feedback loops will be formed. This will yield only minimal performance gains, since the equalizer and the decoder will inform each other about information already attained in previous iterations [4].
Modeling a turbo equalizer as a quasi-DAG
Suppose a wireless communication system generates a column vector of source bits s of length N_{ u } and s is encoded by a convolutional encoder of rate R_{ c }=1/n, producing a coded bit sequence c of length N_{ c }=N_{ u }/R_{ c }. Now suppose that the coded bit sequence is interleaved using a random interleaver, which produces a bit sequence ć of length N_{ c }, which is transmitted. The resulting symbol sequence to be transmitted is given by
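The bookkeeping of this exchange can be sketched in a few lines. The sketch assumes the usual posterior-minus-prior form of extrinsic information and represents the interleaver as a permutation list. The helper names, the made-up LLR values, and the `toy_siso` placeholder (which simply adds its own evidence to the prior) are assumptions for illustration and stand in for the real MAP equalizer and decoder.

```python
import random


def make_interleaver(n, seed=0):
    """Random permutation perm, with interleaved[i] = x[perm[i]]."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(x, perm):
    return [x[p] for p in perm]


def deinterleave(y, perm):
    """Inverse of interleave: puts y[i] back at position perm[i]."""
    x = [0.0] * len(y)
    for i, p in enumerate(perm):
        x[p] = y[i]
    return x


def extrinsic(posterior, prior):
    """Extrinsic LLR = posterior LLR minus the prior LLR that was fed in."""
    return [a - b for a, b in zip(posterior, prior)]


def toy_siso(evidence, prior):
    """Placeholder soft-in soft-out stage (NOT a MAP algorithm)."""
    return [e + p for e, p in zip(evidence, prior)]


perm = make_interleaver(4)
prior_eq = [0.0, 0.0, 0.0, 0.0]            # zero prior in the first iteration
channel_evidence = [1.5, -0.5, 2.0, -1.0]  # made-up LLRs in interleaved order
post_eq = toy_siso(channel_evidence, prior_eq)
ext_eq = extrinsic(post_eq, prior_eq)      # only the new information is passed on
to_decoder = deinterleave(ext_eq, perm)    # decoder works in coded-bit order
```

Subtracting the prior before passing a message on is what prevents a stage from being told what it already contributed in the previous iteration.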
where ^{T} denotes the transpose operation and G is an N_{ u }×N_{ c } matrix
representing the convolutional encoder, where
is the generator matrix of a rate R_{ c }=k/n (k=1) convolutional encoder with constraint length K and J is the N_{ c }×N_{ c } interleaver matrix. Now suppose the symbol sequence ć is transmitted over a single-carrier frequency-selective Rayleigh fading channel with a time-invariant IR h of length L. The received symbol sequence is then given by
where H is the N_{ c }×N_{ c } channel matrix with the channel IR h={h_{0}, h_{1}, …, h_{L−1}}^{′} on the diagonal such that
and n is a complex Gaussian noise vector with 2N_{ c } samples (N_{ c } for real and N_{ c } for imaginary) from the distribution $\mathcal{N}(0,{\sigma}^{2})$.
Figure 2a shows a graphical model of the transmission model in (6), without noise, where it is assumed that J=I, with I an N_{ c }×N_{ c } identity matrix (i.e., no interleaving is performed), R_{ c }=1/3 and L=2. It shows that every uncoded bit s_{ k } produces ${R}_{c}^{-1}=3$ coded bits c_{k^{′}}, c_{k^{′}+1} and c_{k^{′}+2}, where k^{′}=((k−1)/R_{ c }) + 1 (k runs from 1 to N_{ u } and k^{′} runs from 1 to N_{ c }). Each received symbol can be expressed as
where h={h_{0}, h_{1}}^{′} is the channel IR. Note that h_{0} and h_{1} are not shown in Figure 2a. This equalization-and-decoding problem can be modeled as a DAG, and the forward–backward algorithm can be used to optimally estimate c, and hence s, with relative ease, since there exists a one-to-one relationship between the observed variables r and the hidden variables c. There also exists a relationship between consecutive codewords (groups of n bits). Figure 2a also depicts the causality relationship between the hidden variables and the observed variables.
Now consider Figure 2b. It shows a graphical model of the transmission model in (6), again without noise, but now J is a random N_{ c }×N_{ c } interleaver matrix and again R_{ c }=1/3 and L=2. Each received symbol can be expressed as
where ${\acute{c}}_{{k}^{\prime}}$ is the k^{′}th interleaved symbol. It is clear from Figure 2b that there is no obvious relationship between the observed variables r and the hidden variables c and that the causality relationship in Figure 2a is destroyed due to the randomization effect of the interleaver. Moreover, the relationship between consecutive codewords (groups of n bits) is also destroyed. This problem can therefore no longer be modeled as a DAG and exact inference is in fact impossible [8].
Deinterleaving the received sequence r will ensure that the one-to-one relationship between each element in r and c is restored, but only with respect to the first coefficient h_{0} of the IR h. If h_{0} is dominant and if h is sufficiently short, approximate inference is possible due to the negligible effect of h_{1} to h_{L−1} on r, but this is not normally the case. In a wireless communication system transmitting information through a realistic frequency-selective Rayleigh fading channel, h_{0} cannot be guaranteed to be dominant and the contribution of h_{1} to h_{L−1} is not negligible, and therefore this approach will fail. This has been simulated and verified by the authors. Another viable alternative is to model the system as a loopy graph in order to perform approximate inference as in [12], but as stated before, exact inference is impossible [8–10] and full convergence is not guaranteed [13]. In loopy graphs, convergence is normally achieved only after many iterations.
The proposed DBNTE addresses this problem by modeling the turbo equalizer as a quasi-DAG by applying a transformation to the ISI-corrupted received symbols, in order to ensure that there will always exist a dominant connection between the hidden variable (codeword symbols) and the observed variable (received symbols) at a given time instant, and only weak connections between the observed variable at the current time instant and hidden variables at past and future time instants. With this transformation the DBNTE achieves full convergence and is able to perform near-optimal inference in a small number of iterations, in order to estimate the coded sequence c, and hence the uncoded sequence s.
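The effect of deinterleaving on the channel seen by c can be checked numerically. The sketch below assumes that H is the usual lower-triangular Toeplitz convolution matrix with h_{0} on the main diagonal and that the interleaver is a permutation matrix J. The tap values, block size, and helper names are hypothetical. It illustrates only the deinterleaving observation above, not the transformation used by the DBNTE.

```python
import random


def channel_matrix(h, n):
    """Lower-triangular Toeplitz convolution matrix with H[i][i - j] = h[j]."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j, tap in enumerate(h):
            if i - j >= 0:
                H[i][i - j] = tap
    return H


def permutation_matrix(perm):
    """J such that (J c)[i] = c[perm[i]]."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


n = 6
h = [0.9, 0.4]                    # hypothetical two-tap channel (L = 2)
perm = list(range(n))
random.Random(1).shuffle(perm)

H = channel_matrix(h, n)
J = permutation_matrix(perm)
Q = matmul(H, J)                  # channel acting on the coded sequence c
D = matmul(transpose(J), Q)       # after deinterleaving: J^T H J

print([D[i][i] for i in range(n)])   # h_0 sits on every diagonal entry
```

The h_{1} contributions remain, but they are scattered over off-diagonal positions, which is why deinterleaving alone only helps when h_{0} dominates.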
Theoretical considerations
In this section, we consider a simplified version of the above-mentioned decoding problem. There are only two hidden and two observation variables, with one weak connection between the observed value at time instant 2 and a hidden variable at time instant 1. The purpose of the section is to illustrate, and mathematically analyze, in a clear-cut example how iteration can handle weak connections. Although the analysis in this section is only valid under restrictive assumptions, such as that there is only one weak connection, we end the section with remarks suggesting that it can be generalized.
Consider a directed graph with (hidden) state variables X_{1} and X_{2}, and observables E_{1} and E_{2} (see Figure 3). We assume that these four random variables are binary.
As already mentioned, we assume that the observable variable E_{2} depends not only on the hidden variable X_{2}, but also on the hidden variable X_{1}. One way to get back to a classical hidden Markov chain problem is to combine X_{1} and X_{2} into “super nodes.” This increases the size of the state space and hence the computational complexity. We therefore use an iterative method instead, which is explained in Section 5.
We describe the iteration below, but first give an informal definition of the weak connection between hidden variable X_{1} and observed variable E_{2}. For the current discussion, it is adequate to understand a “weak connection” to mean that the conditional probability distribution of the observed variable E_{2} given X_{2} and {X_{1}=0} is close to the distribution given X_{2} and {X_{1}=1}. The answer to the question of how close these two conditional distributions have to be can in principle be determined from the proof of Lemma 1 below. Similar to the preceding section, our diagram is not a tree but a loopy DAG (or quasi-DAG).
To describe the iteration procedure, recall that for a hidden Markov model, the event matrix at position (k,m) is the probability that the observation m is made, given that the value of the hidden state is k. We will modify the forward–backward algorithm (described in Section 5.2.4) to compute the distribution of states given the evidence.
We now describe the iteration. Step 1 is to start with an initial estimate, say (i,j) of the states (X_{1},X_{2}). Step 2 is to modify the forward–backward algorithm by modifying the event matrix that is used at time 2: change the entry in position (k,m) to
Use the modified algorithm to compute the distribution of each state given the evidence. Find the most likely states from the distribution.
Step 2 can be iterated several times, but each time with the most recently obtained sequence of states in the place of the initial estimate (i,j). The claim is that these iterations will converge if the connection between E_{2} and X_{1} is “weak enough.”
In the rest of this section, we will interpret what “weak enough” means through notions of real analysis such as metric spaces and ε−δ description of continuity. The readers unfamiliar with these concepts can skip now to the next section without breaking the flow of their reading.
Note that we can put a metric (distance function) on the collection of event matrices; for example, the Euclidean distance will do.
The crux of the lemma is illustrated in Figure 4.
Lemma 1
Assume that at all iterations, the modified forward–backward estimates of the probabilities P(X_{1}=1 | E_{1}=r, E_{2}=s) and P(X_{2}=1 | E_{1}=r, E_{2}=s) are different from $\frac{1}{2}$ and 0. Then, if the event matrices corresponding to the different possibilities for the previous-iterate sequence of bits are within a sufficiently small distance of each other, the iteration procedure will converge to a fixed point; in fact the first and second iterations will already be identical.
Proof
It is easy to show that, for any fixed observation sequence (E_{1},E_{2})=(r,s), the posterior probability distribution of (X_{1},X_{2}) depends continuously on the choice of event matrices. We can assume without loss of generality that the iteration is initialized at the state sequence (0,0), because the argument below will apply to any initialization.
Let (p_{1},p_{2}) be the modified forward–backward estimates obtained for the probabilities P(X_{1}=0, X_{2}=0 | E_{1}=r, E_{2}=s) when the algorithm is initialized at (0,0). Because these estimates are assumed to be different from $\frac{1}{2}$, the point (p_{1},p_{2}) belongs to the interior of one of the quadrants of the square [0,1]×[0,1]. Since it is in the interior, there is an ε>0 so that the ball with midpoint (p_{1},p_{2}) and radius ε is contained in the same quadrant.
Our iteration procedure will now move to step 2 and initialize the modified forward–backward algorithm with the output of the first forward–backward run, which is the corner nearest to (p_{1},p_{2}). Now, by the above-mentioned continuity of the posterior distribution with respect to the event matrices used, there is a δ>0 such that if all the event matrices are within distance δ of the event matrix corresponding to the initial guess (0,0) used above, then the output of the modified forward–backward algorithm will be within distance ε of (p_{1},p_{2}). Therefore, it will be in the same quadrant as (p_{1},p_{2}). So, rounding to the nearest corner will give the same corner as was yielded by the previous iteration.
We end this section by briefly discussing why this result can be expected to have a version that is also applicable to the less restrictive setup of the previous and next sections.

More than two state variables: With a number of state variables n, the square in the proof will become a hypercube in n dimensions, with 2^{n} corners, each corner representing a possible previous sequence of estimates of each of the n state variables. The proof will carry over; the ε−δ description of continuous mappings is still applicable without modification.

Event matrices not “close enough:” If the first estimate of the hidden states is good, it is likely that it will be mapped to a quadrant, of which the corner will be mapped to the right quadrant in the next iteration. In this case more than one iteration will be needed.

More weak connections: With more than two state variables, it becomes possible that E_{ k } is connected with X_{ l } for some l<k−1. The entries of the event tables will then have to be set to a probability that is conditioned on a sequence longer than that appearing in equation (10). This does not affect the proof, as long as the connections are weak enough. Here the connections being weak enough means that the event matrices that are possible due to different possible observation histories are all close enough in the sense of the lemma.
□
DBN turbo equalizer
In this section, the DBNTE is discussed. The first part explains the transformation that is applied to the channel matrix in order to ensure that there exists a dominant connection between the observed variables and their corresponding hidden variables, and the second part explains the operation of the DBNTE, which jointly models the equalization and decoding stages on a quasi-DAG as defined in the previous section.
Transformation
For this exposition, assume that the coded symbols c are transmitted through a channel Q=H J, where H is the channel matrix and J is the interleaver matrix as previously defined. Therefore, Equation (6) can be written as
For the DBNTE to perform approximate inference there must exist a strong connection between the observed variable and the hidden variable at time instance k, and only weak connections between the observed variable at time instance k and hidden variables at other time instances. The randomization effect of the interleaver must also be mitigated in order for the turbo equalizer to be modeled as a quasi-DAG, so that there can exist a one-to-one relationship (dominant connection) between the observed variable and the corresponding hidden variable at time instance k.
To ensure that the connections between the observed variable and neighboring hidden variables are weak, the energy must be concentrated in the first tap h_{0} of h. This can be achieved by applying a minimum phase prefilter to the received symbols r [14]. This process produces a filtered received symbol sequence and a minimum phase channel IR.
In order to model the turbo equalization problem as a quasi-DAG, the randomization effect of the random interleaver must be mitigated. Figures 5 and 6 show ∥Q∥ for channel IR lengths of L=1 and L=3, respectively, for a hypothetical system with parameters N_{ u }=50, N_{ c }=150, R_{ c }=1/3 at a mobile speed of 3 km/h and no frequency hops. It should be clear that any sequence c that is transmitted through a channel Q, as described in (11), will be subject to randomization.
To mitigate the effect of the interleaver, the following transformation is applied to r:
which is equivalent to transmitting the coded symbol sequence c through a channel U, where
so that
Figures 7 and 8 show ∥U∥ for systems with channel IR lengths of L=1 and L=3, respectively. It is clear that this transformation mitigates the randomness exhibited in Q, since the new “channel” U is diagonally dominant. The one-to-one relationship between the observed variables and the corresponding hidden variables is therefore restored. Minimum-phase filtering of r is performed before performing the transformation in (12).
Therefore, by applying a minimum-phase filter to r and performing the transformation in (12), all the conditions are met to model the turbo equalizer as a quasi-DAG with dominant connections between the observed variables and their corresponding hidden variables. Minimum-phase filtering ensures that a dominant connection exists between the observed variable and the hidden variable at time instance t, and that only weak connections exist between the observed variable at time instance t and the hidden variables at other time instances, while the transformation in (12) mitigates the randomization effect of the interleaver so that there exists a one-to-one relationship between each observed variable and its corresponding hidden variable. By performing the transformation described here, all conditions for convergence as described in Section 4 are met.
The DBNTE algorithm
After the turbo equalization problem has been prepared for modeling as a quasi-DAG, as explained in the previous section, the DBNTE algorithm can be executed. In this section, various aspects of the DBNTE are discussed, after which a step-by-step summary is provided in Section 5.2.8 in the form of a pseudocode algorithm encapsulating the working of the DBNTE.
We assume a system with an uncoded block length of N_{ u }, using the rate R_{ c }=1/3, constraint length K=3 convolutional encoder in Figure 9 to produce N_{ c } bits, where N_{ c } = N_{ u }/R_{ c }. The coded bits are interleaved with a random interleaver and pass through a multipath channel with an IR of length L.
Graph construction
The graph is constructed to model the possible outputs of a convolutional encoder in much the same way as a trellis is constructed in a conventional MAP decoder. For the DBNTE graph the number of states per time instance t is equal to the number of possible state transitions, given by M=2^{K}, where K is the encoder constraint length. This is different from the number of states in a MAP decoder, which is equal to the number of possible states, given by M=2^{K−1}. The number of time instances in the DBNTE graph is equal to the number of uncoded bits N_{ u }, which is also equal to the number of codewords. Figure 10a,b shows the graphical model of a DBNTE for the first and subsequent iterations, respectively, where the dashed lines in Figure 10b depict weak connections due to ISI. Each X_{ t } on the graph contains a set of M state transitions ${x}_{t}^{\left(m\right)}$, where t=1,2,…,N_{ u } and m=1,2,…,M.
During the first iteration no coded bit estimates $\stackrel{~}{\mathbf{c}}$ are available, so the graphical model is a pure DAG due to the fact that the current state is only dependent on the previous state. Hence only U_{t,t} is used in the cost function of the sensor model, where U_{t,t} is a coefficient on the diagonal of the new channel matrix U in (14). After the first iteration, estimates of the coded bits $\stackrel{~}{\mathbf{c}}$ are produced and can therefore be used in subsequent iterations. During subsequent iterations, U_{t,u} and U_{t,v} are also considered in the cost function of the sensor model, where u=1,2,…,t−1 and v=t + 1,t + 2,…,N_{ u }.
State transition output table
The output associated with each state transition is also tabulated using the encoder state diagram in Figure 11. The output produced by each state transition ${x}_{t}^{\left(m\right)}$, m=1,2,…,8, is determined by loading the bit values of the current state into the leftmost K−1=2 fields of the convolutional encoder in Figure 9, and then placing a 0 and a 1, respectively, on the input of the encoder, each producing a new codeword c^{(1)}c^{(2)}c^{(3)} at the output. This process is followed exhaustively and tabulated. Table 1 shows the state transition outputs of the encoder in Figure 9 that result from moving from one state to the next.
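The exhaustive tabulation can be sketched as follows. The generator taps below are hypothetical placeholders, since the encoder of Figure 9 is not reproduced in the text; the shift-register convention (input bit entering at the front of the register) and all function names are likewise assumptions.

```python
K = 3  # constraint length, as in the text

# Hypothetical generator taps for a rate-1/3 encoder; NOT taken from Figure 9.
GENERATORS = [(1, 1, 1), (1, 1, 1), (1, 0, 1)]


def encoder_output(state, bit):
    """Codeword (c1, c2, c3) for a state of K-1 bits and one input bit."""
    register = (bit,) + state
    return tuple(sum(g * r for g, r in zip(gen, register)) % 2 for gen in GENERATORS)


def next_state(state, bit):
    """Shift the input bit in and drop the oldest bit."""
    return (bit,) + state[:-1]


def transition_table():
    """One row per (state, input) pair: 2^K state transitions in total."""
    table = []
    for s in range(2 ** (K - 1)):
        state = tuple((s >> i) & 1 for i in reversed(range(K - 1)))
        for bit in (0, 1):
            table.append((state, bit, next_state(state, bit), encoder_output(state, bit)))
    return table


for row in transition_table():
    print(row)
```

With K=3 this produces 2^{K}=8 rows, one per state transition, which matches the number of states per time instance in the DBNTE graph.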
Transition probability table
The DBNTE depends on a transition model to describe the permissible state transitions. This is constructed by examining the encoder state transition diagram in Figure 11 and noting the possible state transitions. The solid lines and dashed lines indicate state transitions caused by ones and zeros at the input of the encoder, respectively. It is clear from Figure 11 that only two state transitions emanate from any given state transition (one caused by a 1 at the input and one caused by a 0 at the input). Table 2 shows the transition probabilities of the encoder in Figure 9, of which the state transition diagram is shown in Figure 11.
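A transition table with this structure can be built mechanically. The sketch below assumes a shift-register encoder, indexes each state transition as a (state, input bit) pair, and assigns probability 0.5 to each of the two permissible successors, i.e., it assumes equiprobable input bits. The indexing scheme and function names are illustrative and need not match the layout of Table 2.

```python
K = 3
N_STATES = 2 ** (K - 1)
M = 2 ** K  # number of state transitions per time instance


def transition_index(state, bit):
    """Index of the transition that leaves `state` with input `bit`."""
    return 2 * state + bit


def next_state(state, bit):
    """Shift-register update: the input bit becomes the most significant bit."""
    return (bit << (K - 2)) | (state >> 1)


def transition_matrix():
    """T[m][n] = P(next transition is n | current transition is m)."""
    T = [[0.0] * M for _ in range(M)]
    for state in range(N_STATES):
        for bit in (0, 1):
            m = transition_index(state, bit)
            s_next = next_state(state, bit)
            for next_bit in (0, 1):
                # Exactly two successors: one per possible next input bit.
                T[m][transition_index(s_next, next_bit)] = 0.5
    return T


T = transition_matrix()
for row in T:
    print(row)
```

Every row has exactly two nonzero entries, so the transition model stays sparse even though the hidden variable ranges over all 2^{K} transitions.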
The forward–backward algorithm
The forward–backward algorithm computes the distribution over past states given evidence up to the present [15]. It determines the exact MAP distribution P(X_{ k } | e_{1:t}) for 1 ≤ k < t, where e_{1:t} is a sequence of observed variables from time 1 to t. This is done by calculating two evidence “messages”: the forward message from 1 up to k and the backward message from k + 1 up to t.
Forward message
The forward message computes the posterior distribution over the future state, given all evidence up to the current state. To compute the forward message, the current state is projected forward from time t to time t + 1 and is then updated using the new evidence e_{t + 1}. To obtain the prediction of the next state it is necessary to condition on the current state X_{ t }, hence:
$$P\left({X}_{t+1}\mid {e}_{1:t+1}\right)=\alpha \,P\left({e}_{t+1}\mid {X}_{t+1}\right)\sum_{{x}_{t}}P\left({X}_{t+1}\mid {x}_{t}\right)P\left({x}_{t}\mid {e}_{1:t}\right)\qquad (15)$$
where α is a normalization constant, P(e_{t + 1} | X_{t + 1}) is obtained from the sensor model, P(X_{t + 1} | x_{ t }) is the transition model and P(x_{ t } | e_{1:t}) is the current state distribution. The forward message can be computed recursively using (15).
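One step of this recursion (predict with the transition model, weight by the sensor model, normalize) can be written directly. The sketch below follows the standard filtering update of [15]; the function name, the argument layout, and the two-state example values are assumptions for illustration.

```python
def forward_step(prior, transition, likelihood):
    """One forward update from P(X_t | e_{1:t}) to P(X_{t+1} | e_{1:t+1}).

    prior[i]         = P(x_t = i | e_{1:t})
    transition[i][j] = P(X_{t+1} = j | x_t = i)
    likelihood[j]    = P(e_{t+1} | X_{t+1} = j)
    """
    n = len(prior)
    # Project the current state distribution one step forward.
    predicted = [sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Weight by the new evidence and normalize (alpha is the normalizer).
    unnormalized = [likelihood[j] * predicted[j] for j in range(n)]
    alpha = 1.0 / sum(unnormalized)
    return [alpha * u for u in unnormalized]


# Toy two-state example with made-up numbers.
prior = [0.5, 0.5]
transition = [[0.7, 0.3], [0.3, 0.7]]
likelihood = [0.9, 0.2]
posterior = forward_step(prior, transition, likelihood)
print(posterior)
```

In the DBNTE setting, the state index would run over the 2^{K} state transitions and `likelihood` would come from the sensor model.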
Backward message
The backward message is computed in a similar fashion. It computes the posterior distribution over the past state, given all future evidence up to the current state. Whereas the forward message is computed forwards from 1 to k, the backward message is computed backwards from t to k + 1. Thus, the backward message determines
$$P\left({e}_{k+1:t}\mid {X}_{k}\right)=\sum_{{x}_{k+1}}P\left({e}_{k+1}\mid {x}_{k+1}\right)P\left({e}_{k+2:t}\mid {x}_{k+1}\right)P\left({x}_{k+1}\mid {X}_{k}\right)\qquad (16)$$
where P(e_{k + 1} | x_{k + 1}) is obtained from the sensor model, P(x_{k + 1} | X_{ k }) is the transition model and P(e_{k + 2:t} | x_{k + 1}) is the current state distribution. The backward message can be computed recursively using (16).
Forward–backward message
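Finally, by combining the forward and backward messages, the posterior distribution over all states at any time instance 1 ≤ k < t can be determined as
$$P\left({X}_{k}\mid {e}_{1:t}\right)=\alpha \,P\left({X}_{k}\mid {e}_{1:k}\right)P\left({e}_{k+1:t}\mid {X}_{k}\right)\qquad (17)$$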
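Combining the two messages amounts to an elementwise product of the forward and backward messages followed by normalization. The sketch below implements the standard smoothing procedure of [15] on a generic hidden Markov model; the function names and the toy two-state numbers are assumptions, and the last time step, which the text excludes, is simply returned as the filtered estimate.

```python
def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def forward_backward(prior, transition, likelihoods):
    """Smoothed posteriors P(X_k | e_{1:t}) for every time step k.

    prior[i]          = P(X_0 = i)
    transition[i][j]  = P(X_{k+1} = j | X_k = i)
    likelihoods[k][j] = P(e_{k+1} | X_{k+1} = j)   (k is 0-based)
    """
    n, t = len(prior), len(likelihoods)

    # Forward pass: filtered distributions P(X_k | e_{1:k}).
    forward, f = [], prior
    for lik in likelihoods:
        predicted = [sum(f[i] * transition[i][j] for i in range(n)) for j in range(n)]
        f = normalize([lik[j] * predicted[j] for j in range(n)])
        forward.append(f)

    # Backward pass: combine with P(e_{k+1:t} | X_k), starting from all ones.
    smoothed = [None] * t
    b = [1.0] * n
    for k in range(t - 1, -1, -1):
        smoothed[k] = normalize([forward[k][i] * b[i] for i in range(n)])
        lik = likelihoods[k]
        b = [sum(transition[i][j] * lik[j] * b[j] for j in range(n)) for i in range(n)]
    return smoothed


# Toy two-state example with made-up numbers and two observations.
result = forward_backward([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [[0.9, 0.2], [0.9, 0.2]])
print(result)
```

The cost is linear in the number of time steps and quadratic in the number of states per step, or less when the transition matrix is sparse, as it is for the encoder transition model.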
Sensor model
The conditional probabilities obtained from the sensor model for the respective forward and backward messages, P(e_{t + 1} | X_{t + 1}) and P(e_{k + 1} | x_{k + 1}), are determined by calculating a metric between the observed variable e_{ t } and the hidden variable ${x}_{t}^{\left(m\right)}$, where m=1,2,…,M. The observed variable e_{ t } consists of ${R}_{c}^{-1}$ received symbols r_{t^{′}+1} to ${r}_{{t}^{\prime}+{R}_{c}^{-1}}$, where t^{′}=((t−1)/R_{ c }) + 1 (t runs from 1 to N_{ u } and t^{′} runs from 1 to N_{ c }), and the hidden variable ${x}_{t}^{\left(m\right)}$ consists of the output (c^{(1)}, c^{(2)}, and c^{(3)}) associated with state transition ${x}_{t}^{\left(m\right)}$ in Table 1.
During the first iteration, only the dominant connection U_{t^{′},t^{′}} between the observed variable e_{ t } and the hidden variable x_{ t } is used in the cost calculation, since no coded bit estimates $\stackrel{~}{\mathbf{c}}$ are available at that point. Recall that U is the new channel matrix brought about by performing the transformation in (12), resulting in the new transmission model in (14) having the desired properties to model the system as a quasi-DAG. During the first iteration the system can therefore be modeled as a pure DAG, as if no ISI occurred. Given a hidden variable or state transition output ${x}_{t+1}^{n}$, its associated bits (as tabulated in Table 1) are used together with observed variables/received symbols r_{t^{′}+1} to ${r}_{{t}^{\prime}+{R}_{c}^{-1}}$, where t^{′}=((t−1)/R_{ c }) + 1 and ${R}_{c}^{-1}$ is the number of encoder output bits, to calculate the cost of the nth state transition at time instance t
where t=1,2,…,N_{ u } runs over time for the uncoded bit estimates and β(.) is a function that produces an optimization scaling factor for each iteration i. P(e_{t + 1} | X_{t + 1}) in (15) is determined by
where α is a normalization constant and σ is the noise standard deviation.
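The sensor model can be sketched as follows, assuming that (19) has the usual Gaussian form in which the likelihood decays exponentially with a squared-distance cost. The exact expression in (19) is not reproduced here, and the names `normal_pdf` and `normalize` as well as the example costs are illustrative assumptions.

```python
import math

def normal_pdf(cost, sigma):
    """Unnormalized Gaussian likelihood of a squared-distance cost (assumed form)."""
    return math.exp(-cost / (2.0 * sigma ** 2))

def normalize(values):
    """Scale non-negative likelihoods so that they sum to one (the role of alpha)."""
    total = sum(values)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

# Hypothetical costs of three candidate state transitions.
costs = [0.1, 2.0, 4.5]
likelihoods = normalize([normal_pdf(c, sigma=1.0) for c in costs])
print(likelihoods)
```

A smaller cost maps to a larger likelihood, so minimizing the cost of a state transition maximizes its sensor-model probability.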
During subsequent iterations, LLR estimates of the uncoded bits are available, since the first set of LLRs are produced after the first iteration. Therefore, the system can be modeled as a quasi-DAG due to the fact that there exists a dominant connection between the observed variable $e_{t}$ and the hidden variable $x_{t}$ and weak connections between the observed variable $e_{t}$ and other hidden variables. Thus we also use the rest of the coefficients $U_{t',1}$ to $U_{t',N_{c}}$ (and not only $U_{t',t'}$) in the cost calculation. Analogous to the first iteration, the output bits associated with a given state transition $x_{t+1}^{n}$ are used together with observed variables $r_{t'+1}$ to $r_{t'+R_{c}^{-1}}$ as well as the LLR estimates $\tilde{\mathbf{c}}$ of the uncoded bits to calculate the cost of the nth state transition at time instance t. Therefore, the cost of the nth state transition for subsequent iterations at time instance t is given by
The last term in (20) contains the ISI terms that must be subtracted from the received symbols in order to minimize $\Delta_{t}^{n}$ so that $\mathbf{P}(\mathbf{e}_{t+1} \mid x_{t+1}^{n})$, determined as in (19), can be maximized.
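The idea of subtracting soft ISI estimates before computing the cost can be illustrated with a small sketch. It is not a transcription of (20): it assumes BPSK with bit 1 mapped to +1, LLRs defined as log(P(1)/P(0)), soft symbols obtained as tanh(LLR/2), and a scaling factor β applied only to the fed-back soft symbols. The function and variable names are hypothetical.

```python
import math

def soft_symbol(llr):
    """Soft BPSK estimate in [-1, 1]; assumes LLR = log(P(bit=1)/P(bit=0))."""
    return math.tanh(llr / 2.0)

def transition_cost(r, U, t0, hyp_bits, llrs, beta):
    """Squared-error cost of one hypothesized state transition.

    r        : received symbols
    U        : transformed channel matrix as a list of rows
    t0       : 0-based index of the first symbol of the current codeword
    hyp_bits : encoder output bits of the hypothesized transition
    llrs     : fed-back LLR estimates of all coded bits
    beta     : scaling of the fed-back soft symbols (0 ignores the ISI terms)
    """
    cost = 0.0
    for j, bit in enumerate(hyp_bits):
        row = t0 + j
        hyp = 2.0 * bit - 1.0
        # Weak connections: soft ISI from every other coded symbol.
        isi = sum(U[row][col] * beta * soft_symbol(llrs[col])
                  for col in range(len(llrs)) if col != row)
        residual = r[row] - U[row][row] * hyp - isi
        cost += abs(residual) ** 2
    return cost

# Noiseless toy example: bits [1, 0, 1] sent through a 3x3 channel matrix.
U = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.2], [0.0, 0.1, 1.0]]
r = [0.8, -0.7, 0.9]
llrs = [20.0, -20.0, 20.0]
print(transition_cost(r, U, 0, [1, 0, 1], llrs, beta=0.0))
print(transition_cost(r, U, 0, [1, 0, 1], llrs, beta=1.0))
```

With β=0 only the dominant connection is used, as in the first iteration; with reliable LLRs and β=1 the residual ISI is removed and the cost of the correct transition drops towards zero.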
Computing LLR estimates
After the forward and backward messages are combined as in (17), the LLRs for each uncoded bit are determined from the graph. $R_{c}^{-1}$ LLR vectors of length $N_{u}$ are determined, each one corresponding to one output bit of the encoder, after which they are multiplexed to form one vector of length $N_{c}$ containing the LLR estimates $\tilde{\mathbf{c}}$ of the coded bits c. With reference to the state transitions in Table 1, the LLRs for the convolutional encoder in Figure 9 are determined as follows:
The final LLR vector is constructed by multiplexing the respective LLR vectors such that
which is used in (20) in the next DBNTE iteration.
Optimization
To improve the BER performance of the system, simulated annealing is used[15]. Simulated annealing is usually used in neural networks to allow the network to escape suboptimal basins of attraction in order to converge to a near-optimal solution in the solution space. Since the DBNTE employs a soft-feedback mechanism, simulated annealing can also be applied to the coded symbol estimates $\tilde{\mathbf{c}}$ that are fed back after the first iteration, thus allowing the DBNTE to converge to a state where the BER performance is near-optimal. The optimization scaling function β(.) in (18) and (20)
where Z is the number of iterations, is updated with each iteration i, always starting at 0<β(1)<<1 for the first iteration (i=1) and finishing at β(Z)=1 for the final iteration (i=Z). Figure 12 shows β(i) for Z=3, Z=4 and Z=5. Simulation results in Section 7 show the performance of the DBNTE with and without simulated annealing.
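The exact form of (25) is not reproduced in this text. Purely as an illustration of the stated boundary conditions, the sketch below uses a hypothetical geometric schedule that starts at a small value `beta_start` and reaches 1 at the final iteration. Both the schedule shape and the default `beta_start=0.01` are assumptions, not the authors' function.

```python
def beta(i, Z, beta_start=0.01):
    """Hypothetical annealing schedule with beta(1)=beta_start and beta(Z)=1."""
    if not 1 <= i <= Z:
        raise ValueError("iteration index must satisfy 1 <= i <= Z")
    if Z == 1:
        # A single iteration leaves no room to anneal.
        return 1.0
    # Geometric interpolation between beta_start (i=1) and 1 (i=Z).
    return beta_start ** ((Z - i) / (Z - 1))

for Z in (3, 4, 5):
    print(Z, [round(beta(i, Z), 4) for i in range(1, Z + 1)])
```

Any monotonically increasing schedule with the same end points would serve the same purpose of damping unreliable early feedback.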
Pseudocode algorithm
Additional file 1a–e shows the pseudocode of the DBNTE algorithm. The DBNTE algorithm iteratively computes the forward–backward message before producing LLR estimates of the coded symbols. After the final iteration the estimated turbo equalized uncoded information bits are returned.
Function definitions
DBN_TE receives as input the received ISI-corrupted coded symbols, the transition probability table, the transition output table, the codeword length, the coded data block length, the number of states of the graph, the new channel after transformation, the number of iterations and the noise standard deviation.
FORWARD_MESSAGE and BACKWARD_MESSAGE receive as input the same variables as DBN_TE, except for the number of iterations Z, and additionally they receive the LLR estimates and the optimization scaling factor as input.
FORWARD_BACKWARD_MESSAGE receives only the result of FORWARD_MESSAGE and BACKWARD_MESSAGE, while LLR_ESTIMATES takes as input the forward–backward message resulting from FORWARD_BACKWARD_MESSAGE as well as the state transition output table.
Optimization scaling factor
For each iteration a new optimization scaling factor is produced, as explained earlier. BETA implements (25) and returns the scaling factor used in the calculation of the forward and backward messages.
Forward message
The algorithm starts by initializing the forward message, based on the initial state of the encoder shift register. Since it is assumed that the encoder shift register always starts in the all-zero state, the forward message is initialized accordingly, using the appropriate entries in the transition probability table (Table 2). The forward message is therefore initialized as
such that forward(1,:)=[0.5,0.5,0,0,0,0,0,0]^{′} since only state transitions $x_{t+1}^{(1)}$ and $x_{t+1}^{(2)}$ can emanate from state transitions $x_{t}^{(1)}$ and $x_{t}^{(7)}$, which lead to the all-zero state (see Figure 11).
The forward message is calculated next by iterating over time from k=2 to k=N while iterating over M states m=1 to m=M for each k. For each (k,m) pair, the forward message is initialized to zero, after which the message is updated by multiplying and accumulating messages from the previous timestep k−1. The forward messages from the previous timestep k−1 are multiplied with their respective transition probabilities (determined by the current state m) and summed together to form the new forward message at timestep k (at state m in the graph). Up to this point the forward message contains the collective contribution of the state distributions of the previous states, corresponding to the terms inside the summation in (15).
To include the evidence at graph state (k,m), the cost of the state transition associated with state m in the graph at time k must be calculated. The GET_COST function takes as input the received codeword, LLR estimates of the codeword, the new channel U, the optimization scaling factor and the noise standard deviation, and determines the cost as in (18) for the first iteration and (20) for subsequent iterations. The forward message is updated by multiplying with the output of the normal probability distribution function NORMAL_PDF (implemented as in (19)), which takes the cost and noise standard deviation as input. This completes the forward message. It now fully corresponds to (15). After computing M forward messages at timestep k (one for each of the M states at timestep k), all the messages at timestep k are normalized (NORMALIZE) so as to prevent message values from becoming very small due to multiplication of probabilities when large data block sizes are used.
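The forward recursion described above can be sketched in a few lines. This is a generic forward pass under stated assumptions, not the authors' pseudocode: the sensor-model outputs are assumed to be precomputed in a table `likelihood[k][m]`, `trans[j][m]` is the probability of moving from graph state j to graph state m, and the two-state example values are made up.

```python
def normalize(v):
    """Rescale a message so that it sums to one."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)

def forward_messages(init, trans, likelihood):
    """Forward pass over N time steps and M graph states."""
    N, M = len(likelihood), len(init)
    fwd = [normalize(init)]
    for k in range(1, N):
        msg = []
        for m in range(M):
            # Multiply-accumulate the previous messages with the transition
            # probabilities into state m (the summation inside the recursion).
            acc = 0.0
            for j in range(M):
                acc += fwd[k - 1][j] * trans[j][m]
            # Include the evidence at graph state (k, m).
            msg.append(acc * likelihood[k][m])
        # Normalize per time step to avoid numerical underflow.
        fwd.append(normalize(msg))
    return fwd

# Hypothetical two-state example.
trans = [[0.9, 0.1], [0.1, 0.9]]
likelihood = [[1.0, 1.0], [0.8, 0.2], [0.8, 0.2]]
fwd = forward_messages([0.5, 0.5], trans, likelihood)
print(fwd)
```

For the encoder considered in the text, `init` would be the eight-element vector [0.5, 0.5, 0, 0, 0, 0, 0, 0] and `trans` would be filled from the transition probability table.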
Backward message
The backward message is initialized similarly to the forward message, based on the final state of the encoder. The encoder is forced into the all-zero state at the end of the transmitted data block, and hence the backward message is initialized as
such that backward(N,:) =[0.5,0,0,0,0,0,0.5,0]^{′} since the state transitions emanating from the all-zero state, $x_{t+1}^{(1)}$ and $x_{t+1}^{(2)}$, are preceded by state transitions $x_{t}^{(1)}$ and $x_{t}^{(7)}$.
The backward message is calculated in a similar fashion to the forward message, except that iteration over time starts at k=N−1 and ends at k=1. Also note that the cost is not calculated for the current graph state (k,m), but for all those preceding it in the backward recursion (at time k + 1). Note that the sensor model output is inside the summation in (16), whereas it is outside the summation in (15). The backward message is therefore calculated by accumulating information at the current graph state (at time k) from preceding graph states (at time k + 1). The sensor model in (19) is applied (NORMAL_PDF) to each preceding state, multiplied with the transition probability connecting the current state with the preceding state, and then multiplied with the message that corresponds to the preceding state. This result is then summed together for all M preceding states and stored at the current state.
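A matching sketch of the backward recursion is given below. As with any generic backward pass, the sensor-model term sits inside the summation. The table layout (`likelihood[k][m]`, `trans[m][j]` for a move from state m to state j), the per-step normalization, and the toy numbers are assumptions made for illustration.

```python
def normalize(v):
    """Rescale a message so that it sums to one."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)

def backward_messages(final, trans, likelihood):
    """Backward pass over N time steps and M graph states."""
    N, M = len(likelihood), len(final)
    bwd = [None] * N
    bwd[N - 1] = normalize(final)
    for k in range(N - 2, -1, -1):
        msg = []
        for m in range(M):
            acc = 0.0
            for j in range(M):
                # Sensor model of state j at time k+1, times the transition
                # probability m -> j, times the message stored at (k+1, j).
                acc += likelihood[k + 1][j] * trans[m][j] * bwd[k + 1][j]
            msg.append(acc)
        bwd[k] = normalize(msg)
    return bwd

# Hypothetical two-state example.
trans = [[0.9, 0.1], [0.1, 0.9]]
likelihood = [[1.0, 1.0], [0.8, 0.2], [0.8, 0.2]]
bwd = backward_messages([1.0, 1.0], trans, likelihood)
print(bwd)
```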
Forward–backward message
The forward–backward message is created by multiplying each corresponding forward and backward message value for each (k,m) pair, and normalizing the results as before.
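The combination step amounts to an element-wise product followed by a per-time-step normalization, as in this minimal sketch (the function name and the example values are illustrative):

```python
def forward_backward(fwd, bwd):
    """Element-wise product of forward and backward messages, normalized per step."""
    posterior = []
    for f_row, b_row in zip(fwd, bwd):
        prod = [f * b for f, b in zip(f_row, b_row)]
        total = sum(prod)
        if total > 0:
            posterior.append([p / total for p in prod])
        else:
            # All-zero product: fall back to a uniform distribution.
            posterior.append([1.0 / len(prod)] * len(prod))
    return posterior

# Hypothetical messages for one time step with two graph states.
post = forward_backward([[0.8, 0.2]], [[0.74, 0.26]])
print(post)
```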
Calculate LLR estimates
The LLR estimates are calculated in three phases to produce a sequence of N LLR estimates for each output bit of the encoder (assuming the rate $R_{c}=1/3$ convolutional encoder in Figure 9), after which the three LLR sequences are multiplexed to create a sequence of LLR estimates of length $N/R_{c}$. The LLR vectors are calculated by noting the ones and zeros in Table 1 that correspond to the respective output bits generated by the encoder. For instance, the first output bit $c^{(1)}$ in Table 1 is one for state transitions $x_{t}^{(2)}$, $x_{t}^{(3)}$, $x_{t}^{(5)}$ and $x_{t}^{(8)}$, and zero for $x_{t}^{(1)}$, $x_{t}^{(4)}$, $x_{t}^{(6)}$ and $x_{t}^{(7)}$. The first LLR vector can therefore be calculated as in (21). The second and third LLR vectors are calculated in the same fashion as in (22) and (23). The GET_LLR function calculates the three LLR vectors, after which these vectors are multiplexed in function MULTIPLEX. The LLR estimates are used in (20) during the next iteration.
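The LLR computation and multiplexing can be sketched as follows. The index sets for $c^{(1)}$ are taken from the text; the sign convention LLR = log(P(1)/P(0)), the small `eps` guard against a zero probability, and the function names are assumptions. The sets for $c^{(2)}$ and $c^{(3)}$ would come from Table 1 and are not filled in here.

```python
import math

# State transitions (1-based) for which the first output bit is one or zero.
ONES_C1 = (2, 3, 5, 8)
ZEROS_C1 = (1, 4, 6, 7)

def get_llr(posterior, ones, zeros, eps=1e-300):
    """One LLR per time step from the forward-backward posterior."""
    llr = []
    for row in posterior:
        p1 = sum(row[m - 1] for m in ones)
        p0 = sum(row[m - 1] for m in zeros)
        llr.append(math.log((p1 + eps) / (p0 + eps)))
    return llr

def multiplex(*vectors):
    """Interleave equal-length LLR vectors element by element."""
    out = []
    for group in zip(*vectors):
        out.extend(group)
    return out

def hard_decision(llr):
    """Map LLRs to bits under the assumed sign convention."""
    return [1 if v > 0 else 0 for v in llr]

# Hypothetical posterior over the eight graph states at one time step.
row = [0.1, 0.3, 0.2, 0.05, 0.1, 0.05, 0.1, 0.1]
llr_c1 = get_llr([row], ONES_C1, ZEROS_C1)
print(llr_c1, hard_decision(llr_c1))
```

Because the encoder is systematic, applying `hard_decision` to the first LLR vector after the final iteration yields the estimated information bits.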
Result
The result of the DBNTE algorithm is determined after the final iteration when i=Z, by transforming the LLR sequence corresponding to the first output bit of the encoder into a bit sequence. The first LLR vector is used because the encoder in Figure 9 is systematic.
Complexity analysis
The computational complexity of the DBNTE and the conventional turbo equalizer (CTE) are presented in this section. The complexity equations were derived by counting the number of computations needed to perform Turbo Equalization. The complexity of the DBNTE was determined as
where Z is the number of turbo iterations, $M_{d}$ is the number of decoder states determined by $2^{K-1}$ where K is the encoder constraint length, Q is the number of interfering symbols which can be approximated by Q≈2L−1, and $R_{c}$ is the code rate. The approximation for Q was obtained empirically by calculating the average number of interfering symbols in the new channel U in (13) for a given original channel length L, after the transformation in (12) is applied. The complexity of the CTE was determined as
where $M_{e}$ is the number of equalizer states determined by $2^{L-1}$ for BPSK modulation.
Figure 13 shows the computational complexity graphs of the DBNTE and the CTE, normalized by the number of coded transmitted symbols, for channel IR lengths from L=1 to L=20, for Z=3 and Z=5, with $R_{c}=1/3$, $N_{u}=400$, $N_{c}=1200$ and K=3. From Figure 13 it can be seen that the computational complexity of the DBNTE is much higher (10 times higher at L=2) than that of the CTE for systems with channel IR lengths L<6. However, as L increases beyond L=6, the computational complexity of the DBNTE becomes significantly less than that of the CTE. The DBNTE is therefore a good candidate for systems with channel IR lengths L>6. Note that for L=20 the complexity of the DBNTE is four orders of magnitude lower than that of the CTE.
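The contrast between the two growth rates can be made concrete by tabulating the quantities that drive the complexity expressions. The sketch below does not reproduce (26) or (27); it only evaluates the number of equalizer states $2^{L-1}$, the number of decoder states $2^{K-1}$ and the approximation Q≈2L−1 quoted in the text. The function names are illustrative.

```python
def equalizer_states(L):
    """Number of MAP equalizer states for BPSK and channel IR length L."""
    return 2 ** (L - 1)

def decoder_states(K):
    """Number of decoder states for encoder constraint length K."""
    return 2 ** (K - 1)

def interfering_symbols(L):
    """Empirical approximation Q ~ 2L - 1 quoted in the text."""
    return 2 * L - 1

print("L  M_e (CTE)  Q (DBNTE)")
for L in (2, 6, 10, 20):
    print(L, equalizer_states(L), interfering_symbols(L))
print("M_d for K=3:", decoder_states(3))
```

The equalizer state count doubles with every additional channel tap, whereas Q grows by two, which is why the crossover in favour of the DBNTE appears only for longer channels.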
Simulation results
The DBNTE was evaluated in a mobile fading environment for BPSK modulation, where we used the Rayleigh fading simulator in[16] to generate uncorrelated fading vectors. Simulations were performed at varying mobile speeds and different channel IR lengths, where the energy in the channel was normalized such that h^{‡}h=1. The channel IR was “estimated” by taking the mean of the respective fading vectors in order to get estimates for the channel IR coefficients, unless otherwise stated. The uncoded data block length was chosen to be $N_{u}=400$ and the coded data block length was $N_{c}=1200$, where the rate $R_{c}=1/3$ convolutional encoder with generator polynomials $g_{1}(x)=1$, $g_{2}(x)=1+x$, $g_{3}(x)=x+x^{2}$ in Figure 9 was used. Frequency hopping was also employed to reduce the BER.
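The generator polynomials above define a systematic rate-1/3 encoder with two memory elements. A minimal sketch of such an encoder is given below; the function name, the output ordering ($c^{(1)}$, $c^{(2)}$, $c^{(3)}$ per input bit) and the omission of tail bits are assumptions.

```python
def encode(bits):
    """Rate-1/3 convolutional encoder for g1=1, g2=1+x, g3=x+x^2."""
    s1 = s2 = 0  # shift register: s1 holds u[t-1], s2 holds u[t-2]
    out = []
    for u in bits:
        c1 = u        # g1(x) = 1 (systematic bit)
        c2 = u ^ s1   # g2(x) = 1 + x
        c3 = s1 ^ s2  # g3(x) = x + x^2
        out.extend((c1, c2, c3))
        s1, s2 = u, s1
    return out

# Impulse response of the encoder, starting from the all-zero state.
print(encode([1, 0, 0]))
```

Encoding 400 information bits produces 1200 coded bits, matching the block lengths used in the simulations.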
Figures 14 and 15 show the performance of the DBNTE and the CTE for different mobile speeds through channels with channel IR lengths of L=6 and L=8, respectively. The frequency was hopped eight times, once for every 150 transmitted symbols. The DBNTE was simulated for Z=3 iterations while the CTE was simulated for Z=5 iterations. From Figures 14 and 15 it can be seen that the performance of the DBNTE is less than a decibel worse than that of the CTE for mobile speeds of 3 and 20 km/h, while the performance of the DBNTE closely matches the performance of the CTE for a mobile speed of 50 km/h. For mobile speeds of 80 and 110 km/h the DBNTE outperforms the CTE. However, this result is of little practical importance, as the performance of both turbo equalizers at 80 and 110 km/h is no longer acceptable.
Figures 16 and 17 show the performance of the DBNTE and the CTE for a fixed mobile speed of 20 km/h but with varying numbers of frequency hops, through channels with channel IR lengths of L=6 and L=8, respectively. From Figures 16 and 17 it can be seen that the performance of the DBNTE is again worse by less than a decibel for zero, two, four, and eight frequency hops.
Figure 18 shows the performance of the DBNTE for channel IR lengths of L=5, L=10, and L=20 at a mobile speed of 20 km/h using 8 frequency hops for Z=5 iterations. From Figure 18 it is clear that the DBNTE is able to turbo equalize signals in systems with longer memory, due to its low complexity. With reference to Figure 13, the number of computations required by the DBNTE for L=10 to L=20 is in the range 10^{3.84}−10^{4.15}, whereas the number of computations required by the CTE is in the range 10^{5.01}−10^{8.32}.
In Section 5.2.7, optimization via simulated annealing was discussed. To demonstrate the effect of simulated annealing in the DBNTE, simulations were performed with and without annealing for a channel IR length of L=6 at speeds of 20 and 50 km/h, while the frequency was hopped four times (once for every 300 transmitted symbols) using Z=3 iterations. Figure 19 shows the performance of the DBNTE with and without simulated annealing. It is clear that the application of simulated annealing aids in improving the performance of the DBNTE, where improvements of approximately 1 dB are achieved for the selected scenarios.
In order to demonstrate the speed of convergence of the DBNTE, Figure 20 shows the performance of the DBNTE for different numbers of iterations (Z), where the channel IR length is L=5 at a mobile speed of 20 km/h using eight frequency hops. From Figure 20 it can be seen that there is no significant increase in performance for Z>3. The DBNTE therefore almost fully converges after only three iterations.
The simulation results in Figures 14, 15, 16, 17, 18, 19 and 20 were produced under the assumption that the channel state information (CSI) is known, or that at least the very best estimate thereof is available, due to the averaging of each uncorrelated fading vector as explained earlier. To evaluate the robustness of the DBNTE with respect to CSI uncertainties, the channel was estimated with a least squares (LS) estimator, using various amounts of training symbols. Figure 21 shows the performance of the DBNTE and the CTE for a channel IR length of L=6 at a mobile speed of 20 km/h using 8 frequency hops, for 4L, 6L, and 8L training symbols in each frequency hopped segment of the transmitted data block. From Figure 21 it can be seen that the performance of the DBNTE degrades, along with that of the CTE, due to insufficient amounts of training symbols, causing an increase in channel estimation errors. It is clear that the DBNTE is as resilient against channel estimation errors as the CTE, as its performance closely matches that of the CTE.
The results presented in this section are self-evident and show that the DBNTE achieves acceptable performance compared to that of the CTE. There is a small degradation in BER performance compared to that of the CTE for all simulation scenarios, while a significant computational complexity reduction is achieved for systems with longer memory. The complexity of the DBNTE is only linearly related to the number of channel IR coefficients, while the complexity of the CTE is exponentially related to the channel IR coefficients, allowing the DBNTE to be applied to systems with longer channels. The fast convergence of the DBNTE demonstrated in Figure 20 also adds to the complexity reduction, as full convergence is achieved after only three iterations.
Conclusion
In this article, we have proposed and motivated a turbo equalizer modeled on a DBN which uses belief propagation via the forward–backward algorithm, together with a soft-feedback mechanism, to jointly equalize and decode the received signal in order to estimate the transmitted symbols. We have motivated theoretically that this approach guarantees full convergence under certain conditions and we have shown that the performance of the new DBNTE closely matches that of the CTE, with and without perfect CSI knowledge. Its complexity is linear in the coded data block length, exponential in the encoder constraint length, but only approximately linear in the channel memory length, which makes it an attractive alternative for use in systems with highly dispersive channels.
References
 1.
Berrou C, Glavieux A, Thitimajshima P: Near Shannon limit error-correcting coding and decoding: turbo-codes (1). IEEE International Conference on Communications 1993, 1064–1070.
 2.
Douillard C, Jezequel M, Berrou C, Picart A, Didier P, Glavieux A: Iterative correction of intersymbol interference: turbo-equalization. Eur. Trans. Telecommun. 1995, 6: 507–511. 10.1002/ett.4460060506
 3.
Bauch G, Khorram H, Hagenauer J: Iterative equalization and decoding in mobile communication systems. Proceedings of European Personal Mobile Communications Conference (EPMCC) 1997, 307–312.
 4.
Koetter R, Tuchler M, Singer A: Turbo equalization. IEEE Signal Process. Mag. 2004, 21: 67–80. 10.1109/MSP.2004.1267050
 5.
Koetter R, Tuchler M, Singer A: Turbo equalization: principles and new results. IEEE Trans. Commun. 2002, 50(5):754–767. 10.1109/TCOMM.2002.1006557
 6.
Lopes R, Barry J: The soft feedback equalizer for turbo equalization of highly dispersive channels. IEEE Trans. Commun. 2006, 54(5):783–788.
 7.
Mackay D: Information Theory, Inference, and Learning Algorithms. 2003.
 8.
Friedman N, Koller D: Probabilistic Graphical Models: Principles and Techniques. 2009.
 9.
Weiss Y: Belief Propagation and Revision in Networks with Loops. 1997.
 10.
Weiss Y, Freeman W: Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Comput. 2001, 13: 2173–2200. 10.1162/089976601750541769
 11.
Proakis J: Digital Communications. 2001.
 12.
Jordan MI, Murphy KP, Weiss Y: Loopy belief propagation for approximate inference: an empirical study. Proceedings of Conference on Uncertainty in Artificial Intelligence 1999, 467–475.
 13.
Weiss Y: Correctness of local probability propagation in graphical models with loops. Neural Comput. 2000, 12: 1–41. 10.1162/089976600300015880
 14.
Gerstacker W, Obernosterer F, Meyer R, Huber J: On prefilter computation for reduced-state equalization. IEEE Trans. Wirel. Commun. 2002, 1(4):793–800. 10.1109/TWC.2002.804159
 15.
Russell S, Norvig P: Artificial Intelligence: A Modern Approach. 2003.
 16.
Zheng Y, Xiao C: Improved models for the generation of multiple uncorrelated Rayleigh fading waveforms. IEEE Commun. Lett. 2002, 6: 256–258.
Additional information
Competing interests
The authors declare that they have no competing interests.
Electronic supplementary material
DBNTE Pseudocode algorithm.
Additional file 1: (a) DBNTE function pseudocode. (b) FORWARD MESSAGE function pseudocode. (c) BACKWARD MESSAGE function pseudocode. (d) FORWARD BACKWARD MESSAGE function pseudocode. (e) LLR ESTIMATES function pseudocode. (PDF 91 KB)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Myburgh, H.C., Olivier, J.C. & van Zyl, A.J. Reduced complexity turbo equalization using a dynamic Bayesian network. EURASIP J. Adv. Signal Process. 2012, 136 (2012). https://doi.org/10.1186/1687-6180-2012-136
Keywords
 Turbo equalizer
 Dynamic Bayesian network
 Rayleigh fading
 Low complexity