EURASIP Journal on Applied Signal Processing 2003:13, 1328–1334. © 2003 Hindawi Publishing Corporation.

Interleaved Convolutional Code and Its Viterbi Decoder Architecture

We propose an area-efficient high-speed interleaved Viterbi decoder architecture for interleaved convolutional code, based on the state-parallel architecture with register exchange path memory structure. The state-parallel architecture uses as many add-compare-select (ACS) units as the number of trellis states. By replacing each delay (or storage) element in the state metrics memory (or path metrics memory) and path memory (or survival memory) with I delays, where I is the interleaving degree, the interleaved Viterbi decoder is obtained. The decoding speed of this decoder architecture is as fast as the operating clock speed. The latency of the proposed interleaved Viterbi decoder is decoding depth (DD) × interleaving degree (I), which increases linearly with the interleaving degree.


1. INTRODUCTION
It is well known that burst errors are a serious problem, especially in storage and wireless mobile communication systems. To cope with burst errors, interleaving, denoted here as channel interleaving, is generally used together with a random-error correcting code. Interleaving randomizes error bursts by spreading the erroneous bits, but it introduces a very long delay time, which is intolerable in some applications.
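To make the burst-spreading effect concrete, channel interleaving can be sketched as a block interleaver that writes symbols row-wise into an array and reads them out column-wise. The following minimal Python sketch is our own illustration (function names and the fixed-size, padding-free interface are assumptions, not from the paper):

```python
def block_interleave(bits, rows, cols):
    """Write `bits` row-wise into a rows x cols array, read it out column-wise."""
    assert len(bits) == rows * cols
    matrix = [bits[r * cols:(r + 1) * cols] for r in range(rows)]
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

def block_deinterleave(bits, rows, cols):
    """Inverse operation: write column-wise, read row-wise."""
    return block_interleave(bits, cols, rows)

# A burst of up to `rows` consecutive channel errors lands on symbols that are
# `cols` positions apart in the original stream, i.e., the burst is spread out.
data = list(range(12))
tx = block_interleave(data, 3, 4)
assert block_deinterleave(tx, 3, 4) == data
```

Note that the transmitter must buffer a full rows × cols block before the first interleaved symbol can be sent, which is the long delay the paper seeks to avoid.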
A burst-error correcting Viterbi algorithm, which combines the maximum likelihood decoding algorithm with a burst detection scheme instead of using channel interleaving, was proposed in [1] and extended to Q²PSK in [2]. This adaptive Viterbi algorithm (AVA) outperforms interleaving strategies in the presence of very long bursts. However, when many short error bursts are present, AVA is inferior to the interleaving scheme. An interleaved convolutional code can also be used for burst-error correction [3]. A modified Viterbi algorithm (MVA) [4], based on the multitrellis decomposition [5], was presented for interleaved convolutional code. The MVA introduces a much smaller delay time and much lower memory requirements than channel interleaving techniques with convolutional code. However, the implementation of MVA in [4], which uses as many delay elements as decoding depth (DD) × interleaving degree (I) for each codeword component, is not area efficient. Applications of interleaved convolutional code to asynchronous transfer mode (ATM) networks [6] and image communication systems [7, 8, 9] have been proposed.
In this paper, an area-efficient high-speed interleaved Viterbi decoder architecture, which has a state-parallel architecture with register exchange path memory structure, is proposed. This paper is an expanded version of [10]. A brief introduction to the interleaved convolutional code is given in Section 2. The proposed interleaved Viterbi decoding algorithm and its architecture for interleaved convolutional code are presented in Section 3.

2. INTERLEAVED CONVOLUTIONAL CODE (ICC)
Interleaved convolutional code with extra delay (A), which further randomizes the error bursts, can be used for burst-error correction as shown in Figure 1. In this coding scheme, channel interleaving is not used. The performance of this interleaved convolutional coding scheme depends on the interleaving degree and the extra delay. An interleaved convolutional code with interleaving degree I is obtained by replacing each delay (or storage) element in the generator polynomials with I delays. In Figure 1, MUX and DE-MUX represent multiplexer and demultiplexer, respectively.

An interleaved convolutional code can be obtained from a nonrecursive nonsystematic convolutional (NRNSC) code or a recursive systematic convolutional (RSC) code as shown in Figure 2. To illustrate the algorithm, we consider an (n, k, m) = (2, 1, 2) binary convolutional code, g(D) = (7, 5)_8, with the following generator polynomials (G):

(1) NRNSC: G_1(D) = 1 + D + D^2, G_2(D) = 1 + D^2;
(2) RSC: G_1(D) = 1, G_2(D) = (1 + D^2)/(1 + D + D^2).

For these codes, the generator polynomials of the interleaved convolutional code with interleaving degree I become

(1) interleaved NRNSC: G_1(D) = 1 + D^I + D^(2I), G_2(D) = 1 + D^(2I);
(2) interleaved RSC: G_1(D) = 1, G_2(D) = (1 + D^(2I))/(1 + D^I + D^(2I)),

which yield a (2, 1, 2I) interleaved convolutional code. From these generator polynomials, we can see that each delay element (D) is replaced by I delays as shown in Figure 2. The encoding and decoding processes will be explained in the z-transform domain, where each delay element D is replaced by z^(−1).
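The D → D^I substitution can be checked mechanically: in coefficient form it simply inserts I − 1 zero taps between the original taps. A minimal sketch (the little-endian coefficient convention and function name are our own):

```python
def interleave_poly(coeffs, I):
    """Replace each delay D with D^I: the tap at degree d moves to degree d*I.
    `coeffs` is little-endian: coeffs[d] is the coefficient of D^d."""
    out = [0] * ((len(coeffs) - 1) * I + 1)
    for d, g in enumerate(coeffs):
        out[d * I] = g
    return out

# (7,5)_8 code: G1 = 1 + D + D^2, G2 = 1 + D^2
g1, g2 = [1, 1, 1], [1, 0, 1]
print(interleave_poly(g1, 2))  # [1, 0, 1, 0, 1] -> 1 + D^2 + D^4
print(interleave_poly(g2, 2))  # [1, 0, 0, 0, 1] -> 1 + D^4
```

With I = 1 the function is the identity, recovering the original code as a special case.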
A binary information sequence to be encoded is represented as

A(z) = Σ_{k=0}^{∞} a_k z^(−k),

where a_k is a coefficient of the information sequence taking the value 0 or 1, since the binary system is considered. For an (n, 1, mI) interleaved convolutional code, the generator polynomials with interleaving degree I are

G_i(z) = Σ_{j=0}^{m} g_ij z^(−jI), i = 1, 2, ..., n,

where g_ij is a coefficient of the generator polynomials, g_ij ∈ {0, 1}, and g_i0 = g_im = 1. For this interleaved convolutional encoder, the codeword (encoder output) sequences are generated as

C^i(z) = A(z) G_i(z), i = 1, 2, ..., n.

For n = 2, m = 2, and I = 2, with g(D) = (7, 5)_8 for the original convolutional code, the generator polynomials are

G_1(z) = 1 + z^(−2) + z^(−4), G_2(z) = 1 + z^(−4).

The codeword (encoder output) sequences of this encoder are

C^i(z) = C^i_0(z^2) + z^(−1) C^i_1(z^2), i = 1, 2,

where, writing A(z) = A_0(z^2) + z^(−1) A_1(z^2),

C^i_j(z^2) = A_j(z^2) G_i(z), i = 1, 2, j = 0, 1.

Two independent codeword sequences are obtained by interleaving with degree 2: (C^1_0(z^2), C^2_0(z^2)) and (C^1_1(z^2), C^2_1(z^2)). They are transmitted alternately. Extra delays are used for one codeword sequence to add more interleaving effect. In this case, the decoder also requires extra delays to adjust the timing of the received sequences as shown in Figure 1.
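The decomposition into independent codeword sequences can be verified with a small encoder sketch: encoding with the interleaved polynomials is equivalent to time-multiplexing I copies of the original encoder, one per input subsequence. The following Python sketch uses our own names and a little-endian tap convention (assumptions, not the paper's notation):

```python
def conv_encode(bits, polys):
    """Nonrecursive binary convolutional encoder; polys are little-endian tap lists."""
    m = max(len(p) for p in polys) - 1
    state = [0] * m                       # shift register of past inputs
    out = []
    for b in bits:
        window = [b] + state              # window[d] = input d steps ago
        out.append(tuple(sum(p[d] * window[d] for d in range(len(p))) % 2
                         for p in polys))
        state = [b] + state[:-1]
    return out

g  = [[1, 1, 1], [1, 0, 1]]               # (7,5)_8
gi = [[1, 0, 1, 0, 1], [1, 0, 0, 0, 1]]   # D -> D^2 (interleaving degree 2)

bits = [1, 0, 1, 1, 0, 0, 1, 0]
inter = conv_encode(bits, gi)
# The interleaved encoder is two independent (7,5)_8 encoders, time-multiplexed:
assert inter[0::2] == conv_encode(bits[0::2], g)
assert inter[1::2] == conv_encode(bits[1::2], g)
```

The even-indexed outputs depend only on even-indexed inputs (and likewise for odd), which is exactly the polyphase decomposition C^i(z) = C^i_0(z^2) + z^(−1) C^i_1(z^2) above.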

3. INTERLEAVED VITERBI DECODING
The Viterbi decoding algorithm consists of branch metrics calculation, the add-compare-select (ACS) operation, and estimation of the information sequence from the survival path information. The Hamming distance (hard decision) or Euclidean distance (soft decision) between the received data and the possible codewords is computed in the branch metrics calculation unit. These branch metrics are accumulated, and the most likely path (survival path) is selected by the ACS unit. For a binary convolutional code with code rate R = k/n, the number of possible codewords per branch is 2^n. From the survival path information, the decoded data sequence is obtained.
The interleaved Viterbi decoding algorithm is based on the decomposed trellis diagram. The trellis diagram of an (n, k, mI) interleaved convolutional code can be decomposed into I (n, k, m) trellis diagrams. Figure 3 shows the decomposed trellis diagram of the (2, 1, 2 × 2) NRNSC code. As we can see in Figure 3, the decomposed (n, k, m) trellis diagrams are identical.
A received sequence, which may be corrupted by errors, can be represented as

R^i(z) = Σ_{k=0}^{∞} r^i_k z^(−k), i = 1, 2, ..., n.

From these sequences, the branch metrics can be calculated as

λ^s_k = Σ_{i=1}^{n} d(r^i_k, c^i_s),

where λ^s_k and c^i_s represent the branch metrics and the possible codeword, respectively, and d(·, ·) is the Hamming or Euclidean distance. Using these branch metrics, the ACS operations can be executed as

γ^s_{(k+1)I+j} = min(γ^{s_u}_{kI+j} + λ^u_{kI+j}, γ^{s_l}_{kI+j} + λ^l_{kI+j}),

where λ and γ^s represent branch metrics and state metrics (or path metrics or accumulated state metrics), respectively; s stands for the trellis state, which varies from 0 to 2^m − 1; k = 0, 1, 2, ..., ∞; and j = 0, 1, 2, ..., I − 1. The superscripts u and l mean, respectively, the upper and lower branches that merge into a trellis state (see Figure 3). The survivor path information (referred to as the path select signal, PS) of this ACS operation is as follows: p^s_k is 0 when the upper branch is selected and p^s_k is 1 when the lower branch is selected for a trellis state s. From this ACS recursion and Figure 3, we see that I delays (or storage elements) are needed to guarantee proper ACS operations, since the state metrics used at step kI + j + I were produced at step kI + j.
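The requirement for I delays in the state metrics path can be sketched as an ACS update that reads the metrics written I symbol periods earlier. The sketch below uses hard-decision Hamming metrics and the 4-state trellis of the (7, 5)_8 code; the table layout and all names are our own assumptions:

```python
from collections import deque

# Radix-2 trellis of the (2,1,2) (7,5)_8 code: for each next state, the
# (previous state, expected codeword) of its upper and lower branches.
TRELLIS = {
    0: [(0, (0, 0)), (1, (1, 1))],
    1: [(2, (1, 0)), (3, (0, 1))],
    2: [(0, (1, 1)), (1, (0, 0))],
    3: [(2, (0, 1)), (3, (1, 0))],
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def acs_step(history, received):
    """One interleaved ACS update: history[0] holds the state metrics written
    I steps ago, exactly the gamma values the recursion needs."""
    old = history[0]
    new, ps = [0] * 4, [0] * 4
    for s, branches in TRELLIS.items():
        cands = [old[p] + hamming(received, c) for p, c in branches]
        new[s] = min(cands)
        ps[s] = cands.index(new[s])   # 0 = upper branch, 1 = lower branch
    history.append(new)               # maxlen=I pushes out the oldest entry
    return ps

I = 2
history = deque([[0, 9, 9, 9]] * I, maxlen=I)  # both subsequences start in state 0
# Error-free interleaved stream: even-phase inputs (1, 0), odd-phase inputs (1, 1).
for symbol in [(1, 1), (1, 1), (1, 0), (0, 1)]:
    acs_step(history, symbol)
print(min(history[-1]))  # 0: one survivor matches the received stream exactly
```

Because consecutive symbols belong to different subsequences, a single-register state metrics store would mix the two decompositions; the I-deep delay line keeps them independent.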
A Viterbi decoder consists of a branch metrics calculator (BMC), ACS units, state metrics memory (referred to as SMM, or path metrics memory), and path memory (PM, or survival path memory). For fast decoding applications, the state-parallel architecture, which uses as many ACS units as the number of trellis states, is generally used with the register exchange path memory structure. The BMC computes the Euclidean or Hamming distance between the received data and the codeword sequences; generally, the Euclidean distance is used to get better coding gain. The ACS unit selects the most probable path by comparing the accumulated branch metrics. The accumulated branch metrics resulting from the ACS operation are stored in the SMM, and the selected path information (PS) is stored in the PM.

For n = 2, m = 2, and I = 2, the received sequences are represented as R^1(z) and R^2(z). The BMC computes the branch metrics as

Λ_i(z) = Σ_{k=0}^{∞} λ^i_k z^(−k), i = 0, 1, 2, 3,

where Λ_0(z), Λ_1(z), Λ_2(z), and Λ_3(z) represent the branch metrics between the received symbols and the possible codewords (0, 0), (0, 1), (1, 0), and (1, 1), respectively. These branch metrics are used in the ACS computation. The ACS unit adds the branch metrics (λ) and the previous state metrics (γ), and then selects the minimum state metrics from the two incoming branches (see Figure 3), where k = 0, 1, 2, ..., ∞ and j = 0, 1. The survivor path information p^s records which branch was chosen, and the selected state metrics are stored in the SMM as the new state metrics. For m = 2, the number of trellis states is 2^m = 2^2 = 4, so we need four PS signals: PS_0(z), PS_1(z), PS_2(z), and PS_3(z). These PS signals go into the PM.

Since the number of trellis states of the (n, k, mI) interleaved convolutional code with interleaving degree I is 2^(mI), a straightforward state-parallel implementation of the Viterbi algorithm for this code requires huge hardware resources when the interleaving degree or the encoder constraint length (K = m + 1) is large.
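The BMC step for n = 2 can be sketched as a four-output distance computation over the possible codewords (a hard-decision Hamming sketch with our own function name; soft-decision Euclidean metrics follow the same pattern):

```python
def bmc(r1, r2):
    """Return (L0, L1, L2, L3): distances of the received pair (r1, r2)
    to the four possible codewords (0,0), (0,1), (1,0), (1,1)."""
    return tuple((r1 != c1) + (r2 != c2)
                 for c1, c2 in [(0, 0), (0, 1), (1, 0), (1, 1)])

print(bmc(1, 0))  # (1, 2, 0, 1): (1, 0) matches codeword (1, 0) exactly
```

Note the BMC is unchanged by interleaving: it operates symbol by symbol, so the interleaved decoder reuses the original BMC as stated below.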
For the (2, 1, 2 × 4) interleaved convolutional code, the number of trellis states is 2^8 = 256, the same as for a K = 9 code. Therefore, an area-efficient high-speed Viterbi decoder architecture for interleaved convolutional code is needed.
By substituting I delays for each delay (or storage) element in the SMM and in the path memory cells (PMC) of the PM, an area-efficient high-speed interleaved Viterbi decoder architecture for interleaved convolutional code is obtained. In this architecture, the throughput rate of the Viterbi decoder is as high as the operating clock speed. Since the decoding latency of a state-parallel Viterbi decoder with register exchange path memory structure equals the decoding depth, the decoding latency of the interleaved Viterbi decoder increases to the decoding depth multiplied by the interleaving degree, that is, decoding latency = DD × I. Since the interleaved convolutional coding scheme uses an extra delay (A), its overall decoding latency becomes DD × I + A.
A proposed state-parallel Viterbi decoder architecture for the interleaved (n, 1, mI) convolutional code is shown in Figure 4.

Figure 5: Interleaved SMM architecture using FIFO.
If the decoding speed is not critical, a state-serial architecture, which uses fewer ACS units than the number of trellis states without changing the SMM and PM structures, can be used. However, it needs a control unit for proper connection between the ACS units and the SMM and PM. The BMC and ACS unit architectures of the proposed Viterbi decoder are identical to those of the original Viterbi decoder architecture.
In general, random access memory (RAM) and D-type flip-flops are used as the SMM for state-serial and state-parallel noninterleaved Viterbi decoders, respectively. In both cases, the SMM size becomes interleaving degree (I) × number of trellis states (2^m) × state metrics width (w). Figure 5 shows an alternative SMM architecture, which uses first-in first-out (FIFO) memory.
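The FIFO behavior of the SMM can be sketched as one I-deep queue per trellis state: a read returns the metric written I updates earlier, which is exactly what the interleaved ACS recursion requires. This is our own behavioral sketch (class and method names are assumptions, not the paper's):

```python
from collections import deque

class FifoSMM:
    """Behavioral model of the interleaved state metrics memory of Figure 5:
    each state's metric sits in an I-deep FIFO instead of a single register."""

    def __init__(self, n_states, I, init=0):
        self.fifos = [deque([init] * I, maxlen=I) for _ in range(n_states)]

    def read(self, state):
        return self.fifos[state][0]       # oldest entry: metric from I updates ago

    def write(self, state, metric):
        self.fifos[state].append(metric)  # pushes out the oldest entry

smm = FifoSMM(n_states=4, I=2)
smm.write(0, 5)
smm.write(0, 7)
print(smm.read(0))  # 5: the value written two updates ago
```

With I = 1 this degenerates to the single D-type flip-flop per state of the noninterleaved decoder, matching the size formula above.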
Interleaved PM and interleaved PMC (IPMC) architectures for proposed interleaved Viterbi decoder are shown in Figure 6.
The basic architecture of this interleaved PM is exactly the same as the original register exchange PM architecture. However, it uses a modified PM cell that consists of one multiplexer and I storage elements, as shown in Figure 6b; D-type flip-flops are generally used as storage elements in the register exchange PM structure. Owing to the extra delay elements in the IPMC, the estimated information sequence can be properly recovered from the PM. Also, by virtue of its simple structure, placement and routing of the path memory cells are easier than in a straightforward implementation, and a reduction of power consumption is also expected in this proposed Viterbi decoder architecture. The signals PS_0, PS_1, PS_2, and PS_3 are used as select signals for the first, second, third, and fourth rows of IPMCs in the PM, respectively. The connection of the IPMCs in the PM is exactly the same as the trellis diagram, and the path select signals can be used as inputs of the IPMCs in the first column of the PM. When the DD is large enough, that is, DD ≥ 4K, the outputs of the IPMCs in the last column of the PM have the same values with very high probability. Therefore, some IPMCs in the PM can be removed with negligible performance degradation as shown in Figure 6a.
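One IPMC, a 2-to-1 multiplexer followed by I storage elements, can be modeled behaviorally as follows (our own sketch of the Figure 6b cell; class and signal names are assumptions):

```python
from collections import deque

class IPMC:
    """Interleaved path memory cell: MUX output passes through I registers,
    so a selected decision emerges I clock cycles later."""

    def __init__(self, I):
        self.regs = deque([0] * I, maxlen=I)  # the I storage elements

    def clock(self, upper_in, lower_in, ps):
        out = self.regs[0]                               # oldest stored decision
        self.regs.append(lower_in if ps else upper_in)   # MUX, then store
        return out

cell = IPMC(I=2)
outs = [cell.clock(u, l, ps) for u, l, ps in [(1, 0, 0), (0, 1, 1), (0, 0, 0)]]
print(outs)  # [0, 0, 1]: the decision selected at clock 0 emerges at clock 2
```

Chaining such cells column by column, with the trellis-defined interconnection, reproduces the register exchange structure while keeping the I interleaved survivor paths separated in time.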
The Viterbi decoder for interleaved convolutional code also can be implemented in I-parallel manner. It consists of I-parallel Viterbi decoder components. Each Viterbi decoder component is used for decoding each interleaved data sequence.
In Table 1, the complexity, latency, and throughput rate of this proposed Viterbi decoder architecture are compared with a straightforward implementation.
From Table 1, we can see that the hardware complexity of the proposed Viterbi decoder architecture is much smaller than that of the straightforward implementation for the same throughput rate. For I = 2 and m = 2, we can achieve hardware reductions of approximately 75% for the ACS units, 50% for the SMM, and 50% for the PM. Furthermore, the interconnections of the proposed architecture are reduced. The proposed interleaved Viterbi decoder architecture saves area in the ACS units and PM; since the IPMC uses fewer multiplexers, the size of an IPMC is smaller than that of I × PMC, as shown in Figure 6b.
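The ACS figure can be checked with a back-of-the-envelope calculation (our own arithmetic, following the state counts stated above): a straightforward state-parallel decoder for the (n, k, mI) code needs one ACS unit per each of its 2^(mI) states, while the proposed architecture reuses 2^m units with I-deep delays.

```python
def acs_units_straightforward(m, I):
    # One ACS unit per trellis state of the (n, k, mI) code.
    return 2 ** (m * I)

def acs_units_proposed(m):
    # The proposed architecture reuses the 2^m units of the original code.
    return 2 ** m

m, I = 2, 2
saved = 1 - acs_units_proposed(m) / acs_units_straightforward(m, I)
print(f"ACS reduction for m={m}, I={I}: {saved:.0%}")  # 75%
```

The gap widens rapidly: for I = 4 the straightforward decoder needs 256 ACS units against the proposed 4, consistent with the claim that higher interleaving degrees favor the proposed architecture.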
However, the latency of this proposed architecture, which increases linearly with the interleaving degree, is the largest among the three implementations.

4. CONCLUSION
An area-efficient high-speed Viterbi decoder architecture is proposed to decode the (n, 1, mI) interleaved convolutional code. By replacing each delay (or storage) element in the state metrics memory and path memory with I delays, the interleaved Viterbi decoder is obtained. Greater hardware complexity reduction is achieved with a higher interleaving degree; that is, the proposed architecture is more area efficient for interleaved Viterbi decoders with higher interleaving degrees.
However, it is inevitable that the latency of this proposed architecture increases with the interleaving degree. The latency of the proposed interleaved Viterbi decoder itself is decoding depth (DD) × interleaving degree (I), which increases linearly with the interleaving degree. Since the interleaved convolutional coding scheme uses an extra delay (A), its actual decoding latency becomes DD × I + A.
The performance of this interleaved convolutional coding scheme depends on the interleaving degree and the size of the extra delay.

Figure 6: (a) Interleaved PM for DD = 12. (b) IPMC architecture for the (n, k, 2I) code.