Multiple Description Coding with Side Information: Practical Scheme and Iterative Decoding

Multiple description coding (MDC) with side information (SI) at the receiver is particularly relevant for robust transmission in sensor networks where correlated data is being transmitted to a common receiver, as well as for robust video compression. The rate-distortion region for this problem has been established in [11]. Here, we focus on the design of a practical MDC scheme with SI at the receiver. It builds upon both MDC principles and Slepian-Wolf (SW) coding principles. The input source is first quantized with a multiple description scalar quantizer (MDSQ), which introduces redundancy or correlation in the transmitted streams in order to take advantage of the path diversity. The resulting sequences of indexes are SW encoded, that is, separately encoded and jointly decoded. While the first step (MDSQ) plays the role of a channel code, the second one (SW coding) plays the role of a source code, compressing the sequences of quantized indexes. In a second step, the cross-decoding of the two descriptions is proposed. This allows us to account for both the correlation with the SI and the correlation between the two descriptions.


Introduction
Multiple description coding (MDC) has been introduced as a generalization of source coding subject to a fidelity criterion for communication systems that use diversity to overcome channel impairments. Several correlated representations of the signal are created and transmitted on different channels. The design goals are therefore to achieve the best average rate-distortion (RD) performance when all the channels work, subject to constraints on the average distortion when only a subset of the channels is received correctly. Practical approaches to MDC include scalar quantization [1], polyphase decompositions [2][3][4][5], correlating transforms [6,7], and frame expansions [8]. In the sequel, we consider multiple description scalar quantization (MDSQ) which allows a very easy tuning of the redundancy as well as a simple coding and decoding.
MDC is an interesting tool for robust communication over lossy networks such as the Internet, peer-to-peer, diversity wireless networks, and sensor networks. MDC avoids the cliff effect of classical forward error correction techniques. A resilient peer-to-peer streaming approach is proposed in [9] based on the transmission of multiple descriptions on distribution trees which introduce diversity in network paths. Jointly optimized multipath routing and MDC is also shown in [10] to improve the end-to-end quality of service in dense mesh networks. This paper goes one step further and considers the case where correlated side information (SI) about the transmitted source is available at the receiver. Since MDC introduces redundancy in the transmitted data, the overall rate increases. We will show that the use of SI at the decoder allows decreasing the overall coding rate while preserving the robustness inherent to the MDC structure. The RD region for MDC when SI about a correlated random process is only known at the decoder has been established in [11]. Analytical expressions of the RD bounds are derived for Gaussian sources and a Gaussian correlation model, assuming the SI to be common to the two descriptions. Here, we focus on the design of a practical MDC scheme with SI at the receiver. It builds upon both MDC principles and Slepian-Wolf (SW) coding principles. The input source is first quantized with a multiple description scalar quantizer (MDSQ). After quantizing the source on a given alphabet, two indexes are assigned to the resulting discrete source symbols. This index assignment can be seen as a lossless MDC step which introduces redundancy or correlation in the transmitted streams in order to take advantage of network path diversity. The resulting sequences of indexes are SW encoded, that is, separately encoded and jointly decoded. 
Indeed, in the lossless case, the SW theorem [12] yields the surprising result that one can compress correlated sources in a distributed manner as efficiently as if they were jointly compressed. While the first step (MDSQ) plays the role of a channel code, the second one (SW coding) plays the role of a source code compressing the sequences of quantized indexes.
Recently, in [13], a deterministic annealing [14] approach was described for optimal design of multiple description vector quantizer with SI available at the decoder. The performance of the quantizer over channels subject to noise and packet loss was investigated and compared with the RD bound. However, it was assumed that each description is compressed and decompressed independently using an ideal SW encoder and decoder, respectively. In this paper, we present a complete MDC scheme with SI where channel codes are used as SW codes. The design of good quantizers for this problem is not considered. Instead, we study the influence of the amount of redundancy on SW decoding as well as the impact of using the SI during reconstruction and describe a way to perform a joint decoding of multiple descriptions with SI.
The first use of channel codes (based on trellis codes) as SW codes was proposed in [15]. Later, the first capacity-approaching channel codes to be proposed as SW codes were turbo codes in [16,17]. In [18], turbo codes were employed for asymmetric distributed source coding. In [19], it was shown that low-density parity-check (LDPC) codes can also be used in a source coding with SI setup to compress close to the SW limit for memoryless correlated binary sources, and in [20] for correlated binary sources with memory. More recently [21], arithmetic codes were proposed as an alternative to turbo codes and LDPC codes for small and medium block lengths. A rate-compatible system was also provided in [22].
In this paper, we first consider common SI to be available for the decoding of the two descriptions. Focusing on the particular case of two descriptions, the approach results in a balanced two-description coding scheme with decoder-only common SI (see Figure 1). In a second step, we consider cross-decoding of the two descriptions, which accounts for both the correlation with the SI and the correlation between the two descriptions. Assuming on-off channels (description received or lost), we observe that, for a given amount of correlation between the input source X and the SI Y, increasing the redundancy in the MDSQ does not necessarily increase the transmission rate by as much. As the correlation of the two descriptions with the SI increases, the rate of the SW code decreases. In that case, the extra robustness brought by increasing the redundancy in the MDSQ comes at a moderate rate cost.
The paper is organized as follows. In Section 2, we briefly review the theoretical background of MDC with SI. We then describe our proposed practical MDC scheme with SI in Section 3. The latter is further improved in Section 4 with the introduction of iterative cross-decoding of multiple descriptions with SI. Simulation results are presented in Section 6. Finally, conclusions and future work in video coding are provided in Section 7.

Lossless Coding.
The duality between lossless MDC and SW coding has been discussed in [23], in the particular case where one description D_1 (resp., D_2) is transmitted at full rate and used as SI to decode the second description D_2 (resp., D_1). The corner points of the SW and the MDC rate regions are shown to overlap. In the balanced setup considered here, where both descriptions are SW encoded and decoded with the help of extra SI Y correlated with the input source, the two regions overlap. For the central decoder, in which both descriptions are jointly decoded, all rate points of the SW region can be reached (see Figure 2). In the lossless case, the SW theorem [12] shows that the minimum rate (R = R_1 + R_2) to compress the two sources is the joint entropy H(D_1, D_2 | Y), with
\[ R_1 \ge H(D_1 \mid D_2, Y), \qquad R_2 \ge H(D_2 \mid D_1, Y), \qquad R_1 + R_2 \ge H(D_1, D_2 \mid Y). \tag{1} \]
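The three SW constraints on the rate pair (R_1 ≥ H(D_1 | D_2, Y), R_2 ≥ H(D_2 | D_1, Y), and R_1 + R_2 ≥ H(D_1, D_2 | Y)) can be checked numerically. The Python sketch below is only an illustration: it drops the conditioning on Y (equivalently, it treats Y as a constant), and the function names and the toy joint pmf are assumptions made for the example, not values from the paper.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def sw_bounds(joint):
    """Return (H(D1|D2), H(D2|D1), H(D1,D2)) for a joint pmf joint[d1][d2]."""
    h_joint = entropy_bits(p for row in joint for p in row)
    h_d1 = entropy_bits(sum(row) for row in joint)
    h_d2 = entropy_bits(sum(col) for col in zip(*joint))
    return h_joint - h_d2, h_joint - h_d1, h_joint


def in_sw_region(r1, r2, joint, tol=1e-9):
    """True if (r1, r2) satisfies the three Slepian-Wolf constraints."""
    h1_given_2, h2_given_1, h_joint = sw_bounds(joint)
    return (r1 >= h1_given_2 - tol
            and r2 >= h2_given_1 - tol
            and r1 + r2 >= h_joint - tol)


# Hypothetical joint pmf of two correlated descriptions (illustration only).
toy_joint = [[0.50, 0.00],
             [0.25, 0.25]]
```

For this toy pmf the joint entropy is 1.5 bits, so the corner point (R_1, R_2) = (H(D_1), H(D_2 | D_1)) = (1.0, 0.5) lies on the boundary of the region, whereas the balanced pair (0.7, 0.7) falls outside it.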

Lossy Coding.
The problem of MDC with SI has already been studied in [11]. The authors have determined the RD region for the general case when the decoders have different SIs or when they have common SI, and when both the encoder and decoder have access to the SI or when it is only available at the decoder. Additionally, they have established the two-description RD region for the Gaussian case through the following theorem.
Theorem 1 (from [11]). Let (X(1), Y (1)), (X(2), Y (2)) . . . be a sequence of independent and identically-distributed (i.i.d.) jointly Gaussian random variables. Let Z(k) model the correlation via a virtual AWGN channel between the random variables Y (k) and X(k). Then, we can write that Only the decoder has access to the SI {Y (k)}. For a quadratic distortion measure, the set of all achievable tuples (R 1 , R 2 , D 1 , D 2 , D 12 ) is given by where This theorem states that, similarly to the Wyner-Ziv coding (WZC) case [24], the RD region in the two-description Gaussian case when the SI is only known at the decoder is the same as the one obtained when the SI is also known at the encoder.
This problem has also been studied in [25,26] where the authors focus on the case when the decoders use two different SIs Y 1 and Y 2 . In [25], the RD region was defined for Gaussian sources when the SIs are known at both encoder and decoder and it was compared with the region obtained in [26] when the SIs are not available at the encoder. It was shown that the latter region is included in the former and that they coincide if and only if Y 1 = Y 2 .
In this paper, we focus on the scenario when the SI is common and only known at the decoder (see Figure 1). A practical two-description scheme with decoder-only SI is described in the next section.

Multiple Description Scalar Quantization with Side Information
Multiple description coding (MDC) consists in creating a number of distinct correlated representations of a source, called descriptions. The reception of only one description should permit the reconstruction of the source with an acceptable quality level, and every additional description that is received should increase the quality of the reconstruction. The particular case of coding with two descriptions has been studied extensively, in theory and in practice [27]. MDC is well adapted to the transmission of data on multiple independent channels or on a fading channel without memory.
MDSQ consists in generating two coarse side descriptions of a scalar source using two (or more) independent scalar quantizers. The quantizers refine each other in a way that guarantees a central description of lower distortion when both side descriptions are available at the decoder. This can be achieved by partitioning the real line and assigning ordered pairs of indexes to the partition cells. The choice of the index assignment determines the partitions of the side decoders and thus allows for a systematic tradeoff between the central distortion and the side distortions. Practical approaches to build index assignment matrices are presented in [1].
As an example, consider the matrices shown in Figure 3. The indexes q ∈ {1, 2, ..., Q} belonging to the partition cells of the central quantizer occupy distinct positions within the matrices and are thus each assigned a pair of indexes, namely, the row index i ∈ {1, 2, ..., M} and the column index j ∈ {1, 2, ..., M}. Each of these indexes represents a side description, which is sent over a separate channel. If both channels are available to the receiver, decoding can be performed by simple matrix lookup. With access to only one description, the decoder knows only that the correct value is among the indexes in a certain row or column. The redundancy is controlled by choosing the number of diagonals covered by the index assignment: the fewer the diagonals, the higher the redundancy. In the following, the matrices will be identified by their d value, where 2d + 1 is the number of diagonals covered by the index assignment.
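The row/column lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not the embedded index assignment of [33] nor the constructions of [1]: cells on the 2d + 1 central diagonals are simply filled in order of increasing i + j, indexes are 0-based, and the function names are assumptions made for the example.

```python
def build_index_assignment(num_side_indexes, d, num_cells):
    """Map each central index q to a cell (i, j) with |i - j| <= d.

    Illustrative fill order only (increasing i + j, then increasing i).
    """
    cells = [(i, j)
             for i in range(num_side_indexes)
             for j in range(num_side_indexes)
             if abs(i - j) <= d]
    if len(cells) < num_cells:
        raise ValueError("not enough cells on the chosen diagonals")
    cells.sort(key=lambda cell: (cell[0] + cell[1], cell[0]))
    return {q: cell for q, cell in enumerate(cells[:num_cells])}


def central_decode(assignment, i, j):
    """Matrix lookup with both descriptions; None if the cell is empty."""
    inverse = {cell: q for q, cell in assignment.items()}
    return inverse.get((i, j))


def side_candidates(assignment, index, description):
    """Central indexes compatible with one received description.

    description == 1 means the row index i was received,
    description == 2 means the column index j was received.
    """
    axis = 0 if description == 1 else 1
    return sorted(q for q, cell in assignment.items() if cell[axis] == index)
```

With 32 central cells, `build_index_assignment(32, 0, 32)` places every cell on the main diagonal, so a single description already identifies q, while `build_index_assignment(8, 2, 32)` spreads the cells over five diagonals and a side decoder is left with several candidates per row or column.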
The proposed multiple description Wyner-Ziv coding (MD-WZC) scheme is described in Figure 4. A source sample X_n, n = 1, 2, ..., N, is mapped by a quantizer to an index q, which is then mapped to a pair of indexes (i, j) by the index assignment. Then, the two bitstreams of indexes are separately encoded by a channel encoder. Only the parity bits are sent in the descriptions to the decoder. The decoder begins by separately decoding the indexes using Y as SI. The channel probabilities are calculated from the parity bits sent by the encoder and the virtual channel output Y. The dependencies between Y and the indexes, P(I | Y) and P(J | Y), are obtained from the index assignment matrix and P(X | Y). Then, depending on the number of descriptions received, a certain quality is achieved for the reconstructed version of X. If only one description (that is, one sequence of indexes) is received, then the decoder only has access to either I or J. The corresponding quantization intervals and the SI Y are used by the side decoders to compute X_1 or X_2, the reconstructed versions of X. Their quality depends on the amount of redundancy introduced by the MDSQ and on the correlation between X and Y. If the two descriptions (that is, the two sequences of indexes) are received, the indexes are combined to obtain the quantization intervals to which X belongs. The central decoder uses these intervals and the SI Y to compute X_12, the reconstructed version of X.
Note that MD-WZC schemes could be implemented using other MDC techniques, for example, relying on signal polyphase decompositions [2][3][4][5], on pairwise correlating transforms [6,7], or on frame expansions [8]. The derivation of the conditional pdf of each description given the SI Y, from the given conditional pdf of the input signal X given Y, will need to be adapted, since it depends on the transformation or mapping of the input signal X into its multiple descriptions.
A specific design will also be required to further exploit the SI in the decoding steps which follow the SW decoder.
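For the MDSQ case, the step that obtains P(I | Y) and P(J | Y) from the index assignment matrix and P(X | Y) can be sketched as follows. The sketch assumes the additive Gaussian correlation model X = Y + Z with Z ∼ N(0, σ_Z^2), 0-based indexes, and a hypothetical three-cell quantizer; the function names and toy values are illustrative assumptions.

```python
import math


def gaussian_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cell_probabilities(boundaries, y, sigma_z):
    """P(q | y): probability that X = y + Z falls in each quantizer cell."""
    cdf = [gaussian_cdf((z - y) / sigma_z) for z in boundaries]
    return [cdf[q + 1] - cdf[q] for q in range(len(boundaries) - 1)]


def side_index_probabilities(assignment, boundaries, y, sigma_z,
                             num_side_indexes):
    """Sum the cell probabilities over each row (I) and each column (J)."""
    p_i = [0.0] * num_side_indexes
    p_j = [0.0] * num_side_indexes
    for q, p in enumerate(cell_probabilities(boundaries, y, sigma_z)):
        i, j = assignment[q]
        p_i[i] += p
        p_j[j] += p
    return p_i, p_j


# Hypothetical toy setup: 3 central cells placed in a 2 x 2 matrix.
toy_boundaries = [-math.inf, -0.5, 0.5, math.inf]
toy_assignment = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
```

Calling `side_index_probabilities(toy_assignment, toy_boundaries, y, sigma_z, 2)` returns the two conditional pmfs for one SI sample y; in a complete decoder these would still have to be converted into bit-level channel probabilities for the turbo decoder.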

Cross-Decoding of Multiple Descriptions with Side Information
To further improve the performance of the scheme, we can exploit the redundancy between the descriptions at the central decoder. This was first suggested for turbo codes in [28] by performing cross-decoding between the descriptions and further studied in [29,30] for wireless communications systems. We propose to generalize this approach to the case where, instead of channel outputs, an extra SI is available at the decoder. Moreover, in our approach, the bitrate is controlled by the decoder, which means that if the decoding does not succeed, more parity bits may be requested from the encoder.
The correlation between the descriptions is given by the index assignment matrix. For example, if we consider the matrix in Figure 3(c), we get P(i = 1 | j = 1) = 1/3, and so forth. This correlation information can be used as a priori knowledge about i by the channel decoder of i; the same applies for j. The overall decoder must combine the extrinsic information L^{out,(1)} (resp., L^{out,(2)}) at the output of the decoder of i (resp., j) with the conditional probability distribution P(j | i) (resp., P(i | j)) and send the results as a priori information to the channel decoder of j (resp., i) (see Figure 5). The improved scheme is given in Figure 6, where the channel cross-decoder block is represented in Figure 5.
Let {X_n, n = 1, 2, ..., N} denote the samples of a memoryless i.i.d. source. This source is encoded at an average rate of r bits per sample (bps) per channel using a multiple description encoder (the bitrates used in the results section VI-B are 5, 4, and 3 bps), producing two correlated sequences of indexes. The extrinsic information L^{out,(s)} at the output of decoder s is defined for each bit (k − 1)r + t, where k = 1, ..., N, t = 1, ..., r. It is calculated as the difference between the a posteriori log-likelihood ratio (LLR) and the a priori LLR. We only describe the transfer of information from the first decoder to the second decoder.
The probability distribution for the bits that constitute the second description can be calculated from the extrinsic LLR of the first description. The samples being i.i.d., the conditional probabilities do not depend on k, so the same relations, (8) and (9), hold for all k ∈ {1, ..., N}. Using (8) and (9), (7) can be expressed in terms of these conditional probabilities. Finally, the LLRs for the second description are obtained from (10) and (11). These LLRs are used as a priori information for the second decoder which, in turn, generates extrinsic log-likelihoods for the first decoder. The transfer of information back to the first decoder is carried out in a similar fashion. For a given bitrate for the parity bits, this cross-decoding, in which a MAP decoding is performed at each step for each decoder, is carried out until the probability of a bit error no longer changes or the number of iterations reaches a certain threshold (the results shown in section VI-B were obtained for a threshold set to 18), in which case more parity bits are requested by the decoder. An interleaver before the encoding of one of the descriptions is necessary to make sure that the information contained in one description is not correlated with the information contained in the other description for a given bitrate. The same procedure can be applied to other near-capacity channel codes such as LDPC accumulate codes [31] (see [32] for more details).
Figure 6: Two-description coding scheme with SI and channel cross-decoding at the central decoder.
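The transfer of information from the first decoder to the second can be illustrated with a short Python sketch for a single sample k. It rests on assumptions that are not stated in the text: the LLR is taken as ln(P(bit = 1) / P(bit = 0)), the r bits of an index are treated as independent when forming the index probabilities, bit t = 0 is the most significant bit, and the conditional matrix P(j | i) is supplied by the caller. It shows the general marginalization principle and is not claimed to reproduce the paper's equations term by term.

```python
import math


def llr_to_prob_one(llr):
    """Numerically stable P(bit = 1) for llr = ln(P(1) / P(0))."""
    if llr >= 0.0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)


def bit_of(index, t, r):
    """Bit t of an r-bit index, with t = 0 the most significant bit."""
    return (index >> (r - 1 - t)) & 1


def transfer_llrs(extrinsic_llrs, p_j_given_i, eps=1e-12):
    """Turn extrinsic LLRs on the bits of i into a priori LLRs on the bits of j.

    extrinsic_llrs: r LLRs for the bits of index i (one sample).
    p_j_given_i[m][n]: assumed conditional probability P(j = n | i = m).
    """
    r = len(extrinsic_llrs)
    num_indexes = 1 << r
    p_one = [llr_to_prob_one(llr) for llr in extrinsic_llrs]

    # Index-level probabilities P(i = m), assuming independent bits.
    p_i = []
    for m in range(num_indexes):
        p = 1.0
        for t in range(r):
            p *= p_one[t] if bit_of(m, t, r) else 1.0 - p_one[t]
        p_i.append(p)

    # P(j = n) = sum over m of P(j = n | i = m) P(i = m).
    p_j = [sum(p_i[m] * p_j_given_i[m][n] for m in range(num_indexes))
           for n in range(num_indexes)]
    total = sum(p_j)
    if total <= 0.0:
        return [0.0] * r  # no usable information: neutral a priori LLRs

    # Marginalize back to bit level and convert to LLRs.
    priors = []
    for t in range(r):
        p1 = sum(p_j[n] for n in range(num_indexes) if bit_of(n, t, r)) / total
        p1 = min(max(p1, eps), 1.0 - eps)
        priors.append(math.log(p1 / (1.0 - p1)))
    return priors
```

When P(j | i) is the identity matrix (the two descriptions are identical), the a priori LLRs handed to the second decoder equal the extrinsic LLRs of the first; when P(j | i) is uniform, they are all zero, which matches the intuition that more correlated descriptions benefit more from cross-decoding.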

Optimal Inverse Quantization
After the indexes are perfectly decoded, they have to be combined to recover the coefficients. We now derive the equations to perform an optimal inverse quantization in the presence of SI. We consider the case of two correlated memoryless Gaussian sources X and Y. The correlation model is defined as X = Y + Z, where Z is a Gaussian noise with zero mean and variance σ_Z^2. Let Q be the number of quantization intervals and z_0 < z_1 < ... < z_Q the boundaries of the quantization intervals of the source x. Since we are minimizing the mean-square error, the optimal estimate x_opt of the source x (both at the central and side receivers) is the conditional mean of x given the SI y and the K intervals [z_{k_i}, z_{k_i+1}), i = 1, ..., K, identified by the received indexes:
\[ x_{\mathrm{opt}} = \frac{\sum_{i=1}^{K} \int_{z_{k_i}}^{z_{k_i+1}} x \, p_Z(x - y) \, dx}{\sum_{i=1}^{K} \int_{z_{k_i}}^{z_{k_i+1}} p_Z(x - y) \, dx}, \]
where p_Z(·) is the probability density function (pdf) of Z. The number K of quantization intervals for a given x depends on the number of descriptions received and the number of diagonals in the index assignment matrix. At the central decoder, K = 1. At the side decoders, K is the number of nonempty cells in the row or column pointed out by the received indexes in the index assignment matrix. Given the expression of the correlation noise pdf between X and Y, we finally get
\[ x_{\mathrm{opt}} = y + \sigma_Z \sqrt{\frac{2}{\pi}} \, \frac{\sum_{i=1}^{K} \left( e^{-b_i^2} - e^{-a_i^2} \right)}{\sum_{i=1}^{K} \left( \operatorname{erf}(a_i) - \operatorname{erf}(b_i) \right)}, \]
where a_i = (z_{k_i+1} − y)/(σ_Z √2) and b_i = (z_{k_i} − y)/(σ_Z √2).
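The conditional-mean reconstruction for the Gaussian correlation model X = Y + Z can be evaluated in closed form with the error function. The Python sketch below computes y + σ_Z √(2/π) Σ(e^{−b²} − e^{−a²}) / Σ(erf(a) − erf(b)), with a = (z_{k_i+1} − y)/(σ_Z √2) and b = (z_{k_i} − y)/(σ_Z √2), which follows from integrating x p_Z(x − y) over each interval. The function name and the choice of raising an error when the intervals carry no probability mass are assumptions made for the example.

```python
import math


def optimal_reconstruction(intervals, y, sigma_z):
    """MMSE estimate of x given SI y and a union of candidate intervals.

    intervals: list of (lower, upper) bounds, one per candidate cell
               (K = 1 at the central decoder, K >= 1 at a side decoder).
    """
    scale = sigma_z * math.sqrt(2.0)
    numerator = 0.0
    denominator = 0.0
    for lower, upper in intervals:
        a = (upper - y) / scale
        b = (lower - y) / scale
        numerator += math.exp(-b * b) - math.exp(-a * a)
        denominator += math.erf(a) - math.erf(b)
    if denominator <= 0.0:
        # The SI is so far from the intervals that their mass underflows.
        raise ValueError("candidate intervals have zero probability given y")
    return y + sigma_z * math.sqrt(2.0 / math.pi) * numerator / denominator
```

Two sanity checks follow from the formula: if the candidate set is the whole real line the estimate collapses to the SI y, and if the candidate set is symmetric around y the correction term vanishes as well.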

Experimental Results
EURASIP Journal on Advances in Signal Processing
The results were obtained for 100 sequences of 1584 input samples, with Y a zero-mean Gaussian source of unit variance. X is defined as X = Y + Z, where Z has a Gaussian distribution, Z ∼ N(0, σ_Z^2). The samples of X are first processed by an MDSQ encoder, which consists of a Lloyd-Max quantizer that generates 32 quantization intervals, followed by an index assignment performed with the matrices shown in Figure 3, with 1, 3, and 5 diagonals, corresponding, respectively, to 5, 4, and 3 bits per output symbol i and j. The index assignment matrices were built using an embedded index assignment strategy [33] that provides improved RD performance when not all the bitplanes are received. Some symbols were removed by hand to keep a fixed number of quantization levels, which means that the matrices are slightly suboptimal. However, the nonoptimality of the MDSQ does not detract from the central focus of this paper.
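For readers who wish to reproduce a comparable setup, a Lloyd-Max quantizer for a Gaussian source can be obtained with the classical Lloyd iteration, alternating between midpoint boundaries and cell centroids. The Python sketch below handles a zero-mean, unit-variance Gaussian; for X = Y + Z the boundaries and levels would be scaled by the standard deviation of X. The initialization, the iteration count, and the function names are assumptions, and this is not claimed to be the authors' implementation.

```python
import math


def _pdf(x):
    """Standard normal pdf (evaluates to 0.0 at +/- infinity)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _centroid(lower, upper):
    """Mean of a standard normal variable truncated to [lower, upper)."""
    return (_pdf(lower) - _pdf(upper)) / (_cdf(upper) - _cdf(lower))


def _boundaries(levels):
    """Nearest-neighbour boundaries: midpoints between adjacent levels."""
    inner = [(levels[k] + levels[k + 1]) / 2.0 for k in range(len(levels) - 1)]
    return [-math.inf] + inner + [math.inf]


def lloyd_max_gaussian(num_levels, iterations=200):
    """Return (boundaries, levels) for a unit-variance Gaussian source."""
    # Assumed initialization: levels evenly spread over [-3, 3].
    levels = [-3.0 + 6.0 * (k + 0.5) / num_levels for k in range(num_levels)]
    for _ in range(iterations):
        bounds = _boundaries(levels)
        levels = [_centroid(bounds[k], bounds[k + 1])
                  for k in range(num_levels)]
    return _boundaries(levels), levels
```

A call such as `lloyd_max_gaussian(32)` yields 33 boundaries z_0 = −∞ < z_1 < ... < z_32 = +∞ that can be used as the quantization intervals of the central quantizer.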
Each description was coded using a turbo encoder that consists of two rate-1/2 convolutional codes, implemented in a recursive systematic form. The code is the same as the one used in [34]. Each decoder performs 18 iterations of the MAP algorithm. The parity bits, stored in two buffers, are transmitted in small amounts upon the decoder's request via the feedback channel. When the estimated bit error rate (BER) at the output of the decoders exceeds a given threshold, extra parity bits are requested. This amounts to controlling the rate of the codes by selecting different puncturing patterns at the output of the turbo codes. The BER is estimated from the LLR on the output bits of the turbo decoders [35]. This a posteriori LLR is computed for each bit u_{(k−1)r+t}, the tth bitplane of the kth index in the description s currently being decoded, given the SI y_k. For each k, if the absolute value of this a posteriori information is lower than a certain threshold (fixed at 4.6), then the bit u_{(k−1)r+t} is considered erroneous. When all the bits in a bitplane have been decoded, the BER is estimated as the number of bits considered erroneous divided by the total number of bits. If the BER is greater than a threshold (fixed at 10^−3), the decoding is considered to be a failure and more parity bits are requested from the encoder. The performance can be considered to be the same at both side decoders (balanced MDC scheme). In the following, the side performance will be represented by the average performance obtained for both side decoders. The WZC scheme is a single description coding scheme where the sequence of quantized values of X is directly encoded by a turbo code.
Figures 7 and 8 show the performance obtained by the WZC and the MD-WZC schemes for 10 values of the correlation signal-to-noise ratio (CSNR), defined as CSNR = 10 log10(σ_Y^2/σ_Z^2).
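The rate-control rule described above (count the bits whose a posteriori LLR magnitude falls below 4.6, and request more parity bits when the resulting BER estimate exceeds 10^−3) is simple enough to sketch directly. In the Python sketch below the function names are assumptions; under the usual definition LLR = ln(P(1)/P(0)), a magnitude of 4.6 corresponds to a bit reliability of roughly 0.99, which is a property of that definition rather than a statement from the paper.

```python
import math


def estimate_ber(posterior_llrs, llr_threshold=4.6):
    """Fraction of bits whose a posteriori LLR magnitude is below threshold."""
    unreliable = sum(1 for llr in posterior_llrs if abs(llr) < llr_threshold)
    return unreliable / len(posterior_llrs)


def needs_more_parity(posterior_llrs, ber_threshold=1e-3, llr_threshold=4.6):
    """True if decoding is declared a failure and more parity is requested."""
    return estimate_ber(posterior_llrs, llr_threshold) > ber_threshold


def csnr_db(sigma_y_squared, sigma_z_squared):
    """Correlation SNR in dB: 10 log10(sigma_Y^2 / sigma_Z^2)."""
    return 10.0 * math.log10(sigma_y_squared / sigma_z_squared)
```

With sequences of 1584 samples, a single unreliable bit in a bitplane already gives an estimate of 1/1584 ≈ 6.3 × 10^−4, which stays below the 10^−3 threshold, whereas two unreliable bits exceed it.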
An SNR value identified by a point on a curve in Figure 8 is achieved by sending parity bits at a rate provided by the same point on the corresponding curve in Figure 7. Solid and dotted curves correspond to schemes that use the SI during the reconstruction step, whereas dashed curves were obtained with schemes that do not use the SI at this step. As one can see in Figure 8, when the SI is taken into account during the reconstruction, the SNR values remain the same for WZC, for all MD-WZC techniques at the central decoder, and for MD-WZC with d = 0 at the side decoders. Note that here the quantizer is a Lloyd-Max quantizer adapted to the pdf of X and not optimized for p_Z. The SI is only taken into account in the inverse quantization step (see (13)). This explains the fact that when the CSNR is low, the SNR performance of the side decoder without the SI for d = 0 is slightly better than the SNR with SI, but gets worse when the CSNR increases. The CSNR has a much greater impact on the performance at the side decoders for d ∈ {1, 2}.
From [12], we know that the minimum number of bits per symbol one can achieve when compressing a source X when only the decoder has access to a correlated source Y satisfies R_X ≥ H(X | Y). For the WZC scheme, this limit is given by R_X ≥ H(X_Q | Y), where X_Q is the quantized version of X; for the MD-WZC schemes, it corresponds to R_X ≥ H(I | Y) + H(J | Y) when the descriptions are decoded separately. Figure 7 shows the rates obtained by the various schemes. For all three index assignments considered, we plotted the corresponding minimum number of bits per symbol for the case when the decoding of the descriptions is done separately. As expected, when we increase the number of diagonals, the redundancy introduced by the MDSQ becomes smaller and the bitrate becomes closer to the one we get with the WZC scheme.
Note that the impact of the CSNR values on the bitrate diminishes when the number of diagonals becomes larger. This is due to the fact that the correlation between Y and the descriptions I, J depends not only on the CSNR but also on the number of diagonals. This effect is clearly visible in Figure 7, where the two curves that correspond to the MD-WZC schemes for d = 1 and d = 2 cross each other at the highest CSNR values. The same effect is observed with the proposed scheme: when d becomes larger, the rate becomes smaller, except for d = 2 and CSNR values greater than 15 dB, where the MD-WZC scheme with d = 1 performs better. Figure 9 displays the theoretically achievable SNR given by Theorem 1 for the MD-WZC and WZC cases using the rates in Figure 7. The theoretical limit is the same for the WZC scheme and the side decoder of the MD-WZC scheme with d = 0. One can see that for the WZC scheme and the MD-WZC scheme with d = 0, the achievable SNR decreases when the CSNR increases, whereas the achievable SNR remains almost stable for d = 1 and increases for d = 2. Knowing from Figure 8 that the SNR at the central decoders of all schemes is almost stable with the increase of the CSNR, this shows that the SI is more useful with lower values of d. Observe as well that for the central decoder of the MD-WZC scheme with d = 2, the SNR reaches its theoretical bound, but only for the lowest CSNR values.

Cross-Decoding of Multiple Descriptions with SI.
We now study the influence of using turbo cross-decoding at the central decoder. Figure 10 compares the WZC and MD-WZC with turbo cross-decoding schemes for different values of d. These results show that the benefit of using cross-decoding increases as d decreases. For d = 0, the cross-decoding can offer a bitrate saving of up to 2 bps at the lowest CSNR values, whereas for d = 1 and d = 2, the saving is at most 0.65 and 0.13 bps, respectively. This is consistent with the fact that the more correlated the descriptions are, the greater the impact of circulating the information across the decoders. Note that for d = 0, the bitrate becomes lower than the theoretical bitrate for the case without cross-decoding given in Figure 7. This shows that by exploiting the correlation between I and J at the decoder, the central bitrate can get lower than H(I | Y) + H(J | Y).
Figures 11 and 12 show the RD curves at the central and side decoders for a CSNR value of 10 dB. Each point on the curves was obtained for a different number of bitplanes perfectly decoded, that is, the first point corresponds to the most significant bit (MSB) perfectly decoded, the second to the MSB and the second bitplane, and so forth. The bitrates were calculated from the number of parity bits that were received by the decoder to decode the bitplanes. The bitplanes that were not decoded were replaced with the corresponding bitplanes of the SI, to which the same MDSQ was applied. Since the transmitted descriptions are decoded bit by bit, the central decoder may generate invalid indexes corresponding to the empty cells of the index assignment. When that happens, all the quantization intervals in the row and column indicated by the two indexes are used in (13). The number of points on each curve corresponds to the number of bits needed to represent the indexes (5 for WZC and d = 0, 4 for d = 1, 3 for d = 2).
The central and side curves for the MD-WZC scheme with d = 0 are exactly the same. For low bitrates, when not all the bitplanes are perfectly decoded, the central decoders can become inferior in RD performance to the side decoders. Due to the cross-decoding, the central RD performance increases and the amount of redundancy has less influence on the RD performance, especially at very low bitrates. We made the decision to use the same number of quantization intervals for the quantization of X such that the correlation between X and Y remains the same for all schemes. This explains why, in the results, the scheme that introduces the least redundancy usually performs better at all decoders whereas, in a real case scenario, this scheme would be less efficient at the side decoders.

Discussion and Future Work
In this paper, we presented a balanced two-description coding scheme with decoder-only SI where the SI is the same for all decoders. Simulation results show that the proposed approach can be used to improve the RD performance of MDC schemes without sacrificing their robustness. Indeed, it has been shown that when the correlation with the SI is high, the quality of the signal reconstructed by the side decoders can be improved while not proportionally increasing the overall rate. Furthermore, by using channel cross-decoding, one can exploit the correlation between the descriptions and reduce the bitrate at the central decoder. The approach is currently being applied to robust video coding. The side information is in this case extracted by interpolation or extrapolation of previously decoded frames. Contrary to predictive video coding, where the application of MDC can result in prediction mismatch between encoder and decoder (the so-called drift effect) when there are packet losses, the proposed MDC technique with side information offers an inbuilt robustness to drift.