Two-description distributed video coding for robust transmission
EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 76 (2011)
Abstract
In this article, a two-description distributed video coding (2D-DVC) scheme is proposed to address robust video transmission for low-power capture devices. Odd/even frame splitting partitions a video into two sub-sequences to produce two descriptions. Each description consists of two parts, where part 1 is a zero-motion-based H.264-coded bitstream of one sub-sequence and part 2 is a Wyner-Ziv (WZ)-coded bitstream of the other sub-sequence. As the redundant part, the WZ-coded bitstream ensures that the lost sub-sequence can be recovered when one description is lost. On the other hand, the redundancy degrades the rate-distortion performance when no loss occurs. A residual 2D-DVC is employed to further improve the rate-distortion performance, where the difference of the two sub-sequences is WZ encoded to generate part 2 in each description. Furthermore, an optimization method is applied to control the amount of redundancy and therefore facilitate the tuning of the central/side distortion tradeoff. The experimental results show that the proposed schemes achieve better performance than the reference scheme, especially for low-motion videos. Moreover, our schemes still maintain the low-complexity encoding property.
1. Introduction
The increasing demand for up-link-friendly communication from low-power video capture devices has generated considerable research interest in developing video codecs with low-complexity encoding. As a new video coding framework, distributed video coding (DVC) [1–3], also called Wyner-Ziv (WZ) video coding, makes low-complexity video encoding a reality, in that DVC shifts the most time-consuming operation, motion estimation, from the encoder to the decoder.
On the other hand, robust DVC methods are desired, especially when the video is transmitted over wireless networks. DVC has some inherent robustness because of the error-correcting channel decoding algorithm it adopts. However, this robustness is achieved at the cost of compression efficiency. Typically, DVC assumes a correlation between the source to be encoded and its side information (SI) available at the decoder. The compression comes from this correlation: the stronger the correlation, the higher the compression efficiency, and vice versa. However, at a high packet loss rate, such correlation may be destroyed by poor reconstruction of the SI at the decoder, which in turn degrades the coding performance. In some other related studies, WZ coding is used as forward error correction to protect the video transmission. For example, Girod et al. [1, 4] provided a systematic lossy error protection (SLEP) method based on WZ coding, which is two-layer scalable in the sense of having one base layer from an MPEG encoder and the corresponding WZ bits as the enhancement layer. The MPEG stream is decoded first and the corrupted data are reconstructed using error concealment; the reconstructed signal is then used to generate the SI for decoding the WZ-encoded data. The WZ bits refine the reconstruction and thus protect the MPEG stream against channel packet loss to some degree. However, the SLEP scheme still applies motion estimation in its MPEG encoder, which sacrifices the desired property of low complexity. Also, error propagation in the MPEG-encoded stream may negatively impact the quality of the SI in WZ coding, which degrades the robustness of the system, especially when the packet loss rate is high [5]. To improve the robustness of SLEP, Crave et al. [5] proposed distributed multiple-description coding (DMDC), which can be seen as a two-description adaptation of SLEP. Nevertheless, the encoding is still of high complexity due to the motion-compensated temporal filtering involved in the encoder. In addition, Rane et al. [6] proposed multiple embedded WZ description coding as an extension of SLEP.
Multiple description coding (MDC) has emerged as an attractive framework for robust transmission over unreliable channels. MDC encodes the source message into several bitstreams (called descriptions) carrying different but correlated information, which can then be transmitted over different channels. When a description is lost, the decoder can still obtain an acceptable reconstruction from the received descriptions. It is this path diversity that makes MDC successful in robust transmission. In view of their desirable and complementary features, the robustness problem of DVC is addressed here by combining DVC with MDC. In this article, we attempt to design a robust two-description DVC (2D-DVC) under the constraint of low-complexity encoding. It is the emphasis on both low-complexity encoding and better robustness that distinguishes our scheme from those in [5, 6]. In our scheme, the input video is first split into two sub-sequences to create two descriptions, each consisting of two parts: part 1 is a low-complexity encoded bitstream of the corresponding sub-sequence, and part 2 is a WZ bitstream of the other sub-sequence. This is the so-called 2D-DVC, where the WZ bitstream controls the amount of redundancy. However, it is shown in [7] that residual WZ coding can achieve better rate-distortion performance than non-residual schemes. This is because the residual WZ technique exploits a second SI accessible at both the encoder and the decoder. We therefore extend this idea to our 2D-DVC and propose a residual 2D-DVC, where the difference of the two sub-sequences is WZ encoded and replaces part 2 of each description. Furthermore, an optimization scheme is employed to introduce an appropriate amount of redundancy. The experimental results show that the proposed schemes achieve better or comparable rate-distortion performance compared with the reference scheme, while maintaining the low-complexity encoding property.
This article is organized as follows. Section 2 introduces the basic idea and the related techniques, and Section 3 presents the proposed residual 2D-DVC and the optimization scheme in detail. Section 4 provides some experimental results, and Section 5 concludes the article.
2. Basic idea and related techniques
2D-DVC is designed to generate and encode two descriptions with correlation exploited only at the decoder, which can support low-complexity encoding and robustness against packet loss.
2.1. 2D-DVC scheme
Figure 1 shows the encoding structure of 2D-DVC. Given a video sequence, its odd and even frames are first split into two sub-sequences. In the conventional MDC scheme of Figure 1a, each sub-sequence produces a description that is sent over a separate channel. When one description is lost, the lost sub-sequence is estimated from the received one. However, the estimate is normally coarse because the other half of the information is missing, especially for video sequences with large motion. Figure 1b shows the encoding structure of 2D-DVC, where a WZ stream of the other sub-sequence forms part 2 of each description. The coarse estimate can act as the SI for WZ decoding. In case of description loss, the SI is refined by the WZ stream to recover a better version of the lost sub-sequence. This encoding framework is a two-description adaptation of SLEP like that in [5], but with the low-complexity encoding property. We consider a simple encoding scheme for part 1. WZ encoding supports low-complexity encoding because it exploits the correlation of the two sub-sequences only at the decoder. In case of no loss, only part 1 is used to recover the original video, and the WZ bitstream is redundant. To further improve the rate-distortion performance, residual coding and an iterative optimization method are adopted in the scheme.
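The odd/even splitting and the two-part composition of each description can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `split_odd_even` and `build_descriptions` are hypothetical names, the frames are stand-in strings, and the `h264_0mv` and `wz` tags only mark which coder would process each sub-sequence.

```python
def split_odd_even(frames):
    """Split a frame list into odd- and even-numbered sub-sequences (1-based numbering)."""
    odd = frames[0::2]   # frames 1, 3, 5, ...
    even = frames[1::2]  # frames 2, 4, 6, ...
    return odd, even


def build_descriptions(frames):
    """Pair each sub-sequence with a WZ-coded copy of the other one.

    The string tags stand in for real encoders, which are outside this sketch.
    """
    x1, x2 = split_odd_even(frames)
    description_1 = {"part1": ("h264_0mv", x1), "part2": ("wz", x2)}
    description_2 = {"part1": ("h264_0mv", x2), "part2": ("wz", x1)}
    return description_1, description_2


frames = ["f1", "f2", "f3", "f4", "f5"]
d1, d2 = build_descriptions(frames)
```

Each description carries one sub-sequence in full (part 1) and redundancy about the other (part 2), so either one alone touches every frame of the video.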
2.2. SW-SPIHT coding
WZ coding [8] refers to lossy compression with SI at the decoder. WZ coding aims to achieve, by exploiting the correlation only at the decoder, almost the same coding performance as when the correlation is exploited at both the encoder and the decoder. As shown in Figure 2, WZ coding generally consists of the following steps: transform coding, quantization, Slepian-Wolf (SW) coding, and the generation and reconstruction of SI at the decoder side. Transform coding and quantization first compress the source to generate a binary sequence, which is then compressed by SW coding. SW coding [9] is generally realized using channel coding, where the binary sequence of the source is first encoded with a channel code and only the syndrome bits are sent to the decoder. In general, fewer syndrome bits are sent than source bits, so compression is achieved. At the decoder, the SI is also transformed and quantized to generate its binary sequence. The received syndrome bits and the SI are used to recover the original source bits by error-correcting channel decoding.
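The syndrome idea can be made concrete with a toy example. The sketch below swaps in a Hamming(7,4) code purely for illustration (the article itself uses LDPCA codes, described later) and assumes the SI differs from the source in at most one of the seven bits; the function names are hypothetical.

```python
def syndrome(bits):
    """Return the 3-bit Hamming(7,4) syndrome of a 7-bit word as an integer.

    Column j (1-based) of the parity-check matrix is the binary expansion of j,
    so the syndrome is the XOR of the positions that hold a 1.
    """
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s


def sw_encode(x):
    """Encoder: send only the syndrome of the 7 source bits (3 bits instead of 7)."""
    return syndrome(x)


def sw_decode(s_x, y):
    """Decoder: recover x from its syndrome and the SI y.

    Assumes x and y differ in at most one bit position.
    """
    error_position = s_x ^ syndrome(y)  # equals the syndrome of (x XOR y)
    x_hat = list(y)
    if error_position:
        x_hat[error_position - 1] ^= 1  # flip the single differing bit
    return x_hat


x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 0, 0, 1]  # side information: bit 4 differs from x
recovered = sw_decode(sw_encode(x), y)
```

The encoder never sees `y`, yet three syndrome bits suffice to recover all seven source bits because the decoder holds correlated SI.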
Among WZ coding schemes, the SW set partitioning in hierarchical trees (SW-SPIHT) coding approach [10–12] performs very well in terms of scalability and rate-distortion performance. The process of SW-SPIHT is shown in Figure 3, assuming the source X and the correlated SI Y. For X, after the discrete wavelet transform (DWT), traditional SPIHT coding is applied and we obtain its binary tree distribution information SD, significance information SP, sign information SS, and refinement information SR. The tree information SD is sent to the decoder after arithmetic coding. SP, SS, and SR are encoded by channel coding, and the syndrome bits are sent to the decoder. At the decoder, the received SD is first decompressed. Then, according to SD, the side binary sequences SPy, SSy, and SRy of Y are obtained by SPIHT encoding. Next, SPy, SSy, SRy, and the received syndrome bits are used to recover the main binary sequences SP, SS, and SR by channel decoding. Finally, the wavelet coefficients of X will be reconstructed according to
where W'' is the final wavelet coefficient, Vmax and Vmin are the possible maximal and minimal value of W'' if SPIHT decoding is implemented to all bit-planes.
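One common way to use Vmin and Vmax in WZ reconstruction, assumed here rather than taken from the article, is to keep the SI coefficient when it already lies inside [Vmin, Vmax] and otherwise clamp it to the nearest bound. The function and argument names below are hypothetical.

```python
def reconstruct_coefficient(w_side, v_min, v_max):
    """Clamp the SI wavelet coefficient into [v_min, v_max].

    v_min and v_max bound the coefficient given the decoded bit-planes;
    the clamping rule itself is an assumption of this sketch.
    """
    return min(max(w_side, v_min), v_max)
```

With this rule, a good SI value refines the reconstruction inside the decoded interval, while a poor one can never pull it outside that interval.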
3. Proposed residual 2D-DVC and optimization
The proposed residual 2D-DVC scheme is sketched in Figure 4, including encoding and decoding processes. Part 1 of each description is generated by zero-motion based H.264 encoding. Part 2, for example, part 2 of description 1, is from the SW-SPIHT stream of description 2. The details are explained as follows.
3.1. Zero-motion-based H.264 coding
Zero-motion-based H.264, denoted H.264 0-mv, is employed in our scheme to meet the demand for low-complexity encoding. Zero-motion-based H.264 means that only the previous frame is used as the reference frame for inter-coding, with the motion search region set to zero, which makes it similar to differential pulse code modulation (DPCM). Because DPCM exploits the temporal correlation between adjacent frames, zero-motion-based H.264 normally outperforms intra-frame coding in terms of rate-distortion performance. With no motion estimation at the encoder, its encoding process is greatly simplified. In our experiments, the encoding time of zero-motion-based H.264 inter-coding is always shorter than that of intra-frame coding in the H.264 JM 9.0 program.
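The DPCM analogy can be illustrated with a small closed-loop predictor that uses the co-located pixel of the previous reconstructed frame as the prediction. This is only a sketch of the principle, not of H.264 itself: it omits transforms and entropy coding, uses a uniform quantizer with a hypothetical `step`, and predicts the first frame from zeros.

```python
def encode_zero_motion(frames, step):
    """Closed-loop DPCM: quantize each frame's difference from the previous reconstruction."""
    prediction = [0] * len(frames[0])  # assumption: first frame is predicted from zeros
    levels = []
    for frame in frames:
        q = [round((p - r) / step) for p, r in zip(frame, prediction)]
        levels.append(q)
        # Track the decoder's reconstruction so quantization errors do not accumulate.
        prediction = [r + k * step for r, k in zip(prediction, q)]
    return levels


def decode_zero_motion(levels, step):
    """Rebuild the frames by accumulating the dequantized differences."""
    prediction = [0] * len(levels[0])
    frames = []
    for q in levels:
        prediction = [r + k * step for r, k in zip(prediction, q)]
        frames.append(prediction)
    return frames


frames = [[100, 102, 98], [101, 103, 97], [99, 104, 96]]
decoded = decode_zero_motion(encode_zero_motion(frames, step=4), step=4)
```

No motion search appears anywhere in the encoder loop, which is the source of the low encoding complexity; the reconstruction error per pixel stays within half a quantization step.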
3.2. Residual-based encoding
In single-description DVC, it has been shown in [7] that pixel-domain residual WZ coding achieves better rate-distortion performance than the non-residual scheme. Here, we extend this idea to our two-description DVC to further improve the rate-distortion performance. In residual 2D-DVC encoding, SW-SPIHT encodes the difference D = X - Xre to generate part 2, where Xre is a simple estimate of X. In non-residual 2D-DVC encoding, X is input directly to SW-SPIHT to produce part 2. Correspondingly, Dy = Y - Xre acts as the SI in residual 2D-DVC, while Y is the SI in non-residual 2D-DVC. The residual scheme achieves better performance than the non-residual one mainly because of Xre, which in the residual case can be seen as a second SI accessible to both the encoder and the decoder [7].
Since Xre and X are correlated given Y, using Xre at the encoder and the decoder amounts to adding an extra condition in encoding X. The rate of X will ideally approach the conditional entropy H(X|(Y, Xre)), which is not greater than the conditional entropy H(X|Y) approached in non-residual 2D-DVC, i.e., H(X|(Y, Xre)) ≤ H(X|Y).
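The inequality is an instance of the general fact that conditioning cannot increase entropy. Written with conditional mutual information, a standard identity that holds for any jointly distributed X, Y, and Xre:

```latex
H(X \mid Y) - H(X \mid Y, X_{re}) = I(X; X_{re} \mid Y) \ge 0
```

Equality holds only when X and Xre are conditionally independent given Y, so any residual dependence between X and Xre that Y does not already explain translates into a potential rate saving.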
In the scheme shown in Figure 4, for description 1, D1 = X2 - Xre2 and Dy1 = Y2 - Xre2; for description 2, D2 = X1 - Xre1 and Dy2 = Y1 - Xre1. Xre1 and Xre2, as well as Y1 and Y2, are generated by interpolation. Two interpolation methods are used: simple average interpolation, used at both the encoder and the decoder to generate Xre1 and Xre2, and the more complex motion-compensated interpolation, used only at the decoder to generate Y1 and Y2 when only one description is received. For example, Xre2,i and Y2,i for the i-th frame in description 1 are generated according to the following formulae, respectively,
where the two reference frames are the adjacent decoded frames in X1; (x, y) are the coordinates in the interpolated frame; and [dxb, dyb] and [dxf, dyf] are the backward and forward motion vectors between these two frames, respectively, which may be obtained by half-pixel motion estimation similar to that in [3].
Because Xre is used, some correlation between the two descriptions is exploited at the encoder in residual 2D-DVC. However, the extra encoding work over non-residual 2D-DVC consists only of subtraction and average interpolation, whose computational cost is very low, so residual 2D-DVC still preserves low-complexity encoding.
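The encoder-side operations are cheap enough to show in a few lines. The sketch below assumes that average interpolation is the pixel-wise mean of the two temporal neighbours of the frame being estimated; frames are flat lists of pixel values, `y2` is a made-up stand-in for the motion-compensated SI, and all function names are hypothetical.

```python
def average_interpolation(prev_frame, next_frame):
    """Pixel-wise mean of the two neighbouring decoded frames (assumed form of Xre)."""
    return [(a + b) / 2 for a, b in zip(prev_frame, next_frame)]


def residual(frame, estimate):
    """Difference signal, e.g. D = X - Xre at the encoder or Dy = Y - Xre at the decoder."""
    return [f - e for f, e in zip(frame, estimate)]


def add_back(diff, estimate):
    """Invert the residual step: X = D + Xre."""
    return [d + e for d, e in zip(diff, estimate)]


x1_prev, x1_next = [100, 110], [104, 114]  # decoded neighbours from sub-sequence X1
x2 = [103, 111]                            # frame of X2 lying between them
y2 = [102, 113]                            # stand-in for the motion-compensated SI

xre2 = average_interpolation(x1_prev, x1_next)
d1 = residual(x2, xre2)    # WZ-encoded as part 2 of description 1
dy1 = residual(y2, xre2)   # SI used by the SW-SPIHT decoder
```

Only additions, subtractions, and a division by two are involved, which is consistent with the claim that the residual variant adds very little encoder complexity.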
In this study, a low-density parity-check accumulate (LDPCA) code [13] is used as the channel code in SW-SPIHT coding, together with a feedback channel. At each bit-plane of SPIHT, the encoder sends syndrome information stage by stage in response to feedback requests. If the receiver cannot decode correctly, the encoder sends additional syndrome information.
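The request-and-retry behaviour of the feedback channel can be sketched as a simple loop. The LDPCA decoder itself is abstracted away: `try_decode` is a hypothetical callback that returns the decoded bits, or `None` when the syndromes received so far are insufficient.

```python
def decode_with_feedback(syndrome_chunks, try_decode):
    """Request syndrome chunks one at a time until decoding succeeds.

    Returns the decoded bits (or None on failure) and the number of chunks used.
    """
    received = []
    for chunk in syndrome_chunks:
        received.append(chunk)          # one feedback request answered
        decoded = try_decode(received)
        if decoded is not None:
            return decoded, len(received)
    return None, len(received)


# Toy decoder for illustration: succeeds once three chunks have arrived.
def toy_decoder(received):
    return "bits" if len(received) >= 3 else None


result, chunks_used = decode_with_feedback(["s1", "s2", "s3", "s4"], toy_decoder)
```

The number of chunks actually requested determines the WZ rate for that bit-plane, so better SI directly means fewer transmitted syndrome bits.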
3.3. Decoding
If only one description, for example description 1, is received, its part 1 is first reconstructed by zero-motion-based H.264 decoding to give the decoded X1, and interpolation then generates Xre2 and Y2 based on Equations 2 and 3. Then, SW-SPIHT decoding reconstructs the difference D1 using the received syndrome bits and the SI Dy1, and the reconstruction of X2 is obtained by adding Xre2 to the decoded difference. Finally, the decoded X1 and the reconstructed X2 are merged to obtain the video in side decoding 1.
When both descriptions are received, the central decoding works without motion estimation. Part 1 of each description is first decoded by zero-motion-based H.264 decoding, and the resulting decoded X1 and X2 are then refined by the received WZ bits. Concretely, Xre1 and Xre2 are interpolated according to (2) from the decoded X2 and X1, respectively. The SI differences Dy1 and Dy2 are generated by subtracting Xre2 and Xre1 from the decoded X2 and X1, respectively. Then, SW-SPIHT decoding recovers the differences D1 and D2 using Dy1 (or Dy2) and the received syndrome bits. The refined versions of the two sub-sequences are obtained by adding Xre2 and Xre1 back to the recovered D1 and D2, respectively. Finally, X1 and X2 are merged to recover the video V'.
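The final merging step in both side and central decoding is the inverse of the odd/even split. A minimal sketch, with a hypothetical function name and stand-in frame labels:

```python
def merge_sub_sequences(odd_frames, even_frames):
    """Interleave the odd- and even-numbered sub-sequences back into display order."""
    merged = []
    for i in range(max(len(odd_frames), len(even_frames))):
        if i < len(odd_frames):
            merged.append(odd_frames[i])
        if i < len(even_frames):
            merged.append(even_frames[i])
    return merged


video = merge_sub_sequences(["f1", "f3", "f5"], ["f2", "f4"])
```

In side decoding one of the two inputs is the WZ-recovered sub-sequence; in central decoding both inputs come from part 1 and are refined by the WZ bits before merging.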
3.4. Redundancy optimization
Redundancy affects the correlation between the two descriptions and, consequently, the rate-distortion performance of central and side decoding. In general, more redundancy means higher correlation between the two descriptions and thus better quality from the side decoder, while the central quality drops as redundancy increases. Moreover, too much redundancy may even degrade the side quality. Therefore, optimization is desired to maximize the rate-distortion performance of the proposed non-residual and residual 2D-DVC.
Let d0(v, N) and d1(v, N) (or d2(v, N)) denote the mean squared errors (MSE) from the central and side decoders, respectively, for the input video v, given that the amount of WZ bits is N. Let R(v, N) be the total rate of the two descriptions, and let R1(v, N) and R2(v, N) be the rates of the two balanced descriptions 1 and 2, respectively. Our goal is to find the optimal parameter N by solving the following optimization problem:
$$\min_{N} \; d_1(v, N)$$
subject to
condition 1: $$R(v, N) = R_1(v, N) + R_2(v, N) \le R_{budget}$$
condition 2: $$d_0(v, N) \le d_{budget}$$
where Rbudget is the available total bit rate to encode the two descriptions and dbudget is the maximum distortion acceptable for the central decoder reconstruction. The encoding optimization module is based on the above function. With the constraints on the total bit rate and the central distortion, N is adjusted to minimize the side distortion.
The optimization problem is solved iteratively. The basic algorithm shown in Figure 5 makes use of the monotonicity of R and d as functions of N. After initialization, the smallest N that minimizes d1 subject to conditions 1 and 2 is searched for. Specifically, in this study, SW-SPIHT coding generates the redundancy, and its rate affects the performance of 2D-DVC and residual 2D-DVC. For ease of implementation, the optimization of the SW-SPIHT bits is carried out iteratively over the number of bit-planes (BP), nBP; i.e., we adjust nBP based on the above function given the quantization parameter (QP) of H.264 0-mv for part 1. Finally, an optimized combination of nBP and QP is chosen, as shown in the following section.
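The constrained search over nBP can be sketched as a scan of the candidate values. This is a plain exhaustive search under stated assumptions, not the algorithm of Figure 5: the rate and distortion functions are supplied by the caller, and the lookup tables below are invented numbers for illustration only.

```python
def choose_n_bp(candidates, rate, d_central, d_side, r_budget, d_budget):
    """Return the smallest nBP that minimizes side distortion under both budgets.

    Returns None when no candidate satisfies the constraints.
    """
    best = None
    for n_bp in sorted(candidates):
        if rate(n_bp) > r_budget or d_central(n_bp) > d_budget:
            continue  # violates condition 1 or condition 2
        # Strict comparison keeps the smallest nBP when side distortions tie.
        if best is None or d_side(n_bp) < d_side(best):
            best = n_bp
    return best


# Invented measurements for one fixed QP (illustration only).
rate_table = {1: 110, 2: 120, 3: 130, 4: 140, 5: 150}   # total rate, kbps
d0_table = {1: 20, 2: 21, 3: 23, 4: 26, 5: 30}          # central MSE
d1_table = {1: 90, 2: 60, 3: 45, 4: 40, 5: 42}          # side MSE

n_opt = choose_n_bp(
    rate_table,
    rate=lambda n: rate_table[n],
    d_central=lambda n: d0_table[n],
    d_side=lambda n: d1_table[n],
    r_budget=140,
    d_budget=25,
)
```

In this toy setting nBP = 4 would give the lowest side distortion, but it exceeds the central-distortion budget, so the search settles on nBP = 3. Repeating the scan for each QP yields the (QP, nBP) combinations from which operating points are chosen.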
4. Experimental results
For a fair comparison, we use four standard video sequences to test the performance: Foreman, Hall, Carphone, and Mother-daughter, all QCIF@15 Hz. Four MDC methods are compared: the DMDC in [5], zero-motion-based H.264 MDC without any extra WZ bits as shown in Figure 1a, and the proposed non-residual 2D-DVC and residual 2D-DVC. The bit rate denotes the total rate of the two descriptions. For the proposed schemes, we obtain four points, Q1, Q2, Q3, and Q4, according to the above optimization process. The optimized combinations of QP and nBP are shown in Table 1. We use the LDPCA codes proposed in [13] for all the simulations. A small number of feedback requests for additional syndrome bits of the LDPCA code are allowed to achieve successful LDPCA decoding.
4.1. Performance comparison
Figure 6 shows the rate-distortion curves, where the reference DMDC curves for the Foreman and Hall sequences are the best results of [5]. The experimental results show that residual 2D-DVC clearly outperforms non-residual 2D-DVC, because the efficient residual WZ coding reduces the encoding rate. Specifically, for the low-motion Hall and Mother-daughter sequences, residual 2D-DVC achieves 0.5-1.8 dB side-quality and 0.2-1.7 dB central-quality improvement. For the high-motion Foreman sequence, the residual scheme obtains 0.2-1.7 dB side-quality improvement with comparable central quality. Besides, compared with the best results in [5], residual 2D-DVC achieves about 2-3 dB side-quality and 0.5-2 dB central-quality improvement for the Hall sequence. The non-residual 2D-DVC achieves about 1-2 dB improvement in side quality with a central-quality decrease of 0.2-0.7 dB for the Hall sequence. Residual 2D-DVC has comparable efficiency at high rates for the high-motion Foreman sequence. However, non-residual 2D-DVC is not efficient for Foreman, because zero-motion-based H.264 cannot handle high-motion content well. For the Carphone and Mother-daughter sequences, we compare our scheme with MDC using zero-motion-based H.264. The proposed scheme has rate-distortion performance comparable to that of MDC zero-motion-based H.264. However, the advantage of 2D-DVC over zero-motion-based H.264 lies in the quality consistency when loss occurs, which is shown by the following experimental results.
Figure 7 shows the frame side-PSNR comparison at the Q1 point for three MDC schemes: MDC zero-motion-based H.264, residual 2D-DVC, and non-residual 2D-DVC. In Figure 7a, for the Hall sequence, MDC H.264 0-mv is at 56 kbps, non-residual 2D-DVC uses an extra 29 kbps, and residual 2D-DVC an extra 7 kbps; in Figure 7b, for the Foreman sequence, MDC H.264 0-mv is at 297 kbps, 2D-DVC uses an extra 35 kbps, and residual 2D-DVC an extra 29 kbps; in Figure 7c, for the Carphone sequence, MDC H.264 0-mv is at 148 kbps, non-residual 2D-DVC uses an extra 28 kbps, and residual 2D-DVC an extra 17 kbps; in Figure 7d, for the Mother-daughter sequence, MDC H.264 0-mv is at 63 kbps, 2D-DVC uses an extra 15 kbps, and residual 2D-DVC an extra 5 kbps. It is evident that residual 2D-DVC and non-residual 2D-DVC achieve better quality consistency than MDC zero-motion-based H.264 because of the added WZ bitstream. Moreover, residual 2D-DVC performs the best.
For further comparison, we compute the variance values of frame side-PSNR for each rate-distortion point according to
where PSNR(i) is the PSNR value of the i-th frame, n is the total number of frames, and E(PSNR) is the average PSNR over all frames. Table 2 shows the variance values for all rate-distortion points. It can be seen that the residual and non-residual 2D-DVC achieve smaller variance values, which means they have better consistency in frame side-PSNR.
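The consistency measure can be computed directly from the per-frame PSNR values. The sketch below assumes the population form of the variance (division by n); a sample variance would divide by n - 1 instead. The function name and the PSNR values are illustrative only.

```python
def psnr_variance(psnr_values):
    """Population variance of per-frame PSNR values (assumes division by n)."""
    n = len(psnr_values)
    mean = sum(psnr_values) / n  # E(PSNR)
    return sum((p - mean) ** 2 for p in psnr_values) / n


steady = psnr_variance([32.0, 32.0, 32.0])    # identical frame quality
unsteady = psnr_variance([30.0, 32.0, 34.0])  # same mean, fluctuating quality
```

Two decoders with the same average PSNR can differ sharply in this measure, which is why it complements the rate-distortion curves when judging side-decoder quality.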
4.2. Encoding complexity
The proposed 2D-DVC schemes have lower encoding complexity than the scheme in [5]. In our schemes, each description consists of two parts, produced by zero-motion-based H.264 encoding and SW-SPIHT encoding. Since each part requires computation less than or similar to that of intra-frame coding, the encoding complexity of 2D-DVC is similar to that of the conventional intra-frame mode. In [5], each description also has two parts, but both apply motion-compensated temporal filtering, so the encoding complexity of that system is similar to that of the conventional inter-frame mode.
Finally, the encoding time is measured in milliseconds (ms). The hardware used is an HP notebook nx6330 with an Intel 2 processor at 1.66 GHz and 1.0 GB of RAM. The software environment is the Windows XP operating system with a VC6.0 release build. The average encoding times per frame for the Hall sequence are 162, 175, 183, and 199 ms for Q1, Q2, Q3, and Q4, respectively.
5. Conclusion
This article has proposed two 2D-DVC schemes for robust transmission with low-complexity encoding. The video is first separated into two sub-sequences by odd/even frame splitting. In the first 2D-DVC scheme, each description is composed of two parts, part 1 being a zero-motion-based H.264 stream and part 2 a WZ stream, which provides enough redundancy to produce a reconstruction of acceptable quality when one description is lost. In the second scheme, a residual 2D-DVC is proposed to reduce the redundancy, where the difference of the two sub-sequences is WZ encoded and used as part 2 in each description. The amount of redundancy can be controlled using an optimization scheme. The experimental results have shown that the proposed schemes achieve better or comparable rate-distortion performance compared with the reference scheme, especially when the motion is low. Moreover, our schemes maintain low-complexity encoding, so they are suitable for portable video communication devices with very limited power and storage, such as mobile cameras and wireless low-power surveillance devices.
References
[1] Girod B, Aaron A, Rane S, Rebollo-Monedero D: Distributed video coding. Proc IEEE 2005, 93(1):71-83.
[2] Puri R, Ramchandran K: PRISM: an uplink-friendly multimedia coding paradigm. Proc International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China 2003, 856-859.
[3] Artigas X, Ascenso J, Dalai M, Klomp S, Kubasov D, Ouaret M: The DISCOVER codec: architecture, techniques and evaluation. Proc Picture Coding Symposium, Lisbon, Portugal 2007, 1950-1953.
[4] Rane S, Aaron A, Girod B: Systematic lossy forward error protection for error-resilient digital video broadcasting - a WZ coding approach. Proc International Conference on Image Processing, Singapore 2004, 609-612.
[5] Crave O, Guillemot C, Pesquet-Popescu B, Tillier C: Distributed temporal multiple description coding for robust video transmission. EURASIP J Wirel Commun Netw 2008. Article ID 183536.
[6] Rane S, Aaron A, Girod B: Error-resilient video transmission using multiple embedded Wyner-Ziv descriptions. IEEE International Conference on Image Processing (ICIP) 2005, 2:666-669.
[7] Aaron A, Varodayan D, Girod B: Wyner-Ziv residual coding of video. Proc Picture Coding Symposium, Beijing, China 2006, 28-32.
[8] Wyner A, Ziv J: The rate-distortion function for source coding with side information at the decoder. IEEE Trans Inf Theory 1976, 22(1):1-10. 10.1109/TIT.1976.1055508
[9] Slepian D, Wolf JK: Noiseless coding of correlated sources. IEEE Trans Inf Theory 1973, 19(4):471-480. 10.1109/TIT.1973.1055037
[10] Tang C, Cheung N, Ortega A, Raghavendra C: Efficient inter-band prediction and wavelet based compression for hyperspectral imagery: a distributed source coding approach. Proc Data Compression Conference 2005, 437-446.
[11] Guo X, Lu Y, Wu F, Gao W, Li S: Wyner-Ziv video coding based on set partitioning in hierarchical tree. Proceedings of International Conference on Image Processing (ICIP) 2006, 601-604.
[12] Wang A, Zhao Y, Pan JS: Residual distributed video coding based on LQR-hash. Chin J Electron 2009, 18(1):109-112.
[13] Varodayan D, Aaron A, Girod B: Rate-adaptive distributed source coding using low-density parity-check codes. EURASIP Signal Process 2006, 86:3123-3130.
Acknowledgements
This study was supported in part by the Sino-Singapore JRP(2010DFA11010), the National Natural Science Foundation of China (No. 61073142, No. 60903066), the Doctor Startup Foundation of TYUST (No. 20092011), the International Cooperative Program of Shanxi Province (No. 2011081055), The Shanxi Provincial Foundation for Leaders of Disciplines in Science (20111022), the Beijing Natural Science Foundation (No. 4102049), and the New Teacher Foundation of State Education Ministry (No. 20090009120006).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Wang, A., Li, Z., Zhao, Y. et al. Two-description distributed video coding for robust transmission. EURASIP J. Adv. Signal Process. 2011, 76 (2011). https://doi.org/10.1186/1687-6180-2011-76
DOI: https://doi.org/10.1186/1687-6180-2011-76