Standard-Compliant Multiple Description Video Coding over Packet Loss Network
© Huihui Bai et al. 2010
Received: 12 October 2009
Accepted: 11 March 2010
Published: 22 April 2010
An effective multiple description video coding scheme is proposed for transmission over packet-loss networks. Using priority encoding transmission, we attempt to remove the dependence on a specific scalable video codec and apply multiple description coding based on forward error correction (FEC) to a common video coder, such as the standard H.264. First, multiple descriptions are generated by temporal downsampling, and frames with large motion changes are duplicated in each description. Then, according to the motion characteristics between frames, each description is divided into several messages, so that better temporal correlation is maintained within each message for better estimation when information is lost. Based on priority encoding transmission, unequal protection is assigned within each message. Furthermore, the priority is designed in view of the packet loss rate of the channels and the significance of the bit streams. Experimental results validate the effectiveness of the proposed scheme, which performs better than the equal protection scheme and other state-of-the-art methods.
1. Introduction
With the explosion of the Internet, video transmission has become increasingly popular in recent years and will continue to flourish. However, network congestion and delay sensitivity pose tremendous challenges for video communications. Because of network congestion, random bit errors and packet losses may cause substantial quality degradation of the compressed video sequence. In real-time video applications, delay sensitivity makes retransmission of corrupted data impossible. This creates a need for coding approaches that combine high compression efficiency with robustness. Multiple description coding (MDC) has emerged as an attractive framework for robust transmission over unreliable channels. It can effectively combat packet loss without any retransmission, thus satisfying the demands of real-time services and relieving network congestion [1].
Multiple description coding encodes the source message into several bit streams (descriptions) carrying different information, which can then be transmitted over separate channels. Each description can be decoded individually to guarantee a minimum fidelity, which is measured by the side distortion. When more descriptions are received, they can be combined to yield a higher-fidelity reconstruction. In a simple two-channel architecture, the distortion obtained when both descriptions are received is called the central distortion [2]. There are two environments for MDC: on-off channels and packet-loss networks. In the on-off environment, if a channel link is broken, the description passing through that channel is lost; if the link works properly, the description is transmitted error-free. In a packet-loss network, packet loss occurs in every description, and all the descriptions have to be used at the decoder.
In past years, several MDC algorithms have been proposed for on-off channels. Based on the principle of the MD scalar quantizer [3], an MD scheme for video coding is proposed in [4], while the MD correlating transform is employed to design motion-compensated MD video coding [5]. Although these methods have shown good performance, they are incompatible with widely used standard codecs, such as H.26x and MPEG-x.
To overcome this limitation, subsampling techniques have been applied, such as the MD video coder based on spatial subsampling [6] and the MD video coder based on temporal subsampling [7]. Furthermore, a new approach to MDC suitable for block-transform coders, which are the basis of current video coding standards, is proposed in [8]. In [9], multiple scalable descriptions are generated from a single SVC-compliant bitstream by mapping scalability layers of different frames to different descriptions. New MD video coding schemes based on H.264 are also presented in [10, 11]. For packet-loss networks, an unequal packet loss protection scheme is designed in [12] for robust bitstream transmission, which achieves higher PSNR values and better user-perceived quality than the equal loss protection scheme. In [13], the proposed MD system uses an overdetermined filter bank to generate multiple descriptions and allows exact signal reconstruction in the presence of packet losses; it is reported to be competitive with other spatial subsampling schemes.
For transmission over packet-loss networks, FEC-based multiple description (FEC-MD) is an attractive approach. The basic idea is to partition a source bit stream into segments of different importance and to protect these segments with different amounts of FEC channel codes, so as to convert a prioritized bit stream into multiple nonprioritized segments. However, this method is currently limited to scalable video coders [14–16]. The scheme in [17] is independent of any specific scalable application. However, the priority, which is the key factor determining the amount of added redundancy, is not optimized to match the channel characteristics.
Inspired by [17], in this paper we attempt to remove the dependence on a specific scalable video codec and apply FEC-MD to a common video coder, such as the standard H.264. According to the motion characteristics between frames, the original video sequence is divided into several subsequences, called messages, so that better temporal correlation is maintained within each message for better estimation when information is lost. Based on priority encoding transmission, unequal protection is assigned within each message. Furthermore, the priority is designed in view of the packet loss rate of the channels and the significance of the bit streams.
The rest of this paper is organized as follows. In Section 2, an overview of the proposed MD coding scheme is given. In Section 3, the design of priority is presented in detail. The performance of the proposed scheme is examined in Section 4. We conclude the paper in Section 5.
2. Overview of the Proposed Scheme
Step 1 (Temporal downsampling).
In this paper, multiple descriptions are generated by temporal downsampling. Take two descriptions as a simple example. In the conventional method, odd and even frames are separated to produce the two descriptions. However, for frames with large motion changes, simple splitting can make it difficult to estimate lost information at the decoder. Therefore, in the proposed scheme these frames are duplicated in each description to maintain the temporal correlation when the original video is downsampled.
For any two neighboring frames, the motion vector of each macroblock (MB) is computed and the maximal motion vector magnitude, denoted here by $|MV|_{\max}$, is obtained. The change in $|MV|_{\max}$ is used as the measure of motion between frames. For any three neighboring frames $k-1$, $k$, and $k+1$, if the change between $|MV|_{\max}(k-1, k)$ and $|MV|_{\max}(k, k+1)$ exceeds a threshold $T$, a high motion change is considered to occur between frames $k$ and $k+1$. To preserve the temporal correlation between them, frames $k$ and $k+1$ are duplicated in each description. The threshold $T$ is an empirical value determined from experiments.
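The detection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the motion vector magnitude is taken to be the Euclidean norm, and the "change" is taken to be the absolute difference between the maxima of two consecutive frame pairs.

```python
import math


def max_mv_magnitude(motion_vectors):
    """Largest magnitude among the per-macroblock motion vectors (x, y).

    Assumption: the magnitude is the Euclidean norm.
    """
    return max(math.hypot(x, y) for x, y in motion_vectors)


def high_motion_boundaries(mv_max, threshold):
    """Return frame numbers k with high motion declared between k and k+1.

    mv_max[i] is the maximal motion vector magnitude between frames
    i+1 and i+2 (frames are numbered from 1).
    Assumption: "change" means the absolute difference between the
    values of two consecutive frame pairs.
    """
    boundaries = []
    for i in range(1, len(mv_max)):
        if abs(mv_max[i] - mv_max[i - 1]) > threshold:
            boundaries.append(i + 1)  # pair (i+1, i+2) triggered the test
    return boundaries


# Ten frames give nine frame pairs; the motion jumps at pair (5, 6).
print(high_motion_boundaries([2, 2, 2, 2, 9, 9, 9, 9, 9], threshold=4))
```

With an absolute difference, a sudden drop in motion is flagged as well as a sudden rise; a one-sided test would behave differently, and the source does not say which is intended.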
Suppose that in a video with 10 frames, high motion exists between the 5th and 6th frames. The two generated descriptions are then as follows: Description 1 consists of frames 1, 3, 5, 6, 7, 9 and Description 2 consists of frames 2, 4, 5, 6, 8, 10. Frames 5 and 6 are duplicated in both descriptions. The serial numbers of the frames can also be used to locate the high-motion positions: a run of consecutive frame numbers, such as frames 5 and 6, indicates that high motion occurs there.
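The 10-frame example can be reproduced with a short Python sketch. The function name `make_descriptions` and the representation of a high-motion position as the frame number k (meaning "between frames k and k+1") are assumptions made for illustration.

```python
def make_descriptions(num_frames, boundaries):
    """Odd/even temporal downsampling with duplication at high-motion positions.

    boundaries holds frame numbers k such that high motion lies between
    frames k and k+1; both frames are copied into both descriptions.
    """
    duplicated = set()
    for k in boundaries:
        duplicated.update((k, k + 1))
    frames = range(1, num_frames + 1)
    desc1 = [f for f in frames if f % 2 == 1 or f in duplicated]
    desc2 = [f for f in frames if f % 2 == 0 or f in duplicated]
    return desc1, desc2


d1, d2 = make_descriptions(10, boundaries=[5])
print(d1)  # [1, 3, 5, 6, 7, 9]
print(d2)  # [2, 4, 5, 6, 8, 10]
```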
Step 2 (Message construction).
In view of the motion characteristics between the frames, each description can be divided into messages at the position of high motion. In the above example, Description 1 can be divided into two messages, that is, frames 1, 3, 5 as one message and frames 6, 7, 9 as the other message. Similarly in Description 2, the two messages are frames 2, 4, 5 and frames 6, 8, 10, respectively.
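A minimal sketch of this splitting step, assuming the high-motion positions are known as frame numbers k (high motion between frames k and k+1). The helper name `split_into_messages` is hypothetical.

```python
def split_into_messages(description, boundaries):
    """Cut a description (list of frame numbers) after every frame k
    that marks a high-motion position between frames k and k+1."""
    cuts = set(boundaries)
    messages, current = [], []
    for frame in description:
        current.append(frame)
        if frame in cuts:
            messages.append(current)
            current = []
    if current:  # trailing frames form the last message
        messages.append(current)
    return messages


print(split_into_messages([1, 3, 5, 6, 7, 9], [5]))   # [[1, 3, 5], [6, 7, 9]]
print(split_into_messages([2, 4, 5, 6, 8, 10], [5]))  # [[2, 4, 5], [6, 8, 10]]
```

Because frames k and k+1 are present in both descriptions, the same cut positions apply to each of them.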
Message construction brings two improvements. First, a flexible group of pictures (GOP) becomes available, because messages differ in length. The first frame of each message can be intra-encoded as an I frame, and the GOP encoding structure is chosen according to the length of the message. Note that the threshold value influences the flexible GOP structure: as the threshold increases, fewer high-motion positions are detected and the messages become longer, which changes the GOP structure. Second, unequal error protection can be applied at both the intramessage and intermessage levels. Within a message, unequal amounts of FEC bytes can be assigned to each frame according to the different significance of I, P, and B frames, which produces intramessage unequal protection. Across messages, distinct GOPs contain different numbers of I, P, and B frames, which leads to unequal protection at the intermessage level.
Step 3 (Standard encoder).
Each message can be encoded into a bit stream using a current standard codec. Here, the H.264 encoder is chosen, so the proposed scheme is compatible with the standard codec. Note that a flexible group of pictures (GOP) is employed in each message, which helps to refresh intraframes adaptively. Compared with a uniform intraframe period, adaptive refresh follows the motion changes between frames, so better temporal correlation is maintained and better error concealment can be achieved at the decoder if a frame is lost within a message.
Step 4 (Priority encoding transmission (PET)).
Priority encoding transmission is an algorithm that assigns unequal amounts of FEC bytes to different segments of a message according to specified priorities. A priority is expressed as the percentage of packets needed to reconstruct the original information. A high-priority segment (a small percentage) can be recovered from only a few packets received by the decoder, whereas a low-priority segment needs more packets to be recovered. For a message segment protected with FEC, the segment can be recovered completely as long as the number of lost packets does not exceed the number of FEC bytes assigned to it [17].
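The relation between a priority percentage and the amount of FEC can be sketched as follows. This is an illustration, not the PET algorithm as published: it assumes that each segment is striped across all n packets with an (n, k) erasure code (for example Reed-Solomon), that k is obtained by rounding the percentage up, and that all function names are hypothetical.

```python
def packets_needed(priority_percent, n_packets):
    """Number of packets k needed to recover a segment.

    Assumption: k = ceil(priority_percent * n_packets / 100), at least 1.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return max(1, -(-priority_percent * n_packets // 100))


def fec_allocation(priorities, n_packets):
    """For each segment, return (data symbols k, FEC symbols n - k) per stripe."""
    return [(k, n_packets - k)
            for k in (packets_needed(p, n_packets) for p in priorities)]


def recoverable(priorities, n_packets, n_received):
    """A segment is recoverable when at least k of the n packets arrive."""
    return [n_received >= packets_needed(p, n_packets) for p in priorities]


# Three segments with priorities 30%, 60%, 90% spread over 10 packets.
print(fec_allocation([30, 60, 90], 10))   # [(3, 7), (6, 4), (9, 1)]
print(recoverable([30, 60, 90], 10, 7))   # [True, True, False]
```

In this example three packets are lost; the segments carrying 7 and 4 FEC symbols survive, while the segment with a single FEC symbol does not.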
Step 5 (Decoder design with error concealment).
In the on-off channel environment, two decoding cases must be considered: the central decoder and the side decoder. Since the two descriptions are generated by odd/even splitting, at the central decoder the two video subsequences obtained after standard decoding are first interleaved. Then, according to the serial numbers of the frames, the duplicated frames are removed to obtain the central reconstruction. If only one channel works, the side decoder is employed to estimate the lost information. The widely used method of motion compensation interpolation (MCI), based on the piecewise uniform motion assumption, is performed by bidirectional motion estimation, which may produce overlapped pixels and holes in the estimated frame.
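The central decoder's interleaving and duplicate removal can be illustrated with a small sketch. It assumes each decoded description is available as a mapping from frame serial number to decoded frame; the function name and the choice of which copy of a duplicated frame to keep are arbitrary assumptions.

```python
def central_reconstruction(desc1, desc2):
    """Interleave two decoded descriptions and drop duplicated frames.

    desc1, desc2: dict mapping frame serial number -> decoded frame.
    For a frame present in both, the copy from desc1 is kept
    (an arbitrary choice made for this sketch).
    """
    merged = dict(desc2)
    merged.update(desc1)
    return [merged[n] for n in sorted(merged)]


d1 = {n: f"frame{n}" for n in (1, 3, 5, 6, 7, 9)}
d2 = {n: f"frame{n}" for n in (2, 4, 5, 6, 8, 10)}
print(central_reconstruction(d1, d2))
```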
In a packet-loss network, both received descriptions suffer from packet losses, so only the central decoder needs to be designed. After standard decoding, the two video subsequences are first interleaved in odd/even order to produce a video with redundant frames. At the decoder, the segments whose priority percentage does not exceed the fraction of packets received can be recovered completely. Segments with a larger priority percentage cannot be repaired after packet loss, which may result in frame losses. In this case, error concealment is used to estimate the lost frames. For a lost I frame or P frame within a message, the last correctly decoded I frame or P frame is used for forward prediction by motion compensation extrapolation. For a lost B frame within a message, its forward and backward I or P frames are used for bidirectional prediction by motion compensation interpolation. Finally, the duplicated frames are removed to obtain the central reconstruction.
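The choice of concealment method per lost frame can be sketched as a small planning function. It only selects the method and the reference frames; the motion-compensated prediction itself is not shown. The function name and the fallback for a lost B frame that lacks a correctly decoded reference on one side are assumptions not stated in the text.

```python
def concealment_plan(frame_types, received):
    """Choose a concealment method for each lost frame of one message.

    frame_types: list of "I", "P" or "B" in display order.
    received:    list of bool, True if the frame was decoded correctly.
    Returns {lost index: (method, reference indices)}.
    """
    # Correctly decoded I/P frames are the only usable references.
    anchors = [i for i, (t, ok) in enumerate(zip(frame_types, received))
               if ok and t in ("I", "P")]
    plan = {}
    for i, (t, ok) in enumerate(zip(frame_types, received)):
        if ok:
            continue
        prev = [a for a in anchors if a < i]
        nxt = [a for a in anchors if a > i]
        if t == "B" and prev and nxt:
            plan[i] = ("interpolation", (prev[-1], nxt[0]))
        elif prev:
            # Lost I/P frame, or (assumption) a B frame with no later reference.
            plan[i] = ("extrapolation", (prev[-1],))
        else:
            plan[i] = ("unavailable", ())
    return plan


print(concealment_plan(["I", "B", "P", "B", "P"],
                       [True, False, True, True, False]))
```

In the example, the lost B frame at index 1 is interpolated between frames 0 and 2, and the lost P frame at index 4 is extrapolated from frame 2.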
3. The Design of Priority
In priority encoding transmission, the priority percentage of each segment in a message determines the amount of FEC bytes added, so the design of the priority aims to achieve better reconstruction at the cost of fewer FEC bytes. We therefore assign a priority to each segment according to its contribution to the improvement of the message reconstruction. To estimate the contribution of a segment, a decoding process with error concealment is simulated at the encoder. To facilitate the discussion, some notation is defined as follows.
The design of the priority requires a decoding process with error concealment, which increases the encoding complexity to some extent. However, we use frame-level error concealment to keep the complexity low. When computing PSNR(I), frame duplication is a fast way to reconstruct the lost frames. When computing , the motion between frames is assumed to be uniform, so multiple lost frames can be estimated with a single motion compensation interpolation. When computing , we can reuse the reconstructed reference frames from the H.264 encoding process to reduce the complexity. Moreover, the priority decision can be made offline, which preserves the real-time operation of the whole system.
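As a rough illustration of the frame-duplication case, the following sketch computes the PSNR obtained when a lost frame is replaced by a copy of the previous frame. It uses the standard PSNR definition for 8-bit samples; representing a frame as a flat list of luma values, using the original previous frame instead of its reconstructed version, and the function names are simplifying assumptions.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized frames (flat lists of samples)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def psnr_with_frame_duplication(frames, lost_index):
    """PSNR of a lost frame concealed by copying the previous frame.

    Assumption: lost_index >= 1, so a previous frame exists.
    """
    return psnr(frames[lost_index], frames[lost_index - 1])


frames = [[10, 10, 10, 10], [12, 12, 12, 12]]
print(round(psnr_with_frame_duplication(frames, 1), 2))
```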
4. Experimental Results
In this section, the proposed scheme is tested on several standard video sequences in CIF-YUV or QCIF-YUV 4:2:0 format. The frame rate is 30 fps. As the video codec, we use the H.264 encoder [18], software version JM10.2. First, we present the proposed message construction and the performance of the flexible GOP. Next, the advantage of the proposed scheme is demonstrated by comparison with the equal protection scheme. Lastly, for the two MD environments, that is, the on-off channel environment and the packet-loss network, experiments are performed to evaluate the efficiency of the proposed scheme against state-of-the-art methods.
4.1. Message Construction
The code structure of each description for "Coastguard.qcif".
4.2. Flexible GOP
4.3. Equal Protection Scheme Comparison
According to Figure 4, the 300 frames of "Coastguard.qcif" are split into two descriptions, and each description has 13 messages. The quantization parameters are chosen as QP ( : 25, : 30, : 30). The coding structure of each message is shown in Table 1. After H.264 encoding, the total bit rate of all the messages is 124.59 kbps. The following experiments are based on these compressed bit streams.
4.4. On-Off Channel Environment
4.5. Packet Loss Network
5. Conclusion
An MD video coding scheme using priority encoding transmission has been developed in this paper. An effective priority design has been incorporated into the proposed system to achieve better performance under packet loss. For message construction, the motion characteristics between frames are taken into account, so that better temporal correlation is maintained within each message for better estimation when information is lost. Furthermore, given its compatibility with the standard video codec, the proposed scheme may be a worthy choice for MD coding.
The authors would like to thank the anonymous reviewers for their valuable comments, which greatly improved this paper. This work was supported in part by the National Natural Science Foundation of China (no. 60903066, no. 60972085, and no. 60776794), the Natural Science Foundation of Beijing (no. 4102049), the New Teacher Foundation of State Education Ministry (no. 20090009120006), the 973 program (no. ), the 863 program (no. 2007AA01Z175), and the Research Foundation of BJTU .
References
[1] Wang Y, Reibman AR, Lin S: Multiple description coding for video delivery. Proceedings of the IEEE 2005, 93(1):57-69.
[2] Goyal VK: Multiple description coding: compression meets the network. IEEE Signal Processing Magazine 2001, 18(5):74-93. doi:10.1109/79.952806
[3] Vaishampayan V: Design of multiple description scalar quantizers. IEEE Transactions on Information Theory 1993, 39(3):821-834. doi:10.1109/18.256491
[4] Vaishampayan VA, John S: Balanced interframe multiple description video compression. Proceedings of the IEEE International Conference on Image Processing (ICIP '99), October 1999, Kobe, Japan 3: 812-816.
[5] Reibman AR, Jafarkhani H, Wang Y, Orchard MT, Puri R: Multiple description coding for video using motion compensated prediction. Proceedings of the IEEE International Conference on Image Processing (ICIP '99), October 1999, Kobe, Japan 3: 837-841.
[6] Wang D, Canagarajah N, Redmill D, Bull D: Multiple description video coding based on zero padding. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '04), 2004, Vancouver, Canada 2: II205-II208.
[7] Bai H, Zhao Y, Zhu C: Multiple description video coding using adaptive temporal sub-sampling. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '07), July 2007, Beijing, China 1331-1334.
[8] Conci N, De Natale FGB: Multiple description video coding using coefficients ordering and interpolation. Signal Processing: Image Communication 2007, 22(3):252-265. doi:10.1016/j.image.2006.12.009
[9] Abanoz TB, Tekalp AM: SVC-based scalable multiple description video coding and optimization of encoding configuration. Signal Processing: Image Communication 2009, 24(9):691-701. doi:10.1016/j.image.2009.07.003
[10] Radulovic I, Frossard P, Wang YK, Hannuksela MM, Hallapuro A: Multiple description video coding with H.264/AVC redundant pictures. IEEE Transactions on Circuits and Systems for Video Technology 2010, 20(1):144-148.
[11] Su C-C, Chen HH, Yao JJ, Huang P: H.264/AVC-based multiple description video coding using dynamic slice groups. Signal Processing: Image Communication 2008, 23(9):677-691. doi:10.1016/j.image.2008.07.002
[12] Zhang X, Peng X: An unequal packet loss protection scheme for H.264/AVC video transmission. Proceedings of the International Conference on Information Networking (ICOIN '09), January 2009 1-5.
[13] Bernardini R, Rinaldo R, Tonello A, Vitali A: Frame based multiple description for multimedia transmission over wireless networks. Proceedings of the 5th International Symposium on Wireless Personal Multimedia Communications (WPMC '04), July 2004, Abano Terme, Italy 2: 529-532.
[14] Mohr AE, Riskin EA, Ladner RE: Graceful degradation over packet erasure channels through forward error correction. Proceedings of the Data Compression Conference (DCC '99), March 1999, Snowbird, Utah, USA 92-101.
[15] Stankovic V, Hamzaoui R, Xiong Z: Packet loss protection of embedded data with fast local search. Proceedings of the IEEE International Conference on Image Processing (ICIP '02), September 2002, Rochester, NY, USA 2: II165-II168.
[16] Dumitrescu S, Wu X, Wang Z: Globally optimal uneven error-protected packetization of scalable code streams. IEEE Transactions on Multimedia 2004, 6(2):230-239. doi:10.1109/TMM.2003.822793
[17] Leicher C: Hierarchical encoding of MPEG sequences using priority encoding transmission (PET). ICSI; 1994.
[18] H.264 standard, JVT-G050, 7th meeting, Pattaya, Thailand, March 2003.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.