- Research Article
- Open Access
Standard-Compliant Multiple Description Video Coding over Packet Loss Network
EURASIP Journal on Advances in Signal Processing volume 2010, Article number: 987164 (2010)
Abstract
An effective multiple description video coding scheme is proposed for transmission over packet loss networks. Using priority encoding transmission, we attempt to overcome the dependence on a specific scalable video codec and apply FEC-based multiple description coding to a common video coder, such as the standard H.264. First, multiple descriptions are generated by temporal downsampling, and the frames with high motion change are duplicated in each description. Then, according to the motion characteristics between frames, each description is divided into several messages, so that better temporal correlation is maintained within each message for better estimation when information is lost. Based on priority encoding transmission, unequal protection is assigned within each message. Furthermore, the priority is designed in view of the packet loss rate of the channels and the significance of the bit streams. Experimental results validate the effectiveness of the proposed scheme, which performs better than the equal protection scheme and other state-of-the-art methods.
1. Introduction
With the explosion of the Internet, video transmission has become increasingly popular in recent years and will continue to flourish in the future. However, network congestion and delay sensitivity pose tremendous challenges for video communications. Due to network congestion, random bit errors and packet losses may cause substantial quality degradation of the compressed video sequence. In real-time video applications, delay sensitivity makes the retransmission of corrupted data impossible. This creates a need for coding approaches that combine high compression efficiency with robustness. Multiple description coding (MDC) has emerged as an attractive framework for robust transmission over unreliable channels. It can effectively combat packet loss without any retransmission, thus satisfying the demand of real-time services and relieving network congestion.
Multiple description coding encodes the source message into several bit streams (descriptions) carrying different information, which can then be transmitted over separate channels. Each description can be decoded individually to guarantee a minimum fidelity, which is measured by the side distortion. When more descriptions are received, they can be combined to yield a higher-fidelity reconstruction. In a simple architecture with two channels, the distortion obtained from the two received descriptions is called the central distortion. There are two environments for MDC: on-off channels and packet loss networks. In the on-off MDC environment, if a channel link is broken, the description passing through that channel is lost, and if the link works properly, the description is transmitted error-free. In a packet loss network, packet loss occurs in each description and all the descriptions have to be used at the decoder.
During the past years, several MDC algorithms have been proposed for the on-off channels. Based on the principle of MD scalar quantizer , an MD scheme for video coding is proposed in  while MD correlation transform is also employed to design motion compensated MD video coding . Although the above methods have shown good performance, they are incompatible with widely used standard codecs, such as H.26x and MPEG-x.
To overcome the limitation, subsampling technique is applied, such as the MD video coder based on spatial sub-sampling  and the MD video coder based on temporal sub-sampling . Furthermore, a new approach to MDC is proposed in , suitable for block-transform coders, which are the basis of current video coding standards. In , multiple scalable descriptions are generated from a single SVC-compliant bitstream by mapping scalability layers of different frames to different descriptions. And the new schemes of MD video coding are also presented in [10, 11] based on . In view of packet loss network, an unequal packet loss protection scheme is designed in  for robust bitstream transmission, which can achieve higher PSNR values and better user perceived quality than the equal loss protection scheme. In  the proposed MD system uses an overdetermined filter bank to generate multiple descriptions and allows for exact signal reconstruction in the presence of packet losses, which is reported to be competitive compared with other spatial sub-sampling scheme.
For transmission over packet loss network, FEC-based multiple description (FEC-MD) is an attractive approach. The basic idea is to partition a source bit stream into segments with different importance, and protect these segments using different amounts of FEC channel codes, so as to convert a prioritized bit stream into multiple nonprioritized segments. However, this method currently is limited to the scalable video coders [14–16]. In , the scheme can be independent from any specific scalable application. However, as the important factor for the amount of added redundancy, the priority is not optimized to satisfy the channel characteristics.
Inspired by , in this paper, we attempt to overcome the limitation of specific scalable video codec and apply FEC-MD to a common video coder, such as the standard H.264. According to different motion characteristics between frames, an original video sequence is divided into several subsequences as messages, so in each message better temporal correlation can be maintained for better estimation when information losses occur. Based on priority encoding transmission, unequal protections are assigned in each message. Furthermore, the priority is designed in view of packet loss rate of channels and the significance of bit streams.
The rest of this paper is organized as follows. In Section 2, an overview of the proposed MD coding scheme is given. In Section 3, the design of priority is presented in detail. The performance of the proposed scheme is examined in Section 4. We conclude the paper in Section 5.
2. Overview of the Proposed Scheme
Figure 1 illustrates our scheme and a step-by-step recipe is explained as follows.
Step 1 (Temporal downsampling).
In this paper, multiple descriptions are generated by temporal downsampling. Here, two descriptions are taken as a simple example. In the conventional method, odd and even frames are separated to produce the two descriptions. However, for frames with a high motion change, simple splitting may make it difficult to estimate lost information at the decoder. Therefore, in the proposed scheme these frames are duplicated in each description to maintain the temporal correlation when the original video is downsampled.
For any two neighboring frames, the motion vector of each macroblock (MB) is computed and the maximal motion vector is obtained. Here, $MV_{\max}(k, k+1)$ denotes the maximal motion-vector magnitude between frames $k$ and $k+1$. The change of $MV_{\max}$ is used as the measure of the motion between frames. For any three neighboring frames denoted by $k-1$, $k$, and $k+1$, if $|MV_{\max}(k, k+1) - MV_{\max}(k-1, k)| > T$, a high motion change is considered to exist between frames $k$ and $k+1$. To keep the temporal correlation between the frames, frames $k$ and $k+1$ are duplicated in each description. Here, the threshold $T$ is an empirical value determined from experimental results.
Suppose that, in a video with 10 frames, high motion exists between the 5th and the 6th frame. The two generated descriptions are then as follows: Description 1 consists of frames 1, 3, 5, 6, 7, 9 and Description 2 consists of frames 2, 4, 5, 6, 8, 10. Frames 5 and 6 are duplicated in both descriptions. At the same time, the serial numbers of the frames identify the position of the motion change: a run of consecutive frame numbers, such as frames 5 and 6, indicates that high motion occurs there.
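The splitting rule of Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `high_motion_starts` and `build_descriptions`, the list layout of `max_mv`, the sample motion values, and the use of an absolute difference compared against the threshold are all choices made for this example.

```python
def high_motion_starts(max_mv, threshold):
    """Return every frame k (1-based) with high motion between k and k+1.

    Assumed layout: max_mv[i] is the maximal motion-vector magnitude
    between frames i+1 and i+2, so a video of n frames gives n-1 values.
    """
    starts = []
    for k in range(2, len(max_mv) + 1):
        # change between the pair (k, k+1) and the pair (k-1, k)
        change = abs(max_mv[k - 1] - max_mv[k - 2])
        if change > threshold:
            starts.append(k)
    return starts


def build_descriptions(n_frames, starts):
    """Odd/even split in which frames k and k+1 are kept in both descriptions."""
    duplicated = set()
    for k in starts:
        duplicated.update((k, k + 1))
    frames = range(1, n_frames + 1)
    desc1 = [f for f in frames if f % 2 == 1 or f in duplicated]
    desc2 = [f for f in frames if f % 2 == 0 or f in duplicated]
    return desc1, desc2


# Hypothetical motion profile for 10 frames: motion jumps between frames 5 and 6.
max_mv = [1, 1, 1, 1, 9, 9, 9, 9, 9]
starts = high_motion_starts(max_mv, threshold=4)
desc1, desc2 = build_descriptions(10, starts)
print(starts, desc1, desc2)
```

With this hypothetical motion profile the sketch reproduces the 10-frame example in the text: frames 5 and 6 appear in both descriptions.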
Step 2 (Message construction).
In view of the motion characteristics between the frames, each description can be divided into messages at the position of high motion. In the above example, Description 1 can be divided into two messages, that is, frames 1, 3, 5 as one message and frames 6, 7, 9 as the other message. Similarly in Description 2, the two messages are frames 2, 4, 5 and frames 6, 8, 10, respectively.
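Message construction can be illustrated with a short Python sketch. It assumes that the positions of high motion are already known as a list of frame numbers `k` (high motion between frames `k` and `k+1`); the function name `split_into_messages` is hypothetical.

```python
def split_into_messages(description, starts):
    """Cut a description into messages after every frame k listed in starts.

    description: frame numbers of one description, in display order.
    starts: frames k such that high motion lies between k and k+1.
    """
    boundaries = set(starts)
    messages, current = [], []
    for frame in description:
        current.append(frame)
        if frame in boundaries:
            # frame k closes a message; frame k+1 opens the next one
            messages.append(current)
            current = []
    if current:
        messages.append(current)
    return messages


print(split_into_messages([1, 3, 5, 6, 7, 9], [5]))   # Description 1
print(split_into_messages([2, 4, 5, 6, 8, 10], [5]))  # Description 2
```

Passing the high-motion positions explicitly avoids ambiguity: in Description 2 the frame pairs (4, 5) and (5, 6) both have consecutive numbers, but only the second pair marks a message boundary.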
Message construction brings two improvements. First, a flexible group of pictures (GOP) becomes available because the messages differ in length. The first frame in each message is intra-encoded as an I frame, and the GOP encoding structure is chosen according to the length of the message. Note that the threshold value influences the flexible GOP structure: as the threshold increases, the messages tend to become longer, which changes the GOP structure. Second, unequal error protection can be applied at both the intra-message and the inter-message level. Unequal amounts of FEC bytes are assigned to each frame according to the different significance of I, P, and B frames, which produces the intra-message unequal protection. Furthermore, different GOPs contain different numbers of I, P, and B frames, which leads to unequal protection at the inter-message level.
Step 3 (Standard encoder).
Each message is encoded into a bit stream using a current standard codec. Here, the H.264 encoder is chosen, so the proposed scheme is compatible with the standard codec. Note that a flexible group of pictures (GOP) is employed in each message, which helps to refresh the intra frame adaptively. Compared with a uniform intra-frame period, adaptive refreshment follows the motion change between frames, so better temporal correlation is maintained and better error concealment is achieved if a frame of one message is lost at the decoder.
Step 4 (Priority encoding transmission (PET)).
Priority encoding transmission is an algorithm that assigns unequal amounts of FEC bytes to different segments of the message according to specified priorities. A priority is expressed as the percentage of packets needed to reconstruct the original information. When the priority is high (a small percentage), the corresponding message segment can be recovered from few packets received by the decoder. Conversely, a low priority (a large percentage) means that more packets are needed to recover the message segment. For a block protected with FEC, as long as the number of lost packets is less than or equal to the number of FEC bytes in the block, the block can be recovered completely.
Here, the message segments in each bit stream are of three types after H.264 encoding, that is, I frames, P frames, and B frames. In view of their different significance, I frames have the highest priority and P frames have higher priority than B frames. In this paper, the packet loss rate of the channels is also taken into account to design the priority, which is discussed in detail in Section 3. Figure 2 depicts a simple example of priority encoding transmission. The message has 19 bytes and consists of three segments whose priorities are 40%, 60%, and 100%, respectively. First, according to the demanded packet size of 6 bytes and the priorities, each segment is divided into blocks with the appropriate number of FEC bytes. Since the priority of segment 1 is 40%, each block of 2 bytes is extended by 3 FEC bytes. In the same way, each block of 3 bytes in segment 2 is protected by 2 FEC bytes according to the priority of 60%. Note that since the length of segment 2 is 10 bytes, which is not evenly divisible by 3, block 3 has one more byte than the other two blocks. For the priority of 100% of segment 3, no FEC bytes are needed. The new message, which includes the original data and the FEC bytes, is then mapped into 5 packets as shown in Figure 2. These packets are transmitted to the receiver over the channels.
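The relation between a priority and the number of FEC bytes per block in the example above can be checked with a few lines of Python. This is a minimal sketch that only counts bytes; it does not implement the erasure code itself, and the function names `block_layout` and `block_recoverable` are hypothetical.

```python
def block_layout(priority_percent, n_packets):
    """Split one block of n_packets bytes into (data_bytes, fec_bytes).

    A priority of p% means that p% of the packets suffice to recover the
    block, so ceil(p * n_packets / 100) bytes carry data and the rest FEC.
    """
    data_bytes = -(-priority_percent * n_packets // 100)  # integer ceiling
    return data_bytes, n_packets - data_bytes


def block_recoverable(priority_percent, n_packets, n_lost):
    """A block survives if no more packets are lost than it has FEC bytes."""
    _, fec_bytes = block_layout(priority_percent, n_packets)
    return n_lost <= fec_bytes


# The three priorities of the example, spread over 5 packets.
for priority in (40, 60, 100):
    print(priority, block_layout(priority, 5))
```

For 5 packets this gives 2 data bytes plus 3 FEC bytes at 40%, 3 plus 2 at 60%, and 5 plus 0 at 100%, in agreement with the block sizes quoted in the text.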
Step 5 (Decoder design with error concealment).
In the on-off channel environment, two cases for decoding should be taken into account, that is, the design of the central decoder and of the side decoder. Since the two descriptions are generated from the odd and even frames, at the central decoder the two video subsequences are first interleaved after standard decoding. Then, according to the serial numbers of the frames, the duplicated frames are removed to obtain the central reconstruction. If only one channel works, the side decoder is employed to estimate the lost information. The widely used method of motion compensation interpolation (MCI), based on the piecewise uniform motion assumption, is performed by bidirectional motion estimation, which may produce overlapped pixels and holes in the estimated frame.
For convenience, we denote by the estimated frame between frame and frame and by the motion vector for the pixel location . To avoid the holes in the estimated frame, we can compute a preliminary reconstruction as background
Furthermore, the forward and backward motion compensation can be performed for frames and , respectively. To solve the overlapped problem of MCI, the mean values of overlapped pixels are adopted for motion compensation. Then the preliminary background may be replaced by the MCI-based reconstruction according to
In the packet loss network, since both received descriptions suffer from packet losses, only the central decoder needs to be designed. After standard decoding, the two generated video subsequences are first interleaved by odd and even frames to produce a video with redundant frames. At the decoder, the segments whose priorities are not higher than the fraction of packets received can be recovered completely. Segments whose priority percentage exceeds the fraction of packets received cannot be corrected, which may result in frame losses. In this case, error concealment is used to estimate the lost frames. For a lost I frame or P frame within one message, the last correctly decoded I frame or P frame is used for forward prediction by motion compensation extrapolation. For a lost B frame within one message, its forward and backward I frame or P frame are used for bidirectional prediction by motion compensation interpolation. Lastly, the duplicated frames are removed to obtain the central reconstruction.
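The decision logic of the central decoder described above can be sketched as follows. The sketch only selects an action per frame and merges the two descriptions; the actual motion-compensated extrapolation and interpolation are not implemented. The function names, the tuple layout, and the priority values in the usage example are hypothetical.

```python
def plan_decoding(frames, received_percent):
    """Choose an action for every frame of one message.

    frames: list of (frame_number, frame_type, priority_percent), where the
            priority is the percentage of packets needed for recovery.
    received_percent: percentage of packets that actually arrived.
    """
    plan = {}
    for number, frame_type, priority_percent in frames:
        if priority_percent <= received_percent:
            plan[number] = "decode"          # FEC recovers the segment
        elif frame_type in ("I", "P"):
            plan[number] = "extrapolate"     # forward prediction from the last good I/P frame
        else:
            plan[number] = "interpolate"     # bidirectional prediction for a B frame
    return plan


def central_reconstruction(desc1, desc2):
    """Interleave two decoded descriptions and drop the duplicated frames."""
    return sorted(set(desc1) | set(desc2))


message = [(1, "I", 40), (3, "B", 90), (5, "P", 60)]  # hypothetical priorities
print(plan_decoding(message, received_percent=50))
print(central_reconstruction([1, 3, 5, 6, 7, 9], [2, 4, 5, 6, 8, 10]))
```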
3. The Design of Priority
In the priority encoding transmission algorithm, the priority percentage of each segment in a message determines the amount of FEC bytes added, so the design of the priority aims to achieve better reconstruction at the cost of fewer FEC bytes. Therefore, we assign the priority to each segment according to its contribution to the improvement of the message reconstruction. In order to estimate the contribution of a segment, a decoding process with error concealment is simulated at the encoder. Some notation is defined below.
Let us assume that I frames, P frames, and B frames are the three types of segments in the description. PSNR(I) denotes the reconstruction quality of the message when only the I frames can be decoded correctly. For simplicity, a lost P or B frame is reconstructed as a copy of the I frame. PSNR(I, P) then represents the recovery when both the I and P frames are received correctly. Here, the lost B frames are estimated using motion compensation interpolation. PSNR(I, P, B) is the entire reconstruction with no losses. Obviously, PSNR(I) ≤ PSNR(I, P) ≤ PSNR(I, P, B). Therefore, we can consider that the improvement due to the P frames is PSNR(I, P) − PSNR(I). In the same way, the improvement from the B frames is PSNR(I, P, B) − PSNR(I, P). Preliminary priorities are assigned to the I frames, P frames, and B frames, respectively. As a result, we can compute the following priorities:
Here, the constant parameter can be adjusted to satisfy the bit rate. The above formulas provide the basic relationship between three priorities, so , and can be computed from one to the other. That is,
If the largest packet loss rate of channels is taken into account and the acceptable lowest reconstruction quality is PSNR, the formulas can be modified as follows.
If , then the priorities can be updated as and , and can be computed according to their relationship.
Similarly, if , then the priorities will be modified as and can also be computed from .
If , that is, the entire recovery should be achieved, then .
The design of the priority requires a decoding process with error concealment, which increases the computational complexity of the encoder to some extent. However, we use frame-level error concealment to lower the complexity. When computing PSNR(I), frame duplication is a fast way to reconstruct the lost frames. When computing PSNR(I, P), the motion between frames is considered uniform, so multiple lost frames can be estimated with a single motion compensation interpolation. When computing PSNR(I, P, B), we can reuse the reconstructed reference frames of the H.264 encoding process to reduce the complexity. Moreover, the priority decision can be made offline, which preserves the real-time operation of the whole system.
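The quality terms used in this section can be illustrated with a small Python sketch. It computes the PSNR of 8-bit samples and the two improvement terms attributed to the P and B frames; the mapping from these improvements to the final priorities is not reproduced here. The function names and the numerical values in the usage example are hypothetical.

```python
import math


def psnr(reference, reconstruction, peak=255):
    """PSNR in dB between two equally long sequences of 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return float("inf")  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


def improvements(psnr_i, psnr_ip, psnr_ipb):
    """Quality gains contributed by the P frames and by the B frames.

    psnr_i:   quality when only the I frames are decoded,
    psnr_ip:  quality when the I and P frames are decoded,
    psnr_ipb: quality with no losses.
    """
    return psnr_ip - psnr_i, psnr_ipb - psnr_ip


# Hypothetical measurements from a simulated decoding at the encoder.
gain_p, gain_b = improvements(28.0, 31.5, 34.0)
print(gain_p, gain_b)
```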
4. Experimental Results
In this section, the proposed scheme is tested using standard video sequences in CIF-YUV or QCIF-YUV 4:2:0 format. The frame rate is 30 fps. As for the video codec, we have employed the H.264 encoder  and the software version is JM10.2. First, we present the proposed message construction and the performance of the flexible GOP. Next, the advantage of the proposed scheme over the equal protection scheme is shown. Lastly, for the two MD environments, that is, the on-off channel environment and the packet loss network, experiments are performed to evaluate the efficiency of the proposed scheme with respect to state-of-the-art methods.
4.1. Message Construction
Figure 3 shows the change of the maximal motion vectors for the standard video sequence "Coastguard.qcif". Here, the threshold . In the original video "Coastguard.qcif", 18 frames are duplicated according to the positions of high motion, and 13 messages are then constructed in each description, as shown in Figure 4.
According to the numbers of frames in the messages in Figure 4, the coding structure of each description, that is, the coding type of each frame, is designed and shown in Table 1. Table 1 shows how the flexible GOP adapts to the different messages. Furthermore, the frames with high motion are encoded as I frames, which therefore receive strong protection under the PET algorithm.
4.2. Flexible GOP
To substantiate the improvement brought by the flexible GOP, the following experiment is performed. The first 100 frames of the standard video "Coastguard.qcif" are split directly into two descriptions by frame splitting. The generated descriptions are then encoded by H.264 with the flexible GOP and with a fixed GOP. Lastly, the same error concealment is applied to reconstruct the lost frames when only one description is received. Here, the flexible GOP follows Table 1 for "Coastguard.qcif" and the fixed GOP is I-P-B-B-P-B-B. Figure 5(a) shows that the flexible GOP achieves better rate distortion performance than the fixed GOP. Furthermore, the standard video "Foreman.qcif" is tested in Figure 5(b) with the same result.
4.3. Equal Protection Scheme Comparison
According to Figure 4, the 300 frames of "Coastguard.qcif" are split into two descriptions and each description has 13 messages. The quantization parameters are chosen as QP (I: 25, P: 30, B: 30). The coding structure of each message is shown in Table 1. After H.264 encoding, the total bit rate of all the messages is 124.59 kbps. The following experiments are based on these compressed bit streams.
After priority encoding transmission, the total bit rate of information data and FEC is 177.98 kbps for 300 frames. To make a fair comparison, the quantization parameters, coding structure, and error concealment are the same in the equal protection scheme. Figure 6 shows the performance of the proposed scheme against equal protection over the packet loss network at the same total bit rate. Note that in the equal protection scheme the same amount of FEC bytes is assigned to all segments, which corresponds to identical priorities in the PET algorithm. From Figure 6(a), we can see that at low packet loss rates (<30%), the performance of equal protection surpasses the proposed scheme and the largest gap between the two schemes is less than . However, at the packet loss rate of 35%, the proposed scheme degrades gracefully while equal protection exhibits a sharp transition. Here, the largest gap by which the proposed scheme surpasses equal protection is about 6 dB. This is because in the equal protection scheme all the frames receive the same protection, which tolerates a packet loss of about 30% (124.59/177.98 ≈ 0.70 of the transmitted bytes are source data); at the packet loss rate of 35%, almost all the frames therefore cannot be decoded correctly, which results in a sharp degradation of the quality.
Figure 6(b) shows the performance for each frame at the packet loss rate of 35%. The proposed scheme clearly performs better than equal protection. We have also investigated the subjective visual quality of the proposed scheme compared with the equal protection scheme. Figure 7 shows that in the equal protection scheme substantial distortion exists from the 27th frame to the 30th frame, and that the frame quality is significantly improved by the proposed scheme.
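The sharp transition of the equal protection scheme can be related to the two bit rates quoted above with a short calculation. The sketch below reads the ratio of source rate to total rate as the common priority of the equal protection scheme, which is an interpretation of the reported numbers and not a statement from the experiments; the unequal priorities in the example are purely hypothetical.

```python
def data_fraction(source_kbps, total_kbps):
    """Share of the transmitted rate that carries source data rather than FEC."""
    return source_kbps / total_kbps


def decodable(priority, loss_rate):
    """A segment is recovered if the received fraction reaches its priority."""
    return (1.0 - loss_rate) >= priority


# Rates reported for "Coastguard.qcif": 124.59 kbps source, 177.98 kbps with FEC.
equal_priority = data_fraction(124.59, 177.98)
print(round(equal_priority, 3))

# Hypothetical unequal priorities for comparison (not taken from the paper).
unequal = {"I": 0.4, "P": 0.6, "B": 0.9}

for loss in (0.25, 0.35):
    equal_ok = decodable(equal_priority, loss)
    unequal_ok = {t: decodable(p, loss) for t, p in unequal.items()}
    print(loss, equal_ok, unequal_ok)
```

Under these assumptions every equally protected frame is lost once the loss rate exceeds roughly 30%, whereas unequal protection still recovers the I and P frames at 35% loss.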
4.4. On-Off Channel Environment
In the following experiment, the rate distortion performance of the proposed scheme is compared with that of the scheme based on H.264  in the on-off channel environment. For a fair comparison, the first 150 frames of "Mobile.cif" are selected and the coding structure is IPPP… without B frames. Figure 8 shows the central and side distortion performance of the proposed scheme against the scheme  at the same bit rate. From the figures, we can clearly see that the proposed scheme outperforms the tested scheme by about 0.3–0.8 dB in side distortion and 0.7–1.8 dB in central distortion. From Figure 8(b), we can see that the PSNR gain grows as the bit rate increases, owing to the better performance of error concealment at the decoder.
4.5. Packet Loss Network
Firstly, the 100 frames of "New.cif" are selected and the GOP structure is I BBBB P BBBB P BBBB P BBBB I for a fair comparison. From Figure 9, it can be seen that for the packet loss rates 1% and 10%, the proposed scheme performs better than the scheme in , which may result from better temporal correlation in the proposed scheme to estimate the lost information.
Next, the 100 frames of "Paris.cif" have been encoded in an IPBPB… structure for the comparison with the unequal protection scheme in . The packet loss rate is tuned from 1% to 20%. Figure 10 shows the better performance of the proposed scheme than the compared one . In  only three kinds of RS code are used for unequal protection, which may be a limitation of the performance.
5. Conclusion
An MD video coding scheme using priority encoding transmission has been developed in this paper. An effective priority design has been incorporated into the proposed system to achieve better performance under packet loss. For the message construction, the different motion characteristics between frames are taken into account, so that better temporal correlation is maintained within each message for better estimation when information is lost. Furthermore, in view of its compatibility with the standard video codec, the proposed scheme may be a worthy choice for MD coding.
References
Wang Y, Reibman AR, Lin S: Multiple description coding for video delivery. Proceedings of the IEEE 2005, 93(1):57-69.
Goyal VK: Multiple description coding: compression meets the network. IEEE Signal Processing Magazine 2001, 18(5):74-93. 10.1109/79.952806
Vaishampayan V: Design of multiple description scalar quantizers. IEEE Transactions on Information Theory 1993, 39(3):821-834. 10.1109/18.256491
Vaishampayan VA, John S: Balanced interframe multiple description video compression. Proceedings of the IEEE International Conference on Image Processing (ICIP '99), October 1999, Kobe, Japan 3: 812-816.
Reibman AR, Jafarkhani H, Wang Y, Orchard MT, Puri R: Multiple description coding for video using motion compensated prediction. Proceedings of the IEEE International Conference on Image Processing (ICIP '99), October 1999, Kobe, Japan 3: 837-841.
Wang D, Canagarajah N, Redmill D, Bull D: Multiple description video coding based on zero padding. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '04), 2004, Vancouver, Canada 2: II205-II208.
Bai H, Zhao Y, Zhu C: Multiple description video coding using adaptive temporal sub-sampling. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '07), July 2007, Beijing, China 1331-1334.
Conci N, De Natale FGB: Multiple description video coding using coefficients ordering and interpolation. Signal Processing: Image Communication 2007, 22(3):252-265. 10.1016/j.image.2006.12.009
Abanoz TB, Tekalp AM: SVC-based scalable multiple description video coding and optimization of encoding configuration. Signal Processing: Image Communication 2009, 24(9):691-701. 10.1016/j.image.2009.07.003
Radulovic I, Frossard P, Wang YK, Hannuksela MM, Hallapuro A: Multiple description video coding with H.264/AVC redundant pictures. IEEE Transactions on Circuits and Systems for Video Technology 2010, 20(1):144-148.
Su C-C, Chen HH, Yao JJ, Huang P: H.264/AVC-based multiple description video coding using dynamic slice groups. Signal Processing: Image Communication 2008, 23(9):677-691. 10.1016/j.image.2008.07.002
Zhang X, Peng X: An unequal packet loss protection scheme for H.264/AVC video transmission. Proceedings of the International Conference on Information Networking (ICOIN '09), January 2009 1-5.
Bernardini R, Rinaldo R, Tonello A, Vitali A: Frame based multiple description for multimedia transmission over wireless networks. Proceedings of the 5th International Symposium on Wireless Personal Multimedia Communications (WPMC '04), July 2004, Abano Terme, Italy 2: 529-532.
Mohr AE, Riskin EA, Ladner RE: Graceful degradation over packet erasure channels through forward error correction. Proceedings of the Data Compression Conference (DCC '99), March 1999, Snowbird, Utah, USA 92-101.
Stankovic V, Hamzaoui R, Xiong Z: Packet loss protection of embedded data with fast local search. Proceedings of the IEEE International Conference on Image Processing (ICIP '02), September 2002, Rochester, NY, USA 2: II165-II168.
Dumitrescu S, Wu X, Wang Z: Globally optimal uneven error-protected packetization of scalable code streams. IEEE Transactions on Multimedia 2004, 6(2):230-239. 10.1109/TMM.2003.822793
Leicher C: Hierarchical encoding of MPEG sequences using priority encoding transmission (PET). ICSI; 1994.
H.264 standard JVT-G050, 7th meeting, Pattaya, Thailand, March 2003
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments that greatly improved this paper. Additionally, this work was supported in part by National Natural Science Foundation of China (no. 60903066, no. 60972085, and no. 60776794), Natural Science Foundation of Beijing (no. 4102049), New Teacher Foundation of State Education Ministry (no. 20090009120006), 973 program (no. , 863 program (no. 2007AA01Z175), and Research Foundation of BJTU .
About this article
Cite this article
Bai, H., Zhao, Y. & Zhang, M. Standard-Compliant Multiple Description Video Coding over Packet Loss Network. EURASIP J. Adv. Signal Process. 2010, 987164 (2010). https://doi.org/10.1155/2010/987164
- Packet Loss
- Packet Loss Rate
- Scalable Video Coder
- Error Concealment
- Multiple Description