Multiple Description Wavelet Coding of Layered Video Using Optimal Redundancy Allocation

We present a wavelet-based framework for the encoding of video in multiple descriptions. Using the proposed methodology, the generation of multiple descriptions is performed so that drift is eliminated at the decoder regardless of the number of received descriptions. Moreover, the proposed framework is flexible in the sense that it allows the encoding of video into an arbitrary number of descriptions. We also present a thorough analysis of rate allocation issues and propose three algorithms for the optimal allocation of redundancy. Experimental results for the transmission of video using two descriptions demonstrate the efficiency of the proposed method.


INTRODUCTION
Multiple description (MD) coding [1,2] offers an attractive framework for the transmission of multimedia over heterogeneous networks. In MD coding, a source is encoded into multiple independently decodable bitstreams which are mutually refining and equally important. At the decoder side, the reconstruction quality depends on the number of descriptions that were received without errors. Due to its flexibility, multiple description coding is considered a very robust and reliable tool for information transmission.
Multiple description coding has been investigated for image [3][4][5] and video transmission [6][7][8][9][10][11]. In the particular case of video transmission, the study of MD systems becomes more complicated due to the uncertainty about the information that will be available at the decoder of an MD system.
In [12], a methodology was presented for the design of two-channel orthonormal filter banks based on the Lagrangian optimization of the redundancy rate-distortion performance of MD subband coding. In [7], an MD predictive quantization system was introduced, appropriate for the encoding of correlated information sources such as video and speech. The proposed system was used to construct a balanced twin-description interframe MD video coder, and performance results were presented using two packetization strategies. A review of MD coding was recently presented in [13].
In [6], MD video coders were proposed which use motion-compensated prediction. These systems utilize MD transform coding, three separate prediction paths, and side information in order to accommodate all possible scenarios at the decoder. For this reason, three different algorithms for redundancy allocation were implemented, and experimental results were presented. An improved algorithm based on the same principles was presented in [10], where the encoding of the side information was modified in order to be useful even if no drift occurs. In [14], a novel scheme for double-description coding was proposed, which is built into the H.263 coder and replicates some selected DCT coefficients in both descriptions. The selection is based on a threshold determined using rate-distortion techniques. In [8], a novel way to deal with redundancy was devised. Temporal redundancy was used to control the tradeoff between drift and redundancy. However, this method does not inherently eliminate drift, that is, the cumulative distortion which occurs whenever the reference frames used at the decoder are not identical to the ones used by the encoder.
In [9], a drift-free wavelet-based MDC video coding scheme was proposed. However, the redundancy allocation algorithm did not take into consideration the impact of the temporal redundancy on the design of the system, thus resulting in suboptimal coding. This problem was dealt with in [15], where an improved version of the method in [9] was presented.
In [16], a multiple description coding method for video streaming was presented. The method in [16] was based on a 3D discrete wavelet transform. Redundancy was allocated by applying Lagrangian optimization techniques for the appropriate selection of subband quantizers. In [17], an MDC scheme for video coding was presented based on a spatiotemporal multiresolution analysis. Correlation between the two descriptions was introduced in the temporal domain by using an oversampled motion-compensated filter bank.
In the present paper, the intraframe and the motion-compensated prediction residual frames are wavelet-coded and divided into a redundant and an enhancement part, with the redundant part encoded in all descriptions and the enhancement part distributed over several descriptions. The "repeat or split" strategy was chosen over other proposed techniques, such as that presented in [2], since, in our case, drift-free reconstruction is straightforward. Using the above framework, we present and evaluate two techniques for the multiple description coding of video sequences.
(i) In the first technique, only the redundant part is used for the construction of reference frames and thus the resulting video coding scheme is able to perform drift-free reconstruction. Since the quality of the reference frame affects the coding efficiency of the system, an algorithm incorporating the impact of temporal correlation is also presented for the allocation of redundancy among multiple descriptions.
(ii) In the second technique, both the redundant and the nonredundant parts of the stream are used for the creation of the reference frame. This technique uses high-quality reference frames, but the reconstructed video suffers from drift in case of transmission over channels with severe loss.
Additionally, in the present paper the problem of optimal redundancy allocation, that is, the appropriate selection of the redundant and the enhancement parts for each frame, is investigated. Specifically, this problem is formulated as the maximization of the average video quality under the constraint of a target total rate. Three variations of an optimization algorithm are proposed and evaluated in terms of their complexity. It should be noted here that, in our system, the compression and the optimization steps are distinct. In this manner, our redundancy allocation algorithm is applied directly to compressed source layers, that is, the algorithm actually parses the compressed stream into multiple descriptions. This clearly differentiates our algorithm from the method in [16], in which the generation of descriptions is performed by application of appropriate quantizers to the transform coefficients.
The structure of the paper is as follows. In Section 2, the proposed framework for multiple description coding of video is presented. Section 3 describes the wavelet coding of intraframes and motion compensation residuals. In Section 4, the exploitation of temporal correlation during the optimization process is discussed. In Section 5, the redundancy allocation problem is formulated. The complexity of the redundancy allocation algorithm is studied in Section 6, and a faster algorithm is presented in Section 7 based on the Equivalent Continuous Problem. In Section 8, experimental results are presented and, finally, conclusions are drawn in Section 9.

PROPOSED FRAMEWORK FOR MULTIPLE DESCRIPTION GENERATION
The proposed system for the generation of multiple descriptions is depicted in Figures 1 and 2. Initially, the available bit budget is evenly allocated to the frames in a group of pictures (GOP). The first frame in each GOP is intra-coded using block-based wavelet coding. The resulting coded stream is distributed over a number of descriptions. A portion of the bitstream is redundant in all descriptions. The correlation between consecutive frames is subsequently removed using overlapped block motion compensation (OBMC) [18]. The reference frames used to calculate motion vectors are the original frames in order to ensure good precision in the estimation of the motion vectors. Motion vectors are losslessly coded using the techniques in [19] and are included in all descriptions.
Using the previously estimated half-pixel accurate motion vectors, the procedure for the generation of multiple descriptions for the interframes continues as follows: initially, the first interframe is compensated. No intra-coding is used in interframes. We employ two different mechanisms for the derivation of reference frames that are used during motion compensation. In the first, a version of the I-frame, reconstructed using only the redundant part of the bitstream so far coded, is used as reference for the compensation process. In the second, both redundant and nonredundant parts are used for the derivation of reference frames in motion compensation. The prediction error is derived by subtracting the compensated prediction from the original interframe. The prediction error is wavelet transformed and coded into multiple descriptions. A version of the error frame is reconstructed using either the redundant part or both redundant and nonredundant information of the coded bitstream, depending on which of the two mechanisms described above is used. The reconstructed error frame is added to the compensated frame. The resulting interframe (instead of the original) will serve as the reference frame for the compensation of the next interframe. The same procedure is iterated until all frames in a GOP are treated.
Using the above methodology, the proposed multiple description video coding scheme is able to produce an arbitrary number of descriptions at the cost of reduced compression efficiency whenever the number of descriptions is large. In each description, there is a redundant part, which is always used for the derivation of the reference frame in the motion compensation process, and a complementary refinement part, which is used to improve the quality of each description and may or may not be used for the derivation of the reference frame. When both redundant and nonredundant information is used, reference frames of high quality are available. When only the redundant part is used, the motion compensation process performed at the encoder can be identically replicated at the decoder even if only one description is received. This is a very important feature of our coder since, if the decoder is unable to use the same reference frames, errors will accumulate in the decoded video sequence, causing the aforementioned drift distortion [20]. With the proposed methodology, which relies only on the redundant part for motion compensation, the possibility of facing drift at the decoder is eliminated and thus a reconstructed sequence of high quality is obtained even if only some descriptions (or even a single one) are received.
The determination of the portion of the bitstream that is redundant in all descriptions is performed after the wavelet coding of the intra and the residual error frames. The wavelet coefficients are coded using a simple bitplane encoder, based on the context models in [21]. Specifically, the decomposed frame is divided into blocks of equal dimensions. Each block may be included in some or all descriptions. Thus, some blocks may appear in all descriptions whereas some other blocks appear in only one of the descriptions. The inclusion of blocks in one or more descriptions is done so as to maximize the average quality at the decoder, subject to a total rate constraint, and attain fairly equal bitrate and fairly equal quality descriptions. Such an assignment is depicted in Figure 3(a). A representation of the redundant and nonredundant part of the coded bitstream for a two-description system is shown in Figure 3(b). The generation of descriptions can be achieved by including appropriate blocks of wavelet coefficients in one or both of the descriptions. In the case of two descriptions, this is achieved by using the checkerboard pattern which we originally proposed in [9].
This approach bears some resemblance to the flexible macroblock ordering (FMO) approach in H.264 (see, e.g., [22]). However, there are fundamental differences between FMO and our approach, which arise from the fact that our method operates in the wavelet domain whereas FMO is applied in the spatial domain. Since the FMO approach uses spatial blocks, the loss of a block would mean complete loss of information for that spatial region. This is why in FMO at least a coarsely quantized version of a chess-block needs to be included in each description. Clearly, this means that using FMO there is much less control over redundancy, since information about all blocks needs to be encoded in both descriptions. Moreover, since redundancy is introduced by the use of different quantizers, and not by explicitly including the same portion of the bitstream in all descriptions, the elimination of drift is not a trivial task. Finally, in FMO there is a need for error concealment in case the reconstructed quality in a spatial region is not good. Unlike the FMO approach, in our system, a loss of a wavelet block (due to the loss of the description in which the block is encoded) causes only the loss of some detail in the reconstructed frame. Moreover, in our method, most wavelet blocks are included in only one of the descriptions and only a few important blocks are included in both descriptions. This is possible since the wavelet transform compacts the important information in a few blocks (subbands) of transform coefficients. This strategy seems to be naturally more suitable for MD coding since it allows better manipulation of redundancy and generally achieves lower redundancy levels.
Throughout our manuscript we assume that no B-frames are encoded (see Figure 4). However, this assumption does not affect the significance of our work, which can also be applied when using B-frames. Suppose that we have an intra-coded frame, several (unidirectionally predicted) interframes, and some other frames that are to be bidirectionally predicted using the intra- and interframes. Clearly, our MD generation methodology is directly applicable to the sequence of intra- and interframes. In each description, bidirectionally predicted frames could be encoded based on the reconstructions of intra- and interframes which are achieved using the bitstream in the same description. Note that, since B-frames do not propagate errors and do not cause drift, the reconstructed versions of intra- and interframes can be obtained using not only the redundant part of the description but also the nonredundant part. An interesting and desirable result of this strategy is that, as these reconstructions will be different in the two descriptions, the associated residuals of the bidirectionally predicted frames will be inherently different in the two descriptions. This is perfectly consistent with the MD coding principle of encoding different versions of the information in each description.
In the ensuing section, the complete wavelet coding method, used for both intra- and interframes, is described.

BLOCK-BASED WAVELET CODING OF MOTION COMPENSATION RESIDUALS
The intra-frame and the motion-compensated residuals are decomposed using a wavelet transform based on the 9-7 biorthogonal filter bank [23]. The maximum absolute coefficient in each subband is placed in the image header. All subband maxima are arithmetically encoded. The transmission of information takes place in a bitplane-wise manner starting from the most significant bit (MSB) to the least significant bit (LSB). Within each bitplane, subbands are encoded in a predefined scanning order from the lowest to the highest resolution.
Each subband is divided into a set of blocks. The default block size is $(W/2^{L+1}) \times (H/2^{L+1})$, where $W$, $H$ are the width and height of the frame, respectively, and $L$ is the maximum level of the wavelet decomposition. For each block, first the coefficients whose most significant bit is on the bitplane currently coded are identified by comparison to a threshold $T = 2^n$, where $n$ is the index of the bitplane that is being coded. If a coefficient becomes significant, that is, it is found to be greater than or equal to $T$ for the first time, then its sign is coded. This process is often called significance identification [24] and the compressed significance map for a block is termed significance layer. Similarly, the refinement layer is defined as the one containing the $n$th bitplane of coefficients (in a block) found significant in previous passes. In our coder, refinement layers for the $n$th bitplane are transmitted immediately after the transmission of significance layers for the same bitplane. Note that each layer contains significance or refinement information for a single block and that the eventual allocation of layers in descriptions is performed by taking into consideration the fact that the decoding of a layer is possible only when all its predecessor layers in the same block are also included in the description.
The $n$th bit in the binary representation of a coefficient $f$ in subband $B$ is coded only if the maximum absolute coefficient in the subband $B$ is greater than or equal to the current threshold, that is, if $\max_{f' \in B} |f'| \ge 2^n$. The deployment of the above rule drastically reduces the number of coefficients whose significance is tested during the coding of a significance identification layer. For this reason, subband maxima are included in all descriptions. However, in order to further reduce the number of symbols that have to be coded during the layer coding stage, a single bit is initially coded to indicate whether all coefficients in a block are insignificant. A value of "1" of this bit indicates that the block contains no significant coefficients and no further information is coded for this block.
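The layering described above can be illustrated with a minimal Python sketch. It is not the coder used in the paper: entropy coding, context modelling, and the single all-insignificant flag bit are omitted, coefficients are assumed to be integers, and the names `bitplane_layers`, `block`, and `subband_max` are hypothetical.

```python
def bitplane_layers(block, subband_max):
    """Split one block of integer wavelet coefficients into per-bitplane layers.

    Returns a list of (n, kind, symbols) tuples ordered from MSB to LSB, where
    kind is "significance" or "refinement". Illustrative only: no entropy coding.
    """
    layers = []
    significant = [False] * len(block)
    # Highest bitplane n with 2**n <= subband_max (bitplanes above it are skipped).
    top = max(subband_max, 1).bit_length() - 1
    for n in range(top, -1, -1):
        threshold = 1 << n  # T = 2**n
        sig_symbols, newly_significant = [], []
        # Significance pass: test coefficients that are not yet significant.
        for t, c in enumerate(block):
            if significant[t]:
                continue
            if abs(c) >= threshold:
                sig_symbols.append((t, 1, "-" if c < 0 else "+"))
                newly_significant.append(t)
            else:
                sig_symbols.append((t, 0, None))
        # Refinement pass: nth bit of coefficients found significant earlier.
        ref_symbols = [(t, (abs(c) >> n) & 1)
                       for t, c in enumerate(block) if significant[t]]
        for t in newly_significant:
            significant[t] = True
        layers.append((n, "significance", sig_symbols))
        if ref_symbols:
            layers.append((n, "refinement", ref_symbols))
    return layers
```

For a block `[5, -3, 0, 1]` with subband maximum 5, the sketch emits one significance layer for bitplane 2 and a significance layer followed by a refinement layer for each of bitplanes 1 and 0.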
The symbol streams described above are coded using adaptive arithmetic codes [25]. The context modelling strategy in [21] is followed for the coding of significance identification layers. Refinement bits are entropy coded using a single adaptive arithmetic model. The maximum frequency count of the arithmetic coder was set equal to 512 in order to allow fast adaptation of the coder to the statistics of the incoming symbol stream.
In order to apply an efficient redundancy allocation algorithm that takes into account the actual rate-distortion characteristics of the compressed stream, the distortion decrease achieved by the transmission of each bitplane should be calculated [21,26] for each layer. The distortion decrease caused by the transmission of the $i$th layer is given by
$$D_i = \sum_t \left(c_t - \hat{c}_t^{\,n+1}\right)^2 - \sum_t \left(c_t - \hat{c}_t^{\,n}\right)^2,$$
where $n$ is the index of the bitplane included in the layer, $t$ is the coefficient index, and $c_t$, $\hat{c}_t^{\,n}$ denote the original wavelet coefficients and their reconstructions after decoding down to bitplane $n$, respectively. Each layer corresponding to a specific block of wavelet coefficients causes a different reduction in the distortion. Analytical expressions for the distortion reduction caused by the transmission of layers can be found in [26]. Let $R_i$ be the number of bits required for the coding of the $i$th layer. When all pairs $(D_i, R_i)$ are determined, the redundancy allocation algorithm can be applied. This is examined in the following sections.
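As a rough illustration of how such distortion decreases could be measured, the sketch below computes the squared-error reduction obtained by decoding one more bitplane of a block. It rests on assumptions that are not in the paper: integer coefficients, reconstruction by plain truncation (no midpoint rule), and significance and refinement information for a bitplane treated as one unit. The helper names are hypothetical.

```python
def reconstruct(c, n):
    """Reconstruct integer coefficient c from bitplanes n and above (truncation)."""
    magnitude = (abs(c) >> n) << n
    return -magnitude if c < 0 else magnitude


def distortion_decrease(block, n):
    """Squared-error reduction from decoding bitplane n of a block,
    given that all bitplanes above n have already been decoded."""
    before = sum((c - reconstruct(c, n + 1)) ** 2 for c in block)
    after = sum((c - reconstruct(c, n)) ** 2 for c in block)
    return before - after
```

Summing the decreases over all bitplanes recovers the energy of the block, which is a convenient sanity check when collecting the pairs (D_i, R_i).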

TEMPORAL CORRELATION COMPUTATION
An optimization algorithm should take into consideration the temporal correlation linking adjacent video frames.
Modelling the dependency of adjacent frames in a video sequence is a nontrivial problem. In this paper, in order to deal with this issue, we introduce a temporal correlation coefficient $a_i$, $0 \le a_i < 1$, meant to incorporate the effect of temporal correlation of layer $i$ into the optimization algorithm. Specifically, we assume (a similar conclusion was drawn in [27]) that a distortion reduction $D_i$ in frame $m$ induces a distortion reduction $a_i D_i$ in frame $m+1$, where $m$ is the frame index. In the same manner, the additional distortion reduction $a_i D_i$ in frame $m+1$ stimulates additional distortion reduction $a_j(a_i D_i)$ in frame $m+2$, $a_k(a_j(a_i D_i))$ in frame $m+3$, and so on, where $a_j, a_k, \ldots$ are the temporal correlation coefficients for frames $m+1, m+2, \ldots$, respectively. We further assume that $a_i, a_j, a_k$ are approximately equal for all frames in a GOP, since the dependency between consecutive frames in the same GOP is not expected to exhibit significant variations. In general, the distortion reduction in frame $n$ caused by the transmission of the $i$th layer in frame $m$, $m < n$, is $a_i^{n-m} D_i$. Thus, as the temporal distance $n-m$ between $m$ and $n$ increases, the additional distortion reduction decreases exponentially. Assuming that the total number of frames in a GOP is $M$, the total distortion decrease is given by
$$D_i + a_i D_i + a_i^2 D_i + \cdots + a_i^{M-m} D_i,$$
where $a_i D_i$ is the distortion reduction caused in frame $m+1$, $a_i^2 D_i$ is the distortion reduction in frame $m+2$, and so forth. The above quantity is equivalently written as the sum
$$D_i + D_i \sum_{k=1}^{M-m} a_i^k,$$
where the first term is the distortion reduction in the current frame and the second term denotes the distortion reduction in all subsequent frames. If
$$C_i = \sum_{k=1}^{M-m} a_i^k,$$
the total distortion reduction caused by the transmission of the $i$th layer in the $m$th frame can now be expressed as
$$D_i + D_i C_i = D_i (1 + C_i),$$
where $D_i C_i$ is the cumulative distortion reduction that is caused in the subsequent frames due to the higher quality of the current (reference) frame $m$. Clearly, with this formulation, layers in frames lying in the beginning of a GOP are more important than layers of frames at the end of the GOP, since the quality of the former affects the quality of the latter. The coefficients $a_i$, and hence $C_i$, which quantify the impact of the current frame on the quality of subsequent frames, were calculated using the methods in [27].
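The geometric accumulation described in this section can be sketched as follows. The sketch assumes frames are indexed 1 to M within a GOP, so that frame m has M - m successors; this indexing and the function names are assumptions made for illustration, not taken from the paper.

```python
def cumulative_factor(a, M, m):
    """C = a + a**2 + ... + a**(M - m) for frame m of a GOP with M frames.

    Assumes 1-based frame indices, so the last frame (m == M) yields C == 0.
    """
    return sum(a ** k for k in range(1, M - m + 1))


def total_distortion_reduction(D, a, M, m):
    """Total reduction D * (1 + C): current frame plus all subsequent frames."""
    return D * (1.0 + cumulative_factor(a, M, m))
```

With a = 0.5 and M = 4, a layer in the first frame has C = 0.875, whereas a layer in the last frame has C = 0, which reflects the greater importance of layers near the beginning of the GOP.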

FORMULATION OF THE REDUNDANCY ALLOCATION PROBLEM
In order to address the problem of optimal allocation in MD video coding, it is important to derive expressions for the average video quality at the decoder and the total rate used in terms of the assignment strategy. Although in the experimental results section we consider the average PSNR over the entire sequence, in this section we will attempt to maximize the distortion reduction incurred by each frame of the GOP separately. This simplification will not significantly affect the optimality of the strategy derived here, while it will serve in addressing the problem of optimal assignment in a more rigorous way and in providing useful insight into the optimization procedure.
Let us assume that each frame is coded into $L$ layers, each using $R_i$ bits and contributing a reduction of distortion equal to $D_i$ relative to the quality of the current frame and $C_i D_i$, $i = 1, \ldots, L$, to the quality of the next frames in the GOP, when used for motion compensation for the next frames. We further assume that the curve appearing in Figure 5(a) is concave, namely,
$$\frac{D_1}{R_1} \ge \frac{D_2}{R_2} \ge \cdots \ge \frac{D_L}{R_L}. \tag{7}$$
This assumption is generally valid for the case of our coder (a curve based on real data is shown in Figure 5(b)). We further note that lower-indexed layers correspond to coarse image information whereas high-indexed layers correspond to detail information. Between adjacent frames, coarse information is much more correlated than detail information. Thus, $a_i$ is fully expected to decrease with $i$. Since $C_i$ is obviously a monotone function of $a_i$, this implies that
$$C_1 \ge C_2 \ge \cdots \ge C_L,$$
an observation which is also verified experimentally. This ensures that (7) will still hold if we replace the $D_i$'s with $D_i(1 + C_i)$.
We wish to encode the initial video sequence into $K$ descriptions, each of which will either provide a coarse reconstruction of the initial sequence by itself or improve a reconstruction based on one of the other descriptions. To this end, for every frame in the GOP we will assign a number of layers to each description in a way so as to maximize the distortion reduction incurred under a limited-rate constraint. We will consider the case of double-description coding ($K = 2$). The general case is studied in Appendix B.
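The concavity assumption can be checked numerically on measured layer data. The sketch below assumes that concavity means non-increasing slopes D_i / R_i across consecutive layers; the function names and the sample values in the usage note are illustrative only.

```python
def is_concave(D, R):
    """True if the per-layer slopes D[i] / R[i] are non-increasing in i."""
    slopes = [d / r for d, r in zip(D, R)]
    return all(s1 >= s2 for s1, s2 in zip(slopes, slopes[1:]))


def weighted(D, C):
    """Per-layer reductions including the temporal term: D[i] * (1 + C[i])."""
    return [d * (1.0 + c) for d, c in zip(D, C)]
```

For example, with D = [10, 6, 3], R = [2, 2, 2], and decreasing factors C = [0.9, 0.5, 0.2], both `is_concave(D, R)` and `is_concave(weighted(D, C), R)` hold, in line with the remark that decreasing C_i preserves concavity.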
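Let $I = \{1, \ldots, L\}$ denote the set of the possible values that the layer indices may assume. The problem of providing two descriptions for each frame in the GOP is equivalent to assigning a set of layer indices $I_1 \subset I$ to the first and a set $I_2 \subset I$ to the second description. Subsequently, the two descriptions will be transmitted over two communication links to the decoder. If $A_k$ represents the event that description $k$ reaches the decoder and $p$ denotes the probability that each stream is successfully delivered to the decoder (i.e., $\Pr\{A_k\} = p$, $k = 1, 2$), four events exist for each frame: $B_1$, only description 1 is delivered; $B_2$, only description 2 is delivered; $B_{12}$, both descriptions are delivered; and $B_0$, no descriptions are delivered. The probability of each of these events may be easily derived if we make the reasonable assumption that the events $A_1$ and $A_2$ are independent:
$$\Pr\{B_1\} = \Pr\{B_2\} = p(1-p), \quad \Pr\{B_{12}\} = p^2, \quad \Pr\{B_0\} = (1-p)^2.$$
Let $d(B_1)$, $d(B_2)$, $d(B_{12})$, $d(B_0)$ denote, respectively, the distortion reduction at the decoder for the current frame when each of the events $B_1$, $B_2$, $B_{12}$, and $B_0$ occurs. Their values may be calculated as
$$d(B_1) = \sum_{i \in I_1} D_i, \quad d(B_2) = \sum_{i \in I_2} D_i, \quad d(B_{12}) = \sum_{i \in I_1 \cup I_2} D_i, \quad d(B_0) = 0.$$
Moreover, when at least one of the descriptions arrives at the decoder, the layers common to all descriptions will be used for the motion compensation of the next frame in the GOP, incurring an additional distortion reduction of $C_i D_i$ for each layer. Let $B_{1|2} \triangleq B_0^c$ denote the event that at least one description reaches the decoder and $I_\cap \triangleq I_1 \cap I_2$ denote the set of indices common to both descriptions. Then, $\Pr\{B_{1|2}\} = p(2 - p)$ and the corresponding distortion reduction will be
$$d(B_{1|2}) = \sum_{i \in I_\cap} C_i D_i.$$
Consequently, the expected distortion reduction, $D_e(I_1, I_2)$, incurred at the decoder, when the index-assignment policy $(I_1, I_2)$ is used, will be
$$D_e(I_1, I_2) = p(1-p)\left[\sum_{i \in I_1} D_i + \sum_{i \in I_2} D_i\right] + p^2 \sum_{i \in I_1 \cup I_2} D_i + p(2-p) \sum_{i \in I_\cap} C_i D_i,$$
and after some simple manipulations we arrive at
$$D_e(I_1, I_2) = p\left[(2-p) \sum_{i \in I_\cap} D_i (1 + C_i) + \sum_{i \in I_\oplus} D_i\right], \tag{14}$$
where $I_\oplus \triangleq (I_1 \cup I_2) \setminus I_\cap$ is the set of indices contained in exactly one of the descriptions.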
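The manipulation leading from the event-wise expectation to the compact expression can be checked numerically. The sketch below evaluates both forms for two descriptions with independent delivery probability p; the dictionaries `D` and `C` (keyed by layer index) and the function names are hypothetical.

```python
def expected_reduction_by_events(I1, I2, D, C, p):
    """Expected distortion reduction obtained by summing over delivery events."""
    I1, I2 = set(I1), set(I2)
    common = I1 & I2
    d1 = sum(D[i] for i in I1)            # only description 1 arrives
    d2 = sum(D[i] for i in I2)            # only description 2 arrives
    d12 = sum(D[i] for i in I1 | I2)      # both descriptions arrive
    d_ref = sum(C[i] * D[i] for i in common)  # gain in later frames (at least one arrives)
    return p * (1 - p) * (d1 + d2) + p * p * d12 + p * (2 - p) * d_ref


def expected_reduction_closed_form(I1, I2, D, C, p):
    """Same quantity in terms of the common and the exclusive index sets."""
    I1, I2 = set(I1), set(I2)
    common = I1 & I2
    exclusive = (I1 | I2) - common
    return p * ((2 - p) * sum(D[i] * (1 + C[i]) for i in common)
                + sum(D[i] for i in exclusive))
```

Both functions agree for any assignment, which also makes explicit that the expectation depends on the two descriptions only through the common and the exclusive index sets.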
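The total rate, $R(I_1, I_2)$, used by the two streams is
$$R(I_1, I_2) = \sum_{i \in I_1} R_i + \sum_{i \in I_2} R_i$$
and may also be expressed, in terms of the set $I_\cap$ of indices common to both descriptions and the set $I_\oplus$ of indices contained in exactly one description, as
$$R(I_1, I_2) = 2 \sum_{i \in I_\cap} R_i + \sum_{i \in I_\oplus} R_i. \tag{16}$$
Assuming that the total rate used may not exceed a predefined rate budget $R_B$, our purpose is to identify the index-assignment sets $I_1$ and $I_2$, which do not violate the rate constraint and maximize the expected distortion reduction at the decoder:
$$\max_{I_1, I_2} D_e(I_1, I_2) \quad \text{subject to} \quad R(I_1, I_2) \le R_B. \tag{17}$$
It is clear from (14) and (16) that the expected distortion reduction and total rate depend upon the sets $I_\cap$ and $I_\oplus$. Furthermore, the factor $p$ in the expected distortion reduction (14) may be ignored for the optimization procedure for the sake of simplicity. Therefore, the maximization problem may be rephrased as follows.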

Maximization problem
Find disjoint sets $I_\cap, I_\oplus \subset I$ maximizing
$$(2-p) \sum_{i \in I_\cap} D_i (1 + C_i) + \sum_{i \in I_\oplus} D_i \tag{18}$$
subject to the constraint
$$2 \sum_{i \in I_\cap} R_i + \sum_{i \in I_\oplus} R_i \le R_B. \tag{19}$$
The solution of the above problem will yield the optimal sets $I_\cap$ and $I_\oplus$, where $I_\cap$ will contain the indices of the layers assigned to both streams and $I_\oplus$ will contain the indices assigned only to one of the streams. In order to obtain the optimal $I_1$, $I_2$, we need to further partition $I_\oplus$ into two disjoint index-assignment sets, one for each stream. It is clear from (14), however, that any such partition will yield sets $I_1$, $I_2$ inducing the same expected distortion reduction at the decoder; hence, the partition of $I_\oplus$ may be arbitrary (we may even assign the whole set $I_\oplus$ to only one of the streams). However, since balanced MD coding is sought, an acceptable partitioning should result in fairly equal total rates of $I_1$ and $I_2$. In order to achieve this, the indices in $I_\oplus$ may be ordered in terms of decreasing corresponding rates $R_i$ and be assigned alternately to each stream.
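The balancing rule at the end of the paragraph above (order the exclusive indices by decreasing rate and deal them out alternately) can be sketched as follows. The names `split_exclusive` and `build_descriptions`, and the dictionary `R` of per-layer rates, are hypothetical.

```python
def split_exclusive(exclusive, R):
    """Deal the exclusive layer indices alternately to two streams,
    visiting them in order of decreasing rate R[i]."""
    ordered = sorted(exclusive, key=lambda i: R[i], reverse=True)
    return set(ordered[0::2]), set(ordered[1::2])


def build_descriptions(common, exclusive, R):
    """Form the index sets of both descriptions: common layers go to both,
    exclusive layers are split between them."""
    first, second = split_exclusive(exclusive, R)
    return set(common) | first, set(common) | second
```

With rates R = {1: 8, 2: 5, 3: 4, 4: 3, 5: 1}, common set {1}, and exclusive set {2, 3, 4, 5}, the two descriptions become {1, 2, 4} and {1, 3, 5}, with total rates 16 and 13. The alternation is a heuristic, so the rates are only approximately balanced.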

COMPLEXITY ANALYSIS
If we were to solve the maximization problem (17) by exhaustively examining all possible realizations of $I_1$ and $I_2$, this would involve $2^{2L}$ possibilities, since there are $2^L$ subsets of the index set $I$. Clearly, the optimal solution will be achieved by choosing any pair of sets $I_1$ and $I_2$ resulting in the same sets $I_\cap^*$ and $I_\oplus^*$, which solve the maximization problem described by (18) and (19). Hence, we only need to examine all possible realizations of disjoint sets $I_\cap, I_\oplus \subset I$.
Note that since there are $2^L$ possible subsets of the index set $I$, any subset $A \subset I$ may be expressed as the binary representation of a number between $0$ and $2^L - 1$, with the $i$th bit being 1 if $i \in A$ and 0 otherwise. An exhaustive search algorithm which will determine the optimal solution $I_\cap^*$, $I_\oplus^*$ to the maximization problem is shown in Algorithm 1.
[Algorithm 1: Exhaustive search. The maximum distortion reduction is initialized to 0 and the optimal sets $I_\cap^*$, $I_\oplus^*$ are initialized as empty; the search then loops over all possible realizations of $I_\cap$ ($0, \ldots, 2^L - 1$) and, within each, over the realizations of $I_\oplus$.]
Although this algorithm will always produce an optimal solution, the number of possible realizations of $I_\cap$ and $I_\oplus$ over which the search will be performed is $3^L$, still prohibitive even for moderate values of $L$. The NP-completeness of the maximization problem described by (18) and (19) can also be shown by formulating it as an integer (0-1) programming problem, as shown in Appendix A.
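A brute-force search of this kind can be sketched in a few lines of Python. This is an illustrative reading of the exhaustive search, not a transcription of Algorithm 1: each layer is labelled as unused, exclusive, or common (hence the 3^L candidates), layers are indexed from 0, and the factor p is dropped from the objective as in the text. The function name and argument layout are assumptions.

```python
from itertools import product


def exhaustive_search(D, C, R, p, budget):
    """Try all 3**L labellings of the layers and keep the best feasible one.

    D, C, R are lists indexed 0..L-1. Returns (value, common, exclusive).
    """
    L = len(D)
    best = (0.0, frozenset(), frozenset())
    for labels in product((0, 1, 2), repeat=L):  # 0 unused, 1 exclusive, 2 common
        common = [i for i in range(L) if labels[i] == 2]
        exclusive = [i for i in range(L) if labels[i] == 1]
        # Common layers are sent twice, exclusive layers once.
        rate = 2 * sum(R[i] for i in common) + sum(R[i] for i in exclusive)
        if rate > budget:
            continue
        value = ((2 - p) * sum(D[i] * (1 + C[i]) for i in common)
                 + sum(D[i] for i in exclusive))
        if value > best[0]:
            best = (value, frozenset(common), frozenset(exclusive))
    return best
```

Even for L = 20 this loop visits about 3.5 billion labellings, which illustrates why the exhaustive approach is impractical beyond small examples.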
In view of these remarks, it would be desirable to establish some optimality results that will narrow the number of possible candidate solutions or devise techniques that would search through a smaller set of possible near-optimal solutions. To this end, the following will prove helpful.
Lemma 1. If $I_\cap$ and $I_\oplus$ are fixed and $j \in I_\cap$ or $j \in I_\oplus$, replacing layer $j$ with layers of higher indices, such that their total rate does not exceed $R_j$, would result in smaller expected distortion reduction.
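Proof. Assume that $j \in I_\cap$ (the proof for $j \in I_\oplus$ is similar) and that $j_1, \ldots, j_k > j$ are layers outside $I_\cap$ and $I_\oplus$ such that
$$\sum_{l=1}^{k} R_{j_l} \le R_j. \tag{20}$$
If $I_\cap$ is replaced by the set $I_\cap' \triangleq (I_\cap \setminus \{j\}) \cup \{j_1, \ldots, j_k\}$, then the rate constraint (19) would still be satisfied and the expected distortion reduction (18) would decrease by
$$(2-p)\left[D_j (1 + C_j) - \sum_{l=1}^{k} D_{j_l} (1 + C_{j_l})\right]. \tag{21}$$
Using the concavity assumption (7), applied to the products $D_i(1 + C_i)$, and (20), it is straightforward to show that the outcome of (21) is nonnegative; hence, this replacement would prove inefficient.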
The same also holds if we were to replace more than one lower-indexed layer with higher-indexed ones of smaller total rate. In other words, Lemma 1 suggests that, if possible (i.e., if the rate constraint is not violated), we should replace higher-indexed layers with lower-indexed ones with appropriate total rate. However, Lemma 1 might mislead us to assume that the optimal solution would consist of sets $I_\cap^*$ and $I_\oplus^*$ comprising the lower-indexed layers, that is,
$$I_\cap^* = \{1, \ldots, L_\cap^*\}, \quad I_\oplus^* = \{L_\cap^* + 1, \ldots, L_\oplus^*\}. \tag{22}$$
This would not be true in case the rate margin $R_M \triangleq R_B - 2\sum_{i \in I_\cap} R_i - \sum_{i \in I_\oplus} R_i$ can be filled by replacing one (or more) of the lower-indexed layers $j$ with one or more higher-indexed layers. It is possible that in this case the resulting expected distortion reduction is actually larger, as shown in the example below.
Counterexample 1. Let $R_B = 21.5$, $p = 0.8$, $C_i = 0$, $i = 1, \ldots, L$, and $R_i$, $D_i$ given by the following table: [table not recoverable from the source]. The solution of the form (22) results in a total rate of 20.5 and an expected distortion reduction of 2.61. There is, however, a rate margin $R_M = R_B - 20.5 = 1$ that may be taken advantage of, if $I_\cap$ or $I_\oplus$ is properly chosen. In fact, if the sets $I_\cap = \{2, 4\}$ and $I_\oplus = \{1, 4, 5\}$ are used, the total rate matches the rate budget $R_B$ and the expected distortion reduction increases slightly to 2.62. This counterexample verifies that the optimal solution will not always be of the form (22); however, extensive experimentation showed that in most cases the sets $I_\cap$ and $I_\oplus$ given by (22) provide a near-optimal solution, as was indeed the case in the previous example.
An improved exhaustive search algorithm, which stems from this remark, would consider only sets $I_\cap$, $I_\oplus$ of the form (22). The number of possible candidates may be further reduced based on the following lemmas.

Lemma 2. $L_\oplus^*$ cannot exceed any certain value beyond which the sum $\sum_{i=1}^{L_\oplus^*} R_i$ exceeds the rate budget $R_B$.
Proof. This lemma is a direct consequence of the total rate constraint (19) for $L_\cap^* = 0$.
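Lemma 3. $L_\oplus^*$ cannot be smaller than any value for which the sum $2\sum_{i=1}^{L_\oplus^*} R_i$ does not exceed the rate budget $R_B$.
[Algorithm 2: Improved exhaustive search. The maximum distortion reduction is initialized to 0 and the optimal values are initialized as $L_\cap^* = L_\oplus^* = 0$.]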
Lemma 4. For a given $L_\oplus^*$, the optimal value of $L_\cap^*$ is the largest integer $l \le L_\oplus^*$ for which the total rate for $I_\cap$ does not exceed the remaining available rate, that is, $\sum_{i=1}^{l} R_i \le R_B - \sum_{i=1}^{L_\oplus^*} R_i$.
Proof. It is straightforward to prove that the more layers $I_\cap$ comprises, the better the distortion reduction will be. Therefore, we should try to "fit" as many layers as possible in the remaining available rate.
Lemmas 2-4 may be used to narrow down the exhaustive search space.In particular, Lemmas 2 and 3 suggest that we should examine values of L * , in a set {L 1 , . . ., L 2 }, while Lemma 4 suggests that for each of these values of L * there is a unique optimal value of L * ∩ ; hence, it suffices to examine only L 2 − L 1 + 1 < L cases.In view of these results, we can describe the improved exhaustive search procedure in Algorithm 2.
The while loop in this algorithm searches for the maximum value of L ∩ fitting in the rate margin, since, as can be easily verified, the corresponding value of L ∩ for L + 1 will be smaller than that for L (the previous value of L ∩ ).Hence, the search is performed over L 2 − L 1 + 1 possible values of L * and L 1 possible values of L * ∩ and the complexity of the algorithm will be linear in L.
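The search suggested by Lemmas 2-4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the listing of Algorithm 2: the function names are hypothetical, the sets are assumed to have the prefix form $I_\cap = \{1, \ldots, L_\cap\}$, $I = \{L_\cap + 1, \ldots, L\}$, and the objective is assumed to be $p(2-p)\sum_{i \in I_\cap} D_i(1 + C_i) + p\sum_{i \in I} D_i$. The numbers are illustrative and are not those of Counterexample 1.

```python
from itertools import accumulate


def expected_reduction(D, C, p, l_cap, l_tot):
    """Assumed objective: layers 1..l_cap go to both descriptions,
    layers l_cap+1..l_tot go to exactly one description."""
    both = sum(D[i] * (1.0 + C[i]) for i in range(l_cap))
    single = sum(D[i] for i in range(l_cap, l_tot))
    return p * (2.0 - p) * both + p * single


def improved_search(R, D, C, p, budget):
    """Return (value, l_cap, l_tot) maximizing the assumed objective
    over prefix-form sets that satisfy the rate budget."""
    prefix = [0.0] + list(accumulate(R))  # prefix[l] = R_1 + ... + R_l
    best = (0.0, 0, 0)
    for l_tot in range(len(R) + 1):
        if prefix[l_tot] > budget:
            break  # Lemma 2: budget exceeded even with no duplicated layer
        margin = budget - prefix[l_tot]  # rate left for duplicating layers
        l_cap = 0
        # Lemma 4: duplicate as many leading layers as fit in the margin.
        while l_cap < l_tot and prefix[l_cap + 1] <= margin:
            l_cap += 1
        value = expected_reduction(D, C, p, l_cap, l_tot)
        if value > best[0]:
            best = (value, l_cap, l_tot)
    return best


# Illustrative layer rates and distortion reductions (hypothetical values).
R = [2.0, 3.0, 4.0, 5.0, 6.0]
D = [1.0, 0.8, 0.6, 0.4, 0.2]
C = [0.0] * 5
value, l_cap, l_tot = improved_search(R, D, C, p=0.8, budget=12.0)
print(l_cap, l_tot, round(value, 3))
```

For simplicity the inner loop restarts from zero for every candidate $L$, so this version is quadratic in the worst case; carrying the previous value of $L_\cap$ over between iterations, as described in the text, would recover the linear cost.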
In general, the improved exhaustive search algorithm will result in sets $I^*_\cap$ and $I^*$ which do not exactly meet the rate constraint. In this case, there will be a rate margin $R_M \triangleq R_B - 2\sum_{i \in I^*_\cap} R_i - \sum_{i \in I^*} R_i$, which can be "filled" with smaller segments outside $I^*_\cap$ or $I^*$. A further improvement would search for possible augmentations of $I^*_\cap$ or $I^*$, so that the total rate is closer to the rate budget $R_B$.
As already stated, this algorithm will, in general, yield suboptimal yet near-optimal solutions to the maximization problem. A further (and more important) disadvantage of this algorithm is that, when applied in the general case of $K > 2$ descriptions, its complexity will be even higher. If we are to construct a low-complexity algorithm for the general case, we may resort to heuristics emanating from a continuous-case consideration of the problem. This is explored in the next section.

EQUIVALENT CONTINUOUS PROBLEM
By examining closely the discrete maximization problem described by (18) and (19), we first note that the sums $\sum_{i \in I_\cap} D_i(1 + C_i)$, $\sum_{i \in I_\cap} R_i$ and $\sum_{i \in I} D_i$, $\sum_{i \in I} R_i$ are the distortion reduction and rate "measures" of $I_\cap$ and $I$, respectively. A further restriction arises from the requirement that $I_\cap$ and $I$ have to comprise intervals dictated by the available blocks and that partial blocks may not be used. If we relax this restriction, we may formulate a corresponding Continuous Maximization Problem, which is easier to solve.
Assume that the curve appearing in Figure 5 represents a continuous, differentiable, nondecreasing, and concave function $D(R)$ of the rate $R$. Then the derivative $D'(R)$ will be a well-defined, continuous, positive, and decreasing function of $R$, for every $R \in \mathbb{R}^+$. In a similar fashion, assume that the fraction of distortion reduction due to motion compensation is provided by a continuous decreasing function $c(R)$ and that the curve corresponding to the products $D_i C_i$ defines a function $C(R)$ with derivative $C'(R) = D'(R)c(R)$, which will have properties similar to those of $D'(R)$. For any rate interval $[r_1, r_2]$, let $\mu_R$, $\mu_D$, $\mu_C$ denote the corresponding rate, distortion-reduction, and motion-compensation measures of the interval.
In practice, the number of intervals of the form $[r_1, r_2]$ is always finite (with an upper bound equal to the number of bits in the compressed bitstream). Obviously, the measure of a union of a finite number of disjoint intervals of the form $[r_1, r_2]$ equals the sum of the measures of these intervals. Thus, a continuous version of the discrete maximization problem described by (18) and (19) may now correspondingly be formulated as follows.
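One natural reading of the three interval measures, consistent with the sums used in the discrete problem, is sketched below. The explicit formulas are an assumption derived from the stated properties of $D'(R)$ and $C'(R) = D'(R)c(R)$; they are not a reproduction of the original display.

```latex
\mu_R([r_1, r_2]) = r_2 - r_1, \qquad
\mu_D([r_1, r_2]) = \int_{r_1}^{r_2} D'(r)\,dr = D(r_2) - D(r_1), \qquad
\mu_C([r_1, r_2]) = \int_{r_1}^{r_2} C'(r)\,dr = \int_{r_1}^{r_2} D'(r)\,c(r)\,dr .
```

Under this reading each measure is additive over disjoint intervals, which is the property the text relies on.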

Continuous maximization problem
Find disjoint sets $S_\cap, S \subset \mathbb{R}^+$ maximizing the expected distortion reduction (24) subject to the rate constraint (25).
With the further reasonable assumption that $S_\cap$ and $S$ are unions of closed intervals, properties stronger than Lemma 1 may be established for the continuous problem, leading to optimal solutions.
Lemma 5. If $S$ is fixed, the optimal $S_\cap$ comprises the "smallest-rate region" of the remaining space $\mathbb{R}^+ \setminus S$, that is, it has the form (26) for some positive rate $R_\cap$.
Proof. We will outline the general concept behind (26). Assume that (26) does not hold. Then there exist $\delta > 0$ and rates $r_1 < r_2$ such that the interval $[r_1, r_1 + \delta]$ lies in the remaining space but outside $S_\cap$, while the interval $[r_2, r_2 + \delta]$ lies inside $S_\cap$. If $S'_\cap$ is obtained from $S_\cap$ by exchanging these two intervals (remove the second interval and add the first), then the rate constraint will still be met and the expected distortion reduction (24) will increase; in the corresponding derivation, step ($\alpha$) results from $r_2 - r_1 > 0$ and the fact that $D'(\cdot)$ and $c(\cdot)$ are decreasing, and step ($\beta$) involves a simple change of integration variable. It follows, therefore, that $S_\cap$ will not be optimal (since it is outperformed by $S'_\cap$) unless it is given by (26) for some $R_\cap$.
In a similar manner, it is possible to establish an equivalent property for $S$.
Lemma 6. If $S_\cap$ is fixed, the optimal $S$ comprises the "smallest-rate region" of the remaining space $\mathbb{R}^+ \setminus S_\cap$, that is, it has the form (27).
Furthermore, concavity of $D(\cdot)$ implies the following.
Lemma 7. In the jointly optimal solution, $S_\cap$ occupies smaller rates than $S$.
Proof. This is true because the contribution of $S_\cap$ in the expected distortion reduction (24) involves the factor $2 - p > 1$ and the function $C(R) \ge D(R)$, $R \in \mathbb{R}^+$. Hence, incorporating the smaller-rate interval $[r_1, r_1 + \delta]$ in $S_\cap$ and the higher-rate interval $[r_2, r_2 + \delta]$ in $S$ will yield smaller expected distortion, as is easily verified.
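Lemmas 5, 6, and 7 suggest that the jointly optimal sets $S^*_\cap$, $S^*$ will be intervals of the form (28), with $S^*_\cap$ spanning the rates from 0 to $R_\cap$ and $S^*$ spanning the rates from $R_\cap$ to $R$, for some $R \ge R_\cap \ge 0$.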
In terms of the original maximization problem, (28) would provide the optimal solution if the (0-1) constraint for x is relaxed, namely, if assignment of partial blocks is allowed.
In view of (28), the equivalent continuous problem may be restated as follows.

Continuous maximization problem
Find rates $R_\cap$ and $R$, with $R \ge R_\cap \ge 0$, maximizing the expected distortion reduction (29) subject to the rate constraint (30).
This is a simple Lagrangian maximization problem with optimal solution $R^*_\cap$, $R^*$ satisfying the constraint (30) at the boundary. The optimal $R^*_\cap$ should satisfy the stationarity condition (31), which after some simple manipulations translates to the condition (32), namely $\varphi(R^*_\cap) = 1 - p$. Observe that, since $D'(\cdot)$ and $c(\cdot)$ are decreasing, $\varphi(\cdot)$ will be continuous and increasing in the interval $[0, R_B/2]$ and the continuous maximization problem will not involve local maxima. Also, the smallest value of $\varphi(\cdot)$ will be $\varphi(0)$ and the largest value will be $\varphi(R_B/2)$. If $1 - p$ lies between these two values, the optimal value for $R^*_\cap$ will be $\varphi^{-1}(1 - p)$. Otherwise (32) does not have a solution and optimality is achieved either at 0 or at $R_B/2$. In general, we can write $R^*_\cap$ as in (33), while $R^* = R_B - R^*_\cap$.
Returning to the discrete maximization problem, it is reasonable to assume that a near-optimal solution will resemble that of the equivalent continuous maximization problem, especially for large values of $L$. This means that a near-optimal choice for the index assignment sets would be $I_\cap = \{1, \ldots, L^*_\cap\}$, $I = \{L^*_\cap + 1, \ldots, L^*\}$, where $L^*_\cap$ and $L^*$ would be chosen so that the discrete counterpart of condition (32) is met. This consideration suggests Algorithm 3 above. The advantage of this algorithm lies in that it involves fewer calculations and terminates sooner than the improved exhaustive search algorithm. It is clear, however, that the price paid for its reduced complexity, which is important in cases of real-time applications, is its inferior performance compared to the exhaustive search algorithms.
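The stationarity condition of the continuous problem can be sketched as follows. The explicit objective $E$ and the resulting expression for $\varphi$ are assumptions, obtained by carrying the two-description objective over to the interval form (28) with $D(0) = C(0) = 0$; they are offered as a plausible derivation and not as the original equations (29)-(33).

```latex
% Assumed objective and constraint for S_cap = [0, R_cap], S = [R_cap, R]:
E(R_\cap, R) = p\,\bigl[(2-p)\bigl(D(R_\cap) + C(R_\cap)\bigr) + D(R) - D(R_\cap)\bigr],
\qquad 2R_\cap + (R - R_\cap) = R + R_\cap \le R_B .

% With the constraint active, R = R_B - R_cap, and C'(R) = D'(R) c(R):
\frac{dE}{dR_\cap}
  = p\,\bigl[(2-p)\,D'(R_\cap)\bigl(1 + c(R_\cap)\bigr) - D'(R_\cap) - D'(R_B - R_\cap)\bigr] = 0
\;\Longleftrightarrow\;
\varphi(R_\cap) \triangleq \frac{D'(R_B - R_\cap)}{D'(R_\cap)} - (2-p)\,c(R_\cap) = 1 - p .

% Since phi is increasing on [0, R_B/2], the maximizer is
R_\cap^* =
\begin{cases}
0, & 1 - p < \varphi(0), \\
\varphi^{-1}(1-p), & \varphi(0) \le 1 - p \le \varphi(R_B/2), \\
R_B/2, & 1 - p > \varphi(R_B/2).
\end{cases}
```

With this form, $\varphi$ is increasing because $D'(R_B - R_\cap)$ increases and both $D'(R_\cap)$ and $c(R_\cap)$ decrease as $R_\cap$ grows, which agrees with the monotonicity stated in the text.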
Let us also note that the implementation of the fast search algorithm involves serial search through all values from 0 to the terminating, estimated optimal, value of $L^*_\cap$. A further improvement would involve a binary search modification of this algorithm, according to the actual values of $\varphi(L_\cap, L)$ at the boundaries of the binary-search interval.
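The binary-search idea can be illustrated on the continuous problem, where it reduces to bisection on the increasing function $\varphi$. The sketch below is hypothetical: the function names are ours, the rate-distortion model $D(R) = 1 - e^{-aR}$ is chosen only because it is concave and gives a closed-form check, and $c(R) = 0$ is assumed so that $\varphi(x) = D'(R_B - x)/D'(x)$.

```python
import math


def solve_r_cap(phi, p, budget, tol=1e-9):
    """Bisection for phi(x) = 1 - p on [0, budget / 2], with phi increasing.
    Falls back to the nearest endpoint when no solution exists."""
    lo, hi = 0.0, budget / 2.0
    target = 1.0 - p
    if phi(lo) >= target:
        return lo
    if phi(hi) <= target:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def make_phi(a, budget):
    """phi for the assumed model D(R) = 1 - exp(-a R) with c(R) = 0,
    so that D'(R) = a exp(-a R)."""
    def phi(x):
        return math.exp(-a * (budget - x)) / math.exp(-a * x)
    return phi


budget = 128.0
r_cap = solve_r_cap(make_phi(0.05, budget), p=0.9, budget=budget)
r_tot = budget - r_cap  # constraint met at the boundary
print(round(r_cap, 3), round(r_tot, 3))
```

For this model the condition can also be solved by hand, $R^*_\cap = (R_B + \ln(1-p)/a)/2$, which gives a convenient check on the bisection.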

EXPERIMENTAL RESULTS
The proposed multiple description video coding scheme was experimentally evaluated for the transmission of the Y component (15 frames/second) of the standard test sequence "Foreman" over two channels. Each frame was coded in two descriptions. Motion vector information was duplicated in both descriptions. The proposed redundancy allocation Algorithm 3 of the preceding section was applied for video transmission over two channels of total capacity 128 Kbps and for three different probabilities of description arrival, $p = 0.8, 0.9, 0.95$, or equivalently three probabilities of description loss equal to 20%, 10%, 5%. The number of frames in each GOP was chosen with respect to $p$ as suggested in [28]. The target rate $R_B$ for each frame was determined by allocating to intra-frames a rate equal to four times the rate allocated to inter-frames. The resulting descriptions, as shown in Table 1 for the first five frames of the sequence, are remarkably "balanced," that is, they have approximately equal size and yield almost equal reconstruction qualities.
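The per-frame target rate implied by the 4:1 intra/inter allocation can be sketched as below. Several details are assumptions made for illustration: each GOP is taken to contain exactly one intra-frame, 128 Kbps is read as 128 000 bits per second, and the GOP size of 15 is a hypothetical value (the text selects it according to $p$ following [28]).

```python
def frame_targets(total_bps, fps, gop_size, intra_weight=4.0):
    """Split the bits of one GOP between one intra-frame and
    (gop_size - 1) inter-frames, the intra-frame receiving
    intra_weight times the bits of an inter-frame."""
    bits_per_gop = total_bps * gop_size / fps
    inter_bits = bits_per_gop / (intra_weight + gop_size - 1)
    return intra_weight * inter_bits, inter_bits


# Hypothetical GOP size; rate and frame rate as in the experiments.
intra, inter = frame_targets(total_bps=128_000, fps=15, gop_size=15)
print(round(intra), round(inter))
```

Each returned value plays the role of the budget $R_B$ for the corresponding frame type.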
In the present work, we assume that descriptions that arrive at the decoder do not contain bit errors. We examine two types of transmission scenarios. In the first scenario, we assume that the channels retain their status during the entire transmission. In this case, the parameter $p$ serves as a means to control the redundancy and is not directly associated with the condition of the channel. In the second scenario, we assume that the channels go on and off during transmission. In the latter scenario, it is possible that both descriptions of a frame are lost. In such a case, the decoder uses the most recent reference frame that is available. For each frame, the peak signal-to-noise ratio (PSNR, in dB) is used as a measure of the reconstruction quality. Following the approach adopted in [29,30], the reported mean PSNR values are computed by averaging decoded MSE values and then converting the mean MSE to the corresponding PSNR value rather than averaging the PSNR values directly.
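The averaging convention described above can be sketched as follows. The helper names are hypothetical, and a peak value of 255 (8-bit luminance samples) is assumed; the sketch does not handle a frame with zero MSE.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean squared error (mse must be positive)."""
    return 10.0 * math.log10(peak * peak / mse)


def mean_psnr(mse_values, peak=255.0):
    """Average the per-frame MSE values first, then convert to PSNR."""
    mean_mse = sum(mse_values) / len(mse_values)
    return psnr_from_mse(mean_mse, peak)


# Hypothetical per-frame MSE values: one good frame and one badly damaged frame.
frames = [10.0, 1000.0]
print(round(mean_psnr(frames), 2))
print(round(sum(psnr_from_mse(m) for m in frames) / len(frames), 2))
```

Because the logarithm is concave, averaging the MSE first never gives a higher figure than averaging the per-frame PSNR values, so a few badly reconstructed frames weigh more heavily in the reported mean.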
In the first transmission scenario, the coding of the "Foreman" sequence into two descriptions is simulated under the respective assumption that the channels are available or unavailable during the entire transmission. As expected, the central distortion in the proposed scheme that allows drift accumulation, which we will term multiple description wavelet video coder (MDWVC), is superior in comparison to the proposed drift-free system, termed DF-MDWVC. This is because, when both descriptions are available, drift is eliminated anyway. On the other hand, the side distortion appears to be lower in the drift-free system. The performance of MDWVC is shown in Figure 6. The redundancy rate-distortion performance of our coders is shown in Figure 7. As seen, DF-MDWVC and MDWVC reach similar performances for redundancy greater than 15%. For lower redundancies, the drift-free system performs worse due to the very low quality of the reference frames.
In the second simulation, in which the channels may go on and off from frame to frame, we tested our systems under identical description loss patterns. For each frame, one, two, or none of the descriptions was lost. As seen from Figure 8 and Tables 2 and 3, the drift-free system is much more reliable and demonstrates no abrupt changes in its performance, contrary to MDWVC, which demonstrates significant variations in the video quality it delivers. In addition, both schemes demonstrate significant gains over the single description scheme, which appears to collapse very frequently due to description losses. In Figure 8(d), we report the performance of a scheme that is based on H.264 and uses flexible macroblock ordering (FMO) for transmission of video over two channels. This scheme uses P-frames and two FMO slices. As seen, despite the fact that the H.264-based scheme uses advanced error concealment techniques at the decoder, the reconstruction quality it delivers exhibits significant variations in comparison to the quality achieved by our drift-free scheme.
Reconstructed frames obtained by simulating the transmission of 180 frames of the "Foreman" sequence at 15 frames/second over two channels of total capacity 128 Kbps and probability of description arrival equal to 0.9 using the above systems are displayed in Figure 9. The reconstruction displayed in Figure 9(c), achieved using the drift-free system, is visually more pleasing than the reconstruction using MDWVC. This indicates that, in practical cases, the drift-free system can be a better choice even though MDWVC operates better at low error rates. The image reconstructed using the single description scheme exhibits the worst performance.
In Figure 10, we present the reconstruction quality obtained using the drift-free system for the case of transmission over four channels of total capacity 128 Kbps and probabilities of description loss equal to 20%.

CONCLUSIONS
We presented a wavelet-based framework for the encoding of video in multiple descriptions. The generation of multiple descriptions was performed so that drift is eliminated at the decoder side. The proposed framework is flexible and allows the encoding of video into an arbitrary number of descriptions. The resulting framework is endowed with the capability for drift-free reconstruction regardless of the number of descriptions that arrived at the decoder. Three algorithms were also presented for the optimal allocation of redundancy.

APPENDICES

A. INTEGER (0-1) PROGRAMMING FORMULATION
Then, the sets $I_\cap$ and $I$ are determined by the vectors $x_\cap \triangleq [x_{\cap 1}, \ldots, x_{\cap L}]^T$ and $x \triangleq [x_1, \ldots, x_L]^T$, respectively, where $A^T$ denotes the transpose of matrix $A$. If we adopt this notation, (18) may be written as (A.1) and constraint (19) as $2 r^T x_\cap + r^T x \le R_B$ (A.2), with $r \triangleq [R_1, \ldots, R_L]^T$. The property that $I_\cap$ and $I$ are disjoint may be written as $x_\cap + x \le 1_L$ (A.3), where $1_L$ is the $L \times 1$ unity vector and inequalities involving vectors are meant in the per-component sense.
In order to find the optimal solution, it suffices to find binary-valued vectors $x_\cap$ and $x$ maximizing (A.1) subject to the constraints (A.2) and (A.3). This is an integer (0-1) programming problem and can be formulated by defining the quantities $x$, $d$, $C$, and $b$, where $I_L$ is the $L \times L$ identity matrix, $x$ and $d$ are $2L \times 1$ vectors, $C$ is an $(L + 1) \times 2L$ matrix, and $b$ is an $(L + 1) \times 1$ vector. In view of these definitions, the maximization problem may be expressed as an integer-programming problem.

Integer (0-1) programming problem
Find the (0-1)-valued vector $x$ that attains $\max_x d^T x$ subject to $Cx \le b$.
Although several techniques exist for the solution of integer-programming problems, it is well known that integer-programming problems are, in general, NP-complete and, most of the time, exhaustive search over all possible realizations of the binary-valued vector $x$ is the only procedure that guarantees an optimal solution. Even if a cutting-plane or branch-and-bound technique is used, it does not guarantee that the number of operations will be less than exponential in $L$.
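For very small $L$, the exhaustive search mentioned above can be sketched directly. The code is illustrative only: the function name is hypothetical, each layer is given one of three states (unused, in one description, in both descriptions) so that the disjointness of $I_\cap$ and $I$ holds by construction, and the objective is assumed to be $p(2-p)\sum_{i \in I_\cap} D_i(1 + C_i) + p\sum_{i \in I} D_i$.

```python
from itertools import product


def exhaustive_search(R, D, C, p, budget):
    """Enumerate all 3**L layer assignments and keep the best feasible one.
    state[i] = 0: layer unused, 1: in one description, 2: in both."""
    best_value = 0.0
    best_state = tuple([0] * len(R))
    for state in product((0, 1, 2), repeat=len(R)):
        # A layer sent in both descriptions costs twice its rate.
        rate = sum(s * r for s, r in zip(state, R))
        if rate > budget:
            continue
        value = 0.0
        for s, d, c in zip(state, D, C):
            if s == 2:
                value += p * (2.0 - p) * d * (1.0 + c)
            elif s == 1:
                value += p * d
        if value > best_value:
            best_value, best_state = value, state
    return best_value, best_state


# Illustrative layer data (hypothetical values).
R = [2.0, 3.0, 4.0, 5.0, 6.0]
D = [1.0, 0.8, 0.6, 0.4, 0.2]
C = [0.0] * 5
value, state = exhaustive_search(R, D, C, p=0.8, budget=12.0)
print(state, round(value, 3))
```

The number of assignments grows as $3^L$, which illustrates why the exhaustive approach is practical only for a handful of layers.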

B. THE GENERAL MULTIPLE DESCRIPTION PROBLEM
In the general case, the original frame comprises $L$ layers and we need to form $K \ge 2$ descriptions so that a rate constraint is met and the expected distortion reduction at the decoder is maximized. Conforming to the notation used for the double-description case, we define the index sets $I_k$, $k = 1, \ldots, K$, where each $I_k$ describes the assignment of layers to description $k$, and the events $A_k = \{\text{Description } k \text{ reaches the decoder}\}$, $k = 1, \ldots, K$.
The index-assignment sets $I_k$, $k = 1, \ldots, K$, define $2^K$ disjoint subsets of the index set $I = \{1, \ldots, L\}$, which can be written as

$$J_{\mathbf{x}} = \bigcap_{k=1}^{K} I_k^{(x_k)}, \qquad \text{(B.1)}$$

where the subscript $\mathbf{x} = [x_1, \ldots, x_K]^T$ is a $(K \times 1)$ binary-valued vector and

$$I_k^{(x_k)} = \begin{cases} I_k, & x_k = 1, \\ I \setminus I_k, & x_k = 0. \end{cases} \qquad \text{(B.2)}$$

For every $\mathbf{x} \in \{0, 1\}^K$, the set $J_{\mathbf{x}}$ comprises the indices belonging to the sets $I_j$ with $x_j = 1$. The original index-assignment sets $I_k$, $k = 1, \ldots, K$, can then be expressed in terms of the collection $\{J_{\mathbf{x}}\}_{\mathbf{x} \in \{0,1\}^K}$ as

$$I_k = \bigcup_{\mathbf{x} : x_k = 1} J_{\mathbf{x}}. \qquad \text{(B.3)}$$

Let $w(\mathbf{x}) \triangleq \sum_{k=1}^{K} x_k$ denote the weight of the binary-valued vector $\mathbf{x}$ and, for every index set $A \subset I$, let

$$R(A) \triangleq \sum_{i \in A} R_i, \qquad D(A) \triangleq \sum_{i \in A} D_i \qquad \text{(B.4)}$$

represent the total rate and distortion reduction of the layers with indices in $A$. The total rate sent to the decoder can be expressed as

$$\sum_{k=1}^{K} R(I_k) \overset{(\alpha)}{=} \sum_{k=1}^{K} \sum_{\mathbf{x} : x_k = 1} R(J_{\mathbf{x}}) \overset{(\beta)}{=} \sum_{\mathbf{x} \in \{0,1\}^K} w(\mathbf{x})\, R(J_{\mathbf{x}}), \qquad \text{(B.5)}$$

where ($\alpha$) comes from (B.3) and the fact that the sets $J_{\mathbf{x}}$ are mutually disjoint, and we can derive ($\beta$) by observing that each sum $\sum_{i \in J_{\mathbf{x}}} R_i$ appears exactly $w(\mathbf{x})$ times in the previous expression of the total rate.

For a given $\mathbf{x} \in \{0, 1\}^K$, assume that $x_{j_1} = \cdots = x_{j_{w(\mathbf{x})}} = 1$ and the rest are zero. In order to express the expected distortion reduction at the decoder in terms of the collection $\{J_{\mathbf{x}}\}_{\mathbf{x} \in \{0,1\}^K}$, we observe that the distortion at the decoder will improve by $D(J_{\mathbf{x}})$ (layers with indices in $J_{\mathbf{x}}$ will be used) whenever the event $A_{\mathbf{x}} \triangleq \{\text{description } j_1 \text{ or description } j_2 \text{ or } \cdots \text{ or description } j_{w(\mathbf{x})} \text{ is delivered}\}$ occurs, that is,

$$A_{\mathbf{x}} = \bigcup_{m=1}^{w(\mathbf{x})} A_{j_m}. \qquad \text{(B.6)}$$

Assuming that the events $A_k$, $k = 1, \ldots, K$, are independent and $\Pr\{A_k\} = 1 - \Pr\{A_k^c\} = p$, we can calculate its probability

$$\Pr\{A_{\mathbf{x}}\} = 1 - \Pr\{A_{\mathbf{x}}^c\} = 1 - (1 - p)^{w(\mathbf{x})}. \qquad \text{(B.7)}$$

If we also define $C(A) \triangleq \sum_{i \in A} D_i C_i$ for $A \subset I$, the distortion reduction due to motion compensation based on the layers common to all descriptions will be $C(J_{\mathbf{1}_K})$, $\mathbf{1}_K$ being the $(K \times 1)$ unity vector. The distortion reduction due to motion compensation is conditional on the event $A_{\mathbf{1}_K}$ (at least one of the descriptions reaches the decoder), whose probability is $1 - (1 - p)^K$. Therefore, the overall expected distortion reduction at the decoder will be

$$\bigl[1 - (1 - p)^K\bigr]\, C(J_{\mathbf{1}_K}) + \sum_{\mathbf{x} \in \{0,1\}^K} \bigl[1 - (1 - p)^{w(\mathbf{x})}\bigr]\, D(J_{\mathbf{x}}). \qquad \text{(B.8)}$$

At this juncture, observe that both the total rate (B.5) and the expected distortion reduction (B.8) can be expressed as linear functions of the $\{R(J_{\mathbf{x}})\}_{\mathbf{x} \in \{0,1\}^K}$ and $\{D(J_{\mathbf{x}})\}_{\mathbf{x} \in \{0,1\}^K}$, respectively, with coefficients depending only on the weight of the index vector $\mathbf{x}$. Therefore, we can group all sets $J_{\mathbf{x}}$ with the same weight and define the new (fewer) sets

$$J^k = \bigcup_{\mathbf{x} \in \{0,1\}^K :\, w(\mathbf{x}) = k} J_{\mathbf{x}}, \qquad k = 0, \ldots, K, \qquad \text{(B.9)}$$

each set $J^k$ containing the layer indices assigned to exactly $k$ descriptions. Also, observe that the set $J^0 = J_{\mathbf{0}_K}$ has a zero coefficient in both (B.5) and (B.8); hence, it does not contribute to the total rate or expected distortion reduction. By reformulating (B.5) and (B.8), the maximization problem for the general multiple description case may be stated as follows.

General maximization problem
Find disjoint sets $J^1, \ldots, J^K \subset I$ maximizing

$$D(J^1, \ldots, J^K) = \bigl[1 - (1 - p)^K\bigr]\, C(J^K) + \sum_{k=1}^{K} \bigl[1 - (1 - p)^k\bigr]\, D(J^k) \qquad \text{(B.10)}$$

subject to the constraint

$$R(J^1, \ldots, J^K) = \sum_{k=1}^{K} k\, R(J^k) \le R_B. \qquad \text{(B.11)}$$

The integer-programming formulation of the general maximization problem would involve $K$ binary-valued $L \times 1$ vectors $x_k$, $k = 1, \ldots, K$, with $x_k$ indicating the layers contained in $J^k$, and the requirement that the $J^k$, $k = 1, \ldots, K$, be disjoint can be written as $\sum_{k=1}^{K} x_k \le 1_L$. Let us define the quantities $x$, $d_K$, $C_K$, and $b_K$, where $x$ and $d_K$ are $KL \times 1$ vectors, $C_K$ is an $(L + 1) \times KL$ matrix, $b_K$ is an $(L + 1) \times 1$ vector, and the $L \times 1$ vectors $r$, $d$, $c$ are those defined in the double-description integer-programming formulation. Then, the integer-programming formulation of the general multiple description problem will be as follows.

General integer (0-1) programming problem
Find the (0-1)-valued vector $x$ that attains $\max_x d_K^T x$ subject to $C_K x \le b_K$.
As is clear from the integer-programming formulation, the complexity of the general maximization problem may be as high as $2^{KL}$. Heuristics similar to those proposed for the double-description case may be used for an estimate of the optimal index-assignment scheme, based on the general equivalent continuous problem, which can be easily formulated from (B.10) and (B.11). It is reasonable to conjecture that the heuristics stemming from the equivalent continuous general maximization problem will provide solutions deviating from the optimal one even more as $K$ increases.

Figure 2: Block diagram of the decoder.

Figure 3: (a) Assignment of the blocks of a wavelet representation for the case of two descriptions. The bitstreams corresponding to the blocks may be included in one or more descriptions. (b) Representation of redundant and nonredundant part of the stream for the case of two descriptions.

Figure 5: (a) Comprising layers and induced distortion reduction, (b) distortion reduction as a function of rate for a frame of "Akiyo" using the source coder of Section 3.

Figure 8: Reconstruction quality for the "Foreman" sequence when the channels go on and off during transmission and a probability of error equal to (a) 5%, (b) 10%, (c) 20%, and (d) transmission based on H.264 using flexible macroblock ordering.

Figure 9: Reconstructed frame for the transmission of the "Foreman" sequence, $p = 0.9$, over two channels of total capacity 128 Kbps: (a) original "Foreman" frame, (b) reconstructed using the coder without drift control (25.84 dB), (c) reconstructed using the drift-free coder (28.81 dB), and (d) reconstructed using the single description coder (25.78 dB).

Figure 10: Reconstruction quality obtained using the drift-free system with four descriptions transmitted over channels with probability of loss equal to 20%.

EURASIP Journal on Applied Signal Processing

Michael G. Strintzis received the Diploma in electrical engineering from the National Technical University of Athens, Athens, Greece, in 1967 and the M.A. and Ph.D. degrees in electrical engineering from Princeton University, Princeton, NJ, in 1969 and 1970, respectively. He joined the Electrical Engineering Department, University of Pittsburgh, Pittsburgh, PA, where he served as an Assistant Professor from 1970 to 1976 and an Associate Professor from 1976 to 1980. During that time, he worked in the area of stability of multidimensional systems. Since 1980, he has been a Professor of electrical and computer engineering at the Aristotle University of Thessaloniki, Thessaloniki, Greece. He has worked in the areas of multidimensional imaging and video coding. Over the past ten years, he has authored over 100 journal publications and over 200 conference presentations. In 1998, he founded the Informatics and Telematics Institute, currently part of the Centre for Research and Technology Hellas, Thessaloniki. He was awarded the Centennial Medal of the IEEE in 1984 and the Empirikeion Award for Research Excellence in Engineering in 1999.
L − 1 (all possible realizations of $I$)
if $I_\cap$ AND $I$ = 0 (check if sets are disjoint)
into two fairly equal-rate subsets $I^{**}$

Table 1: Description size (bytes) and ratio of the two descriptions, for several frames of the sequence "Foreman" ($p = 0.9$, $R_{\text{total}} = 128$ Kbps).

Table 3: Performance comparison. Standard deviation of reconstruction quality is reported.