Optimal Erasure Protection Assignment for Scalable Compressed Data with Small Channel Packets and Short Channel Codewords

We are concerned with the efficient transmission of scalable compressed data over lossy communication channels. Recent works have proposed several strategies for assigning optimal code redundancies to elements in a scalable data stream under the assumption that all elements are encoded onto a common group of network packets. When the size of the data to be encoded becomes large in comparison to the size of the network packets, such schemes require very long channel codes with high computational complexity. In networks with high loss, small packets are generally more desirable than long packets. This paper proposes a robust strategy for optimally assigning elements of the scalable data to clusters of packets, subject to constraints on packet size and code complexity. Given a packet cluster arrangement, the scheme then assigns optimal code redundancies to the source elements subject to a constraint on transmission length. Experimental results show that the proposed strategy can outperform previously proposed code redundancy assignment policies subject to the above-mentioned constraints, particularly at high channel loss rates.


INTRODUCTION
In this paper, we are concerned with reliable transmission of scalable data over lossy communication channels. For the last decade, scalable compression techniques have been widely explored. These include image compression schemes, such as the embedded zerotree wavelet (EZW) [1] and set partitioning in hierarchical trees (SPIHT) [2] algorithms and, most recently, the JPEG2000 [3] image compression standard. Scalable video compression has also been an active area of research, which has recently led to MPEG-4 fine granularity scalability (FGS) [4]. An important property of a scalable data stream is that a portion of the data stream can be discarded or corrupted by a lossy communication channel without compromising the usefulness of the more important portions. A scalable data stream is generally made up of several elements with various dependencies, such that the loss of a single element might render some or all of the subsequent elements useless, but not the preceding elements.
For the present work, we focus our attention on "erasure" channels. An erasure channel is one whose data, prior to transmission, is partitioned into a sequence of symbols, each of which either arrives at the destination without error or is entirely lost. The erasure channel is a good model for modern packet networks, such as Internet protocol (IP) networks and their adaptations into the wireless realm, such as general packet radio service (GPRS). The important elements are the network's packets, each of which either arrives at the destination or is lost due to congestion or corruption. Whenever there is at least one bit error in an arriving packet, the packet is considered lost and so discarded. A key property of the erasure channel is that the receiver knows which packets have been lost.
In the context of erasure channels, Albanese et al. [5] pioneered an unequal error protection scheme known as priority encoding transmission (PET). The PET scheme works with a family of channel codes, all of which have the same codeword length N, but different source lengths k, 1 ≤ k ≤ N. We consider only "perfect codes," which have the key property that the receipt of any k out of the N symbols in a codeword is sufficient to recover the k source symbols. The amount of redundancy R_{N,k} = N/k determines the strength of the code, where smaller values of k correspond to stronger codes. We define a scalable data source to consist of groups of symbols, each of which is referred to as a "source element" Ᏹ_q having L_q symbols. Although in our experiments each symbol corresponds to one byte, the source symbol is not restricted to a particular unit. Given a scalable data source consisting of source elements Ᏹ_1, Ᏹ_2, ..., Ᏹ_Q having uncoded lengths L_1, L_2, ..., L_Q and channel code redundancies R_{N,k_1} ≥ R_{N,k_2} ≥ ⋯ ≥ R_{N,k_Q}, the PET scheme packages the encoded elements into N network packets, where source symbols from each element Ᏹ_q occupy k_q packets. This arrangement guarantees that the receipt of any k packets is sufficient to recover all elements Ᏹ_q with k_q ≤ k. The total encoded transmission length is Σ_q L_q R_{N,k_q}, which must be arranged into N packets, each having a packet size of S bytes. Figure 1 shows an example of arranging Q = 4 elements into N = 5 packets. Consider element Ᏹ_2, which is assigned a (5, 3) code. Since k_2 = 3, three out of the five packets contain the source element's L_2 symbols. The remaining N − k_2 = 2 packets contain parity information. Hence, receiving any three packets guarantees recovery of element Ᏹ_2 and also Ᏹ_1.
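The PET bookkeeping above can be sketched in a few lines. The element lengths and (N, k_q) codes below are illustrative assumptions (they only loosely mirror the Q = 4, N = 5 example of Figure 1), chosen so that the encoded elements exactly fill N packets:

```python
# Illustrative PET arrangement: Q = 4 elements, N = 5 packets per codeword.
N = 5                      # codeword length (packets)
L = [2, 3, 4, 5]           # uncoded symbols L_q per element (assumed values)
k = [2, 3, 4, 5]           # (N, k_q) code per element; k_1 <= ... <= k_Q

# Total encoded transmission length: sum over q of L_q * R_{N,k_q}, R = N/k.
total_length = sum(Lq * N / kq for Lq, kq in zip(L, k))

def recoverable(elements_k, received):
    """Indices (1-based) of elements guaranteed recoverable once any
    `received` of the N packets arrive: exactly those with k_q <= received."""
    return [q for q, kq in enumerate(elements_k, start=1) if kq <= received]

print(total_length)        # 20.0 symbols, i.e., N packets of S = 4 symbols
print(recoverable(k, 3))   # [1, 2]: any three packets recover E_1 and E_2
```

Note how the "any k packets recover all elements with k_q ≤ k" guarantee falls out of the nested code strengths.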
Given the PET scheme and a scalable data source, several strategies have been proposed to find the optimal channel code allocation for each source element under the condition that the total encoded transmission length is no greater than a specified maximum transmission length L_max = NS [6, 7, 8, 9, 10, 11, 12]. The optimization objective is an expected utility U, which must be an additive function of the source elements that are correctly received. That is,

U = U_0 + Σ_{q=1}^{Q} U_q P_{N,k_q},    (1)

where U_0 is the amount of utility at the receiver when no source element is received and P_{N,k_q} is the probability of recovering element Ᏹ_q, which is assigned an (N, k_q) code. This probability equals the probability of receiving at least k_q out of N packets for k_q > 0. If a source element is not transmitted, we assign the otherwise meaningless value of k_q = 0, for which R_{N,k_q} = 0 and P_{N,k_q} = 0. As an example, for a scalable compressed image, −U might represent the mean square error (MSE) of the reconstructed image, while U_q is the amount of reduction in MSE when element Ᏹ_q is recovered correctly. In the event of losing all source elements, the reconstructed image is "blank," so −U_0 corresponds to the largest MSE and is equal to the variance of the original image. The term U_0 is included only for completeness; it plays no role in the intuitive or computational aspects of the optimization problem.
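The expected utility of (1) can be sketched directly. The recovery probability used here anticipates the i.i.d. erasure model of Section 3 (at least k of N packets arriving); all numeric values in the usage line are illustrative assumptions:

```python
from math import comb

def prob_recover(N, k, p):
    """P_{N,k}: probability that at least k of N packets arrive when each
    packet is lost independently with probability p.  k = 0 means the
    element is not transmitted, for which P_{N,0} = 0 by convention."""
    if k == 0:
        return 0.0
    return sum(comb(N, n) * (1 - p) ** n * p ** (N - n)
               for n in range(k, N + 1))

def expected_utility(U0, U, k, N, p):
    """U = U_0 + sum_q U_q * P_{N,k_q}, as in (1)."""
    return U0 + sum(Uq * prob_recover(N, kq, p) for Uq, kq in zip(U, k))

# With a lossless channel (p = 0) every transmitted element is recovered,
# while the untransmitted third element (k = 0) contributes nothing:
print(expected_utility(0.0, [4.0, 2.0, 1.0], [1, 2, 0], N=5, p=0.0))  # 6.0
```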
Unfortunately, these optimization strategies rely upon the PET encoding scheme. This requires all of the encoded source elements to be distributed across the same N packets. Given a small packet size and a large amount of data, the encoder must use a family of perfect codes with large values of N. For instance, transmitting a 1 MB source using ATM cells with a packet size of 48 bytes requires N = 21,000. This imposes a huge computational burden on both the encoder and the decoder.
In this paper, we propose a strategy for optimally assigning code redundancies to source elements under two constraints. One constraint is transmission length, which limits the amount of encoded data being transmitted through the channel. The second constraint is the length of the channel codewords. The impact of this constraint depends on the channel packet size and the amount of data to be transmitted. In Sections 2 and 3, we explore the nature of scalable data and the erasure channel model. We coin the term "cluster of packets" (COP) to refer to a collection of network packets whose elements are jointly protected according to the PET arrangement illustrated in Figure 1. Section 4 reviews the code redundancy assignment strategy under the condition that all elements are arranged into a single COP; accordingly, we identify this as the "UniCOP assignment" strategy.
In Section 5, we outline the proposed strategy for assigning source elements to several COPs, each of which is made up of at most N channel packets, where N is the length of the channel codewords. Whereas packets are encoded jointly within any given COP, separate COPs are encoded independently. The need for multiple COPs arises when the maximum transmission length is larger than the specified COP size, NS. We use the term "MultiCOP assignment" when referring to this strategy. Given an arrangement of source elements into COPs together with a maximum transmission length, we find the optimal code redundancy R_{N,k} for each source element so as to maximize the expected utility U. Section 6 provides experimental results in the context of JPEG2000 data streams.

SCALABLE DATA
Scalable data is composed of nested elements. The compression of these elements generally imposes dependencies among the elements. This means that certain elements cannot be correctly decoded without first successfully decoding certain earlier elements. Figure 2 provides an example of the dependency structure in a scalable source. Each "column" of elements Ᏹ_{1,y}, Ᏹ_{2,y}, ..., Ᏹ_{X,y} has a simple chain of dependencies, which is expressed as Ᏹ_{1,y} ≺ Ᏹ_{2,y} ≺ ⋯ ≺ Ᏹ_{X,y}. This means that the element Ᏹ_{1,y} must be recovered before the information in element Ᏹ_{2,y} can be used, and so forth. Since each column depends on element Ᏹ_0, this element must be recovered prior to any attempt to recover the first element of a column. There is, however, no dependency between the columns; that is, Ᏹ_{x,y} ⊀ Ᏹ_{x̄,ȳ} and Ᏹ_{x̄,ȳ} ⊀ Ᏹ_{x,y} for y ≠ ȳ. Hence, the elements from one column can be recovered without having to recover any elements belonging to other columns. An image compressed with JPEG2000 serves as a good example, since it can have a combination of dependent and independent elements. Dependencies exist between successive "quality layers" within the JPEG2000 data stream, where an element which contributes to a higher quality layer cannot be decoded without first decoding elements from lower quality layers. JPEG2000 also contains elements which exhibit no such dependencies. In particular, subbands from different levels in the discrete wavelet transform (DWT) are coded and represented independently within the data stream. Similarly, separate colour channels within a colour image are also coded and represented independently within the data stream.
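The effect of a chain of dependencies within one column can be illustrated with a small sketch (the recovery pattern below is hypothetical): an element is usable only if every element it depends on was also recovered, so only the leading run of recovered elements contributes.

```python
def usable_count(recovered):
    """Number of usable elements under a simple chain E_1 < E_2 < ...:
    the first lost element renders all subsequent elements useless."""
    count = 0
    for ok in recovered:
        if not ok:
            break
        count += 1
    return count

# E_3 is lost, so the recovered E_4 is useless:
print(usable_count([True, True, False, True]))  # 2
```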
Elements of the JPEG2000 compressed data stream form a tree structure, as depicted in Figure 2. The data stream header becomes the "root" element.The "branches" correspond to independently coded precincts, each of which is decomposed into a set of elements with linear dependencies.

CHANNEL MODEL
The channel model we use is that of an erasure channel, having two important properties. One property is that packets are either received without any error or discarded due to corruption or congestion. Secondly, the receiver knows exactly which packets have been lost. We assume that the channel packet loss process is i.i.d., meaning that every packet has the same loss probability p and the loss of one packet does not influence the likelihood of losing other packets. To compare the effect of different packet sizes, it is useful to express the probability p in terms of a bit error probability or bit error rate (BER) ε. To this end, we will assume that packet loss arises from random bit errors in an underlying binary symmetric channel. The probability of losing any packet with size S bytes is then p = 1 − (1 − ε)^{8S}. The probability of receiving at least k out of N packets with no error is then

P_{N,k} = Σ_{n=k}^{N} (N choose n) (1 − p)^n p^{N−n}.

Figure 3 shows an example of the relationship between P_{N,k} and R_{N,k} for the case p = 0.3. Evidently, P_{N,k} is monotonically increasing with R_{N,k}. Significantly, however, the curve is not convex. It is convenient to parametrize P_{N,k} and R_{N,k} by a single parameter r, assuming N implicitly for simpler notation, so that P(r) = P_{N,N+1−r} and R(r) = R_{N,N+1−r} for r = 1, ..., N. It is also convenient to define P(0) = R(0) = 0. The parameter r is more intuitive than k since r increases in the same direction as P(r) and R(r). The special case r = 0 means that the relevant element is not transmitted at all.
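The channel model's two formulas can be sketched as follows; the BER value and packet sizes in the usage lines are illustrative assumptions, chosen to show why small packets are preferred at high BER:

```python
from math import comb

def packet_loss_prob(ber, S):
    """p = 1 - (1 - eps)^(8S): an S-byte packet is lost if any of its
    8S bits is corrupted on the underlying binary symmetric channel."""
    return 1.0 - (1.0 - ber) ** (8 * S)

def P_Nk(N, k, p):
    """Probability of receiving at least k of N packets, i.i.d. losses."""
    return sum(comb(N, n) * (1 - p) ** n * p ** (N - n)
               for n in range(k, N + 1))

# At the same BER, larger packets are lost far more often:
eps = 1e-4
print(packet_loss_prob(eps, 48))   # ~0.038 for a 48-byte ATM-style cell
print(packet_loss_prob(eps, 512))  # ~0.336 for a 512-byte packet
print(P_Nk(5, 3, 0.3))             # ~0.837, cf. Figure 3's p = 0.3 setting
```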

UNICOP ASSIGNMENT
We review the problem of assigning an optimal set of channel codes to the elements of a scalable data source, subject to the assumption that all source elements will be packed into the same set of N channel packets, where N is the codeword length. The number of packets N and packet size S are fixed. This is the problem addressed in [6, 7, 8, 9, 10, 11, 12], which we identified earlier as the UniCOP assignment problem. Puri and Ramchandran [6] provided an optimization technique based on the method of Lagrange multipliers to find the channel code allocation. Mohr et al. [7] proposed a local search algorithm and later a faster algorithm [8] which is essentially a Lagrangian optimization. Stankovic et al. [11] also presented a local search approach based on a fast iterative algorithm, which is faster than [8]. All of these schemes assume that the source has a convex utility-length characteristic. Stockhammer and Buchner [9] presented a dynamic programming approach which finds an optimal solution for convex utility-length characteristics; for general utility-length characteristics, however, the scheme is only close to optimal. Dumitrescu et al. [10] proposed an approach based on a global search, which finds a globally optimal solution for both convex and nonconvex utility-length characteristics with similar computational complexity. However, for convex sources, the complexity is lower since it need not take into account the constraint from the PET framework that the amount of channel code redundancy must be nonincreasing.
The UniCOP assignment strategy we discuss below is based on a Lagrangian optimization similar to [6]. However, this scheme not only works for sources with a convex utility-length characteristic but also applies to general utility-length characteristics. Unlike [10], the complexity in both cases is about the same, and the proposed scheme does not need to explicitly include the PET constraint, since the solution will always satisfy that constraint. Most significantly, the UniCOP assignment strategy presented here serves as a stepping stone to the "MultiCOP assignment" in Section 5, where the behaviour with nonconvex sources will become important. Suppose that the data source contains Q elements and each source element Ᏹ_q has a fixed number of source symbols L_q. We assume that the data source has a simple chain of dependencies, Ᏹ_1 ≺ Ᏹ_2 ≺ ⋯ ≺ Ᏹ_Q. This dependency will in fact impose a constraint that the code redundancy of the source elements must be nonincreasing, so that the recovery of the element Ᏹ_q guarantees the recovery of the elements Ᏹ_1 to Ᏹ_{q−1}. Generally, the utility-length characteristic of the data source can be either convex or nonconvex. To impart intuition, we begin by considering the former case, in which the source utility-length characteristic is convex, as illustrated in Figure 4. That is,

U_1/L_1 ≥ U_2/L_2 ≥ ⋯ ≥ U_Q/L_Q.

We will later need to consider nonconvex utility-length characteristics when extending the protection assignment algorithm to multiple COPs, even if the original source's utility-length characteristic was convex. Nevertheless, we defer the generalization to nonconvex sources until Section 4.2 so as to provide a more accessible introduction to the ideas.

Convex sources
To develop the algorithm for optimizing the overall utility U, we temporarily ignore the constraint r_1 ≥ ⋯ ≥ r_Q, which arises from the dependence between source elements. We will show later that the solution we obtain always satisfies this constraint by virtue of the source convexity. Our optimization problem is to maximize the utility function given in (1), subject to the overall transmission length constraint

Σ_{q=1}^{Q} L_q R(r_q) ≤ L_max.    (7)

Figure 4: Example of convex utility-length characteristic for a scalable source consisting of four elements with a simple chain of dependencies.

This constrained optimization problem may be converted to a family of unconstrained optimization problems parametrized by a quantity λ > 0. Specifically, let U^{(λ)} and L^{(λ)} denote the expected utility and transmission length associated with the set {r_q^{(λ)}}_{1≤q≤Q} which maximizes the functional

J^{(λ)} = Σ_{q=1}^{Q} [U_q P(r_q) − λ L_q R(r_q)].    (8)

We omit the term U_0 since it only introduces an offset to the optimization expression and hence does not impact its solution. Evidently, it is impossible to increase U beyond U^{(λ)} without also increasing L beyond L^{(λ)}. Thus, if we can find λ such that L^{(λ)} = L_max, the set {r_q^{(λ)}} will form an optimal solution to our constrained problem. In practice, the discrete nature of the problem may prevent us from finding a value of λ such that L^{(λ)} is exactly equal to L_max, but if the source elements are small enough, we are justified in ignoring this small source of suboptimality and selecting the smallest value of λ such that L^{(λ)} ≤ L_max. The unconstrained optimization problem decomposes into a collection of Q separate maximization problems. In particular, we seek the value r_q^{(λ)} which maximizes

U_q P(r) − λ L_q R(r)

Figure 5: Elements of the convex hull set are the vertices {j_0, j_1, ..., j_5} which lie on the convex hull of the P(r) versus R(r) characteristic.
for each q = 1, 2, ..., Q. Equivalently, r_q^{(λ)} is the value of r that maximizes the expression P(r) − λ_q R(r), where λ_q = λL_q/U_q. This optimization problem arises in other contexts, such as the optimal truncation of embedded compressed bitstreams [13, Section 8.2]. It is known that the solution r_q^{(λ)} must be a member of the set Ᏼ_C which describes the vertices of the convex hull of the P(r) versus R(r) characteristic [13, Section 8.2], as illustrated in Figure 5. Then, if 0 = j_0 < j_1 < ⋯ < j_I = N is an enumeration of the elements in Ᏼ_C, and

S_C(i) = (P(j_i) − P(j_{i−1})) / (R(j_i) − R(j_{i−1})),  i = 1, ..., I,

are the "slope" values on the convex hull, then S_C(1) > S_C(2) > ⋯ > S_C(I). The solution to our optimization problem is obtained by finding the maximum value of j_i ∈ Ᏼ_C which satisfies S_C(i) ≥ λ_q. Specifically,

r_q^{(λ)} = max{ j_i ∈ Ᏼ_C | S_C(i) ≥ λ_q }.

Given λ, the complexity of finding the set of optimal solutions {r_q^{(λ)}} is 𝒪(IQ). Our algorithm first finds the largest λ such that L^{(λ)} < L_max and then employs a bisection search to find λ_opt, where L^{(λ_opt)} ≤ L_max. The number of iterations required to search for λ_opt is bounded by the computation precision, and the bisection search typically requires a small number of iterations to find λ_opt. In our experiments, the number of iterations is typically fewer than 15, which is usually much smaller than I or Q. It is also worth noting that the number of iterations required to find λ_opt is independent of other parameters, such as the number of source elements Q, the packet size S, and the codeword length N.
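The slope-threshold rule can be sketched as follows, under the parametrization r = N + 1 − k from Section 3. The outer bisection over λ is omitted, and the particular λ, U_q, and L_q values in the usage line are hypothetical:

```python
from math import comb

N, p = 5, 0.3  # assumed codeword length and packet loss probability

def P(r):
    """P(r) = P_{N, N+1-r}: probability of receiving at least N+1-r packets."""
    if r == 0:
        return 0.0
    k = N + 1 - r
    return sum(comb(N, n) * (1 - p) ** n * p ** (N - n)
               for n in range(k, N + 1))

def R(r):
    """R(r) = N / (N + 1 - r); R(0) = 0 means 'not transmitted'."""
    return 0.0 if r == 0 else N / (N + 1 - r)

def channel_hull():
    """Vertices H_C of the upper convex hull of the P(r)-vs-R(r) points."""
    hull = []
    for pt in [(r, R(r), P(r)) for r in range(N + 1)]:
        while len(hull) >= 2:
            (_, x1, y1), (_, x2, y2) = hull[-2], hull[-1]
            _, x3, y3 = pt
            # drop the middle point if it lies on or below the chord
            if (y2 - y1) * (x3 - x1) <= (y3 - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(pt)
    return hull

def assign(lam, U, L):
    """r_q = largest hull vertex whose slope S_C(i) >= lam * L_q / U_q;
    hull slopes are decreasing, so we stop at the first slope below it."""
    hull = channel_hull()
    rs = []
    for Uq, Lq in zip(U, L):
        r = 0
        for (_, xa, ya), (rb, xb, yb) in zip(hull, hull[1:]):
            if (yb - ya) / (xb - xa) >= lam * Lq / Uq:
                r = rb
            else:
                break
        rs.append(r)
    return rs

rs = assign(0.01, U=[8.0, 4.0, 2.0, 1.0], L=[100.0] * 4)
# Source convexity (U_q/L_q nonincreasing) yields nonincreasing protection:
print(rs == sorted(rs, reverse=True))  # True
```

In a full implementation, the bisection search would repeatedly call `assign` with different λ until the resulting total length Σ_q L_q R(r_q) just fits within L_max.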
All that remains now is to show that this solution always satisfies the necessary constraint r_1^{(λ)} ≥ r_2^{(λ)} ≥ ⋯ ≥ r_Q^{(λ)}. To this end, observe that our source convexity assumption implies that L_q/U_q ≤ L_{q+1}/U_{q+1}, so that

λ_q ≤ λ_{q+1} and hence r_q^{(λ)} ≥ r_{q+1}^{(λ)}.    (15)

Nonconvex sources
In the previous section, we restricted our attention to convex source utility-length characteristics, but did not impose any prior assumption on the convexity of the P(r) versus R(r) channel coding characteristic. As already seen in Figure 3, the P(r) versus R(r) characteristic is not generally convex. We found that the optimal solution is always drawn from the convex hull set Ᏼ_C and that the optimization problem amounts to a trivial element-wise optimization problem in which r_q^{(λ)} is assigned the largest element j_i ∈ Ᏼ_C whose slope S_C(i) is no smaller than λL_q/U_q.
In this section, we abandon our assumption on source convexity. We begin by showing that, in this case, the optimal solution still involves only those protection strengths r which belong to the convex hull Ᏼ_C of the channel code's performance characteristic. We then show that the optimal protection assignment depends only on the convex hull of the source utility-length characteristic and that it may be found using the comparatively trivial methods previously described.

Sufficiency of the channel coding convex hull Ᏼ C
Lemma 1. Suppose that {r_q^{(λ)}}_{1≤q≤Q} is a collection of channel code indices which maximizes J^{(λ)} subject to the ordering constraint r_1^{(λ)} ≥ ⋯ ≥ r_Q^{(λ)}. Then there is always such a maximizing collection with r_q^{(λ)} ∈ Ᏼ_C for all q. More precisely, whenever a solution with some r_q ∉ Ᏼ_C yields J(λ), there is always another solution with every r_q ∈ Ᏼ_C which yields J′(λ) ≥ J(λ).
Proof. For each 0 ≤ i ≤ I, let Ᏺ_i denote the set of assigned values r_q^{(λ)} ∉ Ᏼ_C which satisfy j_i < r_q^{(λ)} < j_{i+1}. For convenience, we define j_{I+1} = ∞ so that the last of these sets, Ᏺ_I, is well defined. The objective of the proof is to show that all of these sets Ᏺ_i must be empty. To this end, suppose that some Ᏺ_i is nonempty and let r̄_1 < r̄_2 < ⋯ < r̄_Z be an enumeration of its elements. For each r̄_z ∈ Ᏺ_i, let Ū_z and L̄_z be the combined utilities and lengths of all source elements which were assigned r_q^{(λ)} = r̄_z. That is,

Ū_z = Σ_{q: r_q^{(λ)} = r̄_z} U_q,    L̄_z = Σ_{q: r_q^{(λ)} = r̄_z} L_q.

For each z < Z, we could assign the alternate value r̄_{z+1} to all of the source elements with r_q^{(λ)} = r̄_z without violating the ordering constraint on r_q^{(λ)}. This adjustment would result in a net increase in J^{(λ)} of

Ū_z (P(r̄_{z+1}) − P(r̄_z)) − λ L̄_z (R(r̄_{z+1}) − R(r̄_z)).

By hypothesis, we already have an optimal solution, so this alternative must be unfavourable, meaning that

(P(r̄_{z+1}) − P(r̄_z)) / (R(r̄_{z+1}) − R(r̄_z)) ≤ λ (L̄_z / Ū_z).

Similarly, for any z ≤ Z, we could assign the alternate value r̄_{z−1} to the same source elements (where we identify r̄_0 with j_i for completeness), again without violating our ordering constraint. The fact that the present solution is optimal means that

(P(r̄_z) − P(r̄_{z−1})) / (R(r̄_z) − R(r̄_{z−1})) ≥ λ (L̄_z / Ū_z).    (19)

Proceeding by induction, we must have monotonically decreasing slopes between the successive values j_i, r̄_1, ..., r̄_Z. It is convenient, for the moment, to ignore the pathological case i = I. Now, since r̄_Z ∉ Ᏼ_C, we must have

(P(j_{i+1}) − P(r̄_Z)) / (R(j_{i+1}) − R(r̄_Z)) ≥ (P(r̄_Z) − P(r̄_{Z−1})) / (R(r̄_Z) − R(r̄_{Z−1})),

as illustrated in Figure 6. So, for any given z ≥ 1, we must have

(P(j_{i+1}) − P(r̄_z)) / (R(j_{i+1}) − R(r̄_z)) ≥ λ (L̄_z / Ū_z),

meaning that all of the source elements which are currently assigned r_q^{(λ)} = r̄_z could be assigned r_q^{(λ)} = j_{i+1} instead without decreasing the contribution of these source elements to J^{(λ)}. Doing this for all z simultaneously would not violate the ordering constraint, meaning that there is another solution, at least as good as the one claimed to be optimal, in which Ᏺ_i is empty.
For the case i = I, the fact that r̄_1 ∉ Ᏼ_C and that there are no larger values of r which belong to the convex hull means that (P(r̄_1) − P(j_I))/(R(r̄_1) − R(j_I)) ≤ 0, and hence (P(r̄_z) − P(r̄_{z−1}))/(R(r̄_z) − R(r̄_{z−1})) ≤ 0 for each z. But this contradicts (19), since λ(L̄_z/Ū_z) is strictly positive. Therefore, Ᏺ_I is also empty.
Figure 6: The parameters r̄_1, ..., r̄_Z between j_i and j_{i+1} are not convex hull points and have decreasing slopes.

Sufficiency of the source convex hull Ᏼ S
In the previous section, we showed that we may restrict our attention to channel codes belonging to the convex hull set, that is, r ∈ Ᏼ_C, regardless of the source convexity. In this section, we show that we may also restrict our attention to the convex hull of the source utility-length characteristic.
An equivalent description of the channel code assignment {r_q^{(λ)}} is given by a set of thresholds t_0, t_1, ..., t_I, where t_i is the number of source elements that are assigned a code index of at least j_i; that is, t_i = |{q : r_q^{(λ)} ≥ j_i}|. As an example, consider a source with Q = 6 elements, channel code convex hull points Ᏼ_C = {0, 1, 2, ..., 6}, and the code assignment (24). Elements that are assigned at least j_0 = 0 correspond to all six r's, so t_0 = 6. Similarly, elements that are assigned at least j_1 = 1 correspond to the first five r's, so t_1 = 5. Performing the same computation for the remaining j_i produces the thresholds (25). Evidently, the thresholds are ordered according to t_0 ≥ t_1 ≥ ⋯ ≥ t_I. The r_q^{(λ)} values may be recovered from this threshold description according to

r_q^{(λ)} = max{ j_i ∈ Ᏼ_C | t_i ≥ q }.    (26)

Using the same example, given the channel code convex hull points {0, 1, 2, ..., 6} and the set of thresholds (25), the threshold values satisfying t_i ≥ 1 for Ᏹ_1 are (t_0, t_1, ..., t_5), so r_1 = 5. Similarly, the threshold values satisfying t_i ≥ 2 for Ᏹ_2 are (t_0, ..., t_3), so r_2 = 3. Performing the same computation for the remaining elements reproduces the original code assignment (24). Now, the unconstrained optimization problem from (8) may be expressed as

J^{(λ)} = Σ_{i=1}^{I} O_i^{(λ)},

where

O_i^{(λ)} = Ṗ_i Σ_{q=1}^{t_i} U_q − λ Ṙ_i Σ_{q=1}^{t_i} L_q,

with Ṗ_i = P(j_i) − P(j_{i−1}) and Ṙ_i = R(j_i) − R(j_{i−1}). If we temporarily ignore the constraint that the thresholds must be properly ordered according to t_0 ≥ t_1 ≥ ⋯ ≥ t_I, we may maximize J^{(λ)} by maximizing each of the terms O_i^{(λ)} separately. We will find that we are justified in doing this, since the solution always satisfies the threshold ordering constraint. Maximizing O_i^{(λ)} is equivalent to finding the t_i^{(λ)} which maximizes Σ_{q=1}^{t} U_q − λ̄_i Σ_{q=1}^{t} L_q, where λ̄_i = λ Ṙ_i / Ṗ_i. The same problem arises in connection with the optimal truncation of embedded source codes¹ [13, Section 8.2]. It is known that the solutions t_i^{(λ)} must be drawn from the convex hull set Ᏼ_S. Similar to Ᏼ_C, Ᏼ_S contains the vertices lying on the convex hull curve of the source utility-length characteristic. Let 0 = h_0 < h_1 < ⋯ < h_H = Q be an enumeration of the elements of Ᏼ_S and let

S_S(η) = (Σ_{q=h_{η−1}+1}^{h_η} U_q) / (Σ_{q=h_{η−1}+1}^{h_η} L_q),  η = 1, ..., H,

be the monotonically decreasing slopes associated with Ᏼ_S. Then

t_i^{(λ)} = max{ h_η ∈ Ᏼ_S | S_S(η) ≥ λ̄_i }.    (31)

¹ In fact, this is the same problem as in Section 4.1 except that P(r) and R(r) are replaced with Σ_{q=1}^{t} U_q and Σ_{q=1}^{t} L_q.
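The equivalence of the two descriptions can be sketched directly. The hull points and code assignment below are hypothetical, chosen only so that t_0 = 6, t_1 = 5, r_1 = 5, and r_2 = 3 agree with the worked example:

```python
def thresholds(rs, hull):
    """t_i = number of elements assigned a code index of at least j_i."""
    return [sum(1 for r in rs if r >= j) for j in hull]

def recover(ts, hull, Q):
    """r_q = max{ j_i in H_C : t_i >= q }, as in (26); q = 1..Q."""
    return [max((j for j, t in zip(hull, ts) if t >= q), default=0)
            for q in range(1, Q + 1)]

hull = [0, 1, 2, 3, 4, 5, 6]   # assumed channel code convex hull points
rs = [5, 3, 3, 2, 1, 0]        # hypothetical code assignment, cf. (24)

ts = thresholds(rs, hull)
print(ts)                            # [6, 5, 4, 3, 1, 1, 0]
print(recover(ts, hull, Q=6) == rs)  # True: the descriptions are equivalent
```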
Finally, observe that λ̄_i = λ Ṙ_i / Ṗ_i = λ / S_C(i). Monotonicity of the channel coding slopes S_C(i) implies that S_C(i) ≥ S_C(i+1) and hence λ̄_i = λ/S_C(i) ≤ λ/S_C(i+1) = λ̄_{i+1}. It follows that

t_i^{(λ)} ≥ t_{i+1}^{(λ)}.    (34)

Therefore, the required ordering property t_0^{(λ)} ≥ t_1^{(λ)} ≥ ⋯ ≥ t_I^{(λ)} is satisfied. In summary, for each j_i ∈ Ᏼ_C, we find the threshold t_i^{(λ)} from (31) and then assign the r_q^{(λ)} according to (26). The solution is guaranteed to be at least as good as any other channel code assignment, in the sense of maximizing J^{(λ)} subject to r_1^{(λ)} ≥ ⋯ ≥ r_Q^{(λ)}, regardless of the convexity of the source or channel codes. The computational complexity is now 𝒪(IH) for each λ. Similar to the convex source case, we employ a bisection search to find λ_opt.

MULTICOP ASSIGNMENT
In the UniCOP assignment strategy, we assume that either the packet size S or the codeword length N can be set sufficiently large that the data source always fits into N packets. Specifically, the UniCOP assignment holds under the condition L_max ≤ NS. Recall from Figure 1 that NS is the COP size. The choice of the packet size depends on the type of channel that the data is transmitted through. Some channels might have low BERs, allowing the use of large packet sizes with a reasonably high probability of receiving error-free packets. However, wireless channels typically require small packets due to their much higher BER. Packaging a large amount of source data into small packets requires a large number of packets and hence long codewords. This is undesirable since it imposes a computational burden on both the channel encoder and, especially, the channel decoder.
If the entire collection of protected source elements cannot fit into a set of N packets of length S, more than one COP must be employed. When elements are arranged into multiple COPs, we no longer have any guarantee that a source element with a stronger code can be recovered whenever a source element with a weaker code is recovered. The code redundancy assignment strategy described in Section 4 relies upon this property in order to ensure that element dependencies are satisfied, allowing us to use (1) for the expected utility.

Code redundancy optimization
Consider a collection of C COPs {Ꮿ_1, ..., Ꮿ_C} characterized by {(s_1, f_1), ..., (s_C, f_C)}, where s_c and f_c represent the indices of the first and last source elements residing in COP Ꮿ_c. We assume that the source elements have a simple chain of dependencies: prior to recovering an element Ᏹ_q, all preceding elements Ᏹ_1, ..., Ᏹ_{q−1} must be recovered first. Within each COP Ꮿ_c, we can still constrain the code redundancies to satisfy r_{s_c} ≥ r_{s_c+1} ≥ ⋯ ≥ r_{f_c} and so guarantee that no element in COP Ꮿ_c will be recovered unless all of its dependencies within the same COP are also recovered. The probability P(r_{f_c}) of recovering the last element Ᏹ_{f_c} thus denotes the probability that all elements in COP Ꮿ_c are recovered successfully. Therefore, any element Ᏹ_q in COP Ꮿ_c which is correctly recovered from the channel will be usable if and only if the last element of each earlier COP is recovered. This changes the expected utility in (1) to

U = U_0 + Σ_{c=1}^{C} ( Π_{i=1}^{c−1} P(r_{f_i}) ) Σ_{q=s_c}^{f_c} U_q P(r_q).

Our objective is to maximize this expression for U subject to the same total length constraint L_max, as given in (7), and subject also to the constraint that r_{s_c} ≥ r_{s_c+1} ≥ ⋯ ≥ r_{f_c} for each COP Ꮿ_c. Similar to the UniCOP assignment strategy, this constrained optimization problem can be converted into a set of unconstrained optimization problems parametrized by λ. Specifically, we search for the smallest λ such that L^{(λ)} ≤ L_max, where L^{(λ)} is the overall transmission length associated with the set {r_q^{(λ)}}_{1≤q≤Q} which maximizes

J^{(λ)} = Σ_{c=1}^{C} ( Π_{i=1}^{c−1} P(r_{f_i}) ) Σ_{q=s_c}^{f_c} U_q P(r_q) − λ Σ_{q=1}^{Q} L_q R(r_q).

This new functional turns out to be more difficult to optimize than that in (8), since the product terms in U^{(λ)} couple the impact of code redundancy assignments for different elements. In fact, the optimization objective is generally multimodal, exhibiting multiple local optima. Nevertheless, it is possible to devise a simple optimization strategy which rapidly converges to a local optimum, with good results in practice. Specifically, given an initial set of {r_q}_{1≤q≤Q} and considering only one COP, Ꮿ_c, at a time, we can
find a set of code redundancies {r_{s_c}, ..., r_{f_c}} which maximizes J^{(λ)} subject to all other r_q's being held constant. The solution is sensitive to the initial {r_q} set, since the optimization problem is multimodal. However, as we shall see shortly in Section 5.2, since we build multiple COPs out of one COP, it is reasonable to set the initial values of {r_q} equal to those obtained from the UniCOP assignment of Section 4, which works under the assumption that all encoded source elements can fit into one COP. This algorithm is guaranteed to converge as we cycle through each COP in turn, since the code redundancies for each COP either increase J^{(λ)} or leave it unchanged, and the optimization objective is clearly bounded above by Σ_q U_q. The optimal solution for each COP is found by employing the scheme developed in Section 4. Our optimization objective for each COP Ꮿ_c is to maximize a per-COP quantity while keeping the code redundancies in other COPs constant.
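The coupled expected utility above can be sketched as follows; the element utilities and recovery probabilities in the usage lines are illustrative assumptions:

```python
def multicop_utility(U0, U, P_r, cops):
    """Expected utility with C independently coded COPs: an element in COP c
    is usable only if the last element of every earlier COP is recovered.
    U[q] and P_r[q] = P(r_q) are 0-based; cops lists (s_c, f_c) ranges."""
    total, prefix = U0, 1.0
    for s, f in cops:
        for q in range(s, f + 1):
            total += prefix * U[q] * P_r[q]
        prefix *= P_r[f]   # later COPs also require E_{f_c} to be recovered
    return total

# One COP reduces to (1); splitting into two COPs discounts the second:
U, P_r = [1.0, 1.0], [0.5, 0.5]
print(multicop_utility(0.0, U, P_r, [(0, 1)]))          # 1.0  (0.5 + 0.5)
print(multicop_utility(0.0, U, P_r, [(0, 0), (1, 1)]))  # 0.75 (0.5 + 0.25)
```

The discount factor `prefix` is exactly the product term that couples the code redundancy assignments of different COPs and makes the objective multimodal.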
The last element Ᏹ_{f_c} in COP Ꮿ_c is unique, since its recovery probability appears in the utility terms of the succeeding elements Ᏹ_{f_c+1}, ..., Ᏹ_Q, which reside in COPs Ꮿ_{c+1}, ..., Ꮿ_C. This effect is captured by a term Γ_c, which can be considered as an additional contribution to the effective utility of Ᏹ_{f_c}. Evidently, Γ_c is nonnegative, so it always increases the effective utility of the last element in any COP Ꮿ_c, c < C. Apart from the last element q = f_c, the effective utility Ū_q is a scaled version of U_q involving the same scaling factor Π_{i=1}^{c−1} P(r_{f_i}^{(λ)}) for each q. However, the last element Ᏹ_{f_c} has the additional utility Γ_c, which can destroy the convexity of the source effective utility-length characteristic even if the original source elements have a convex utility-length characteristic. This phenomenon forms the principal motivation for the development in Section 4 of a code redundancy assignment strategy which is free from any assumption of convexity on the source or channel code characteristic.
In summary, the code redundancy assignment strategy for multiple COPs involves cycling through the COPs one at a time, holding the code redundancies for all other COPs constant, and finding the values of r_q^{(λ)}, s_c ≤ q ≤ f_c, which maximize J^{(λ)}. This is achieved by using the strategy developed in Section 4, replacing each element's utility U_q with its current effective utility Ū_q. Specifically, for each COP Ꮿ_c, we find a set of thresholds {t_i^{(λ)}} which must be drawn from the convex hull set Ᏼ_S^{(c)} of the source effective utility-length characteristic. Since Ū_{f_c} is affected by {r_{f_c+1}, ..., r_Q}, the elements in Ᏼ_S^{(c)} may vary depending on these code redundancies and thus must be recomputed at each iteration of the algorithm. The thresholds are found analogously to (31), and the solution r_q^{(λ)} may then be recovered from t_i^{(λ)} using (26). As in the UniCOP case, we find the smallest value of λ such that the resulting solution satisfies L^{(λ)} ≤ L_max. Similar to the UniCOP assignment for nonconvex sources, for each COP Ꮿ_c, the computational complexity is 𝒪(IH_c), where H_c is the number of elements in Ᏼ_S^{(c)}. Hence, each iteration requires 𝒪(IH) computations, where H = Σ_{c=1}^{C} H_c. For a given λ > 0, it typically requires fewer than 10 iterations for the solution to converge.

COP allocation algorithm
We are still left with the problem of determining the best allocation of elements to COPs, subject to the constraint that the encoded source elements in any given COP should be no larger than NS. When L_max is larger than NS, the need to use multiple COPs is inevitable. The proposed algorithm starts by allocating all source elements to a single COP Ꮿ_1. Code redundancies are found by applying the UniCOP assignment strategy of Section 4. COP Ꮿ_1 is then split into two parts, the first of which contains as many elements as possible (f_1 as large as possible) while still having an encoded length L_{Ꮿ_1} no larger than NS. At this point, the number of COPs is C = 2 and Ꮿ_2 does not generally satisfy L_{Ꮿ_2} ≤ NS.
The algorithm proceeds in an iterative sequence of steps. At the start of the t-th step, there are C_t COPs, all but the last of which have encoded lengths no larger than NS. In this step, we first apply the MultiCOP code redundancy assignment algorithm of Section 5.1 to find a new set of {r_sc, . . ., r_fc} for each COP Ꮿ_c, maximizing the total expected utility subject to the overall length constraint L_max. The new code redundancies produced by the MultiCOP assignment algorithm may cause one or more of the initial C_t − 1 COPs to violate the encoded length constraint L_Ꮿc ≤ NS. In fact, as the algorithm proceeds, the encoded lengths of source elements assigned to all but the last COP tend to increase rather than decrease, as we shall argue later. The step is completed in one of two ways, depending on whether or not this happens.

Figure 7: Case 1 of the COP allocation algorithm. At step t, L_Ꮿc exceeds NS and Ꮿ_c is hence truncated. Its trailing elements and the rest of the source elements are allocated to one COP, Ꮿ_Ct+1.
Case 1 (L_Ꮿc > NS for some c < C_t). Let Ꮿ_c′ be the first COP for which L_Ꮿc′ > NS. In this case, we find the largest value of f ≥ s_c′ such that ∑_{q=s_c′}^{f} L_q R(r_q) ≤ NS. COP Ꮿ_c′ is truncated by setting f_c′ = f, and all of the remaining source elements Ᏹ_f+1, Ᏹ_f+2, . . ., Ᏹ_Q are allocated to Ꮿ_c′+1. The algorithm proceeds in the next step with only C_t+1 = c′ + 1 ≤ C_t COPs, all but the last of which satisfy the length constraint. Figure 7 illustrates this case.
Case 2 (L_Ꮿc ≤ NS for all c < C_t). In this case, we find the largest value of f ≥ s_Ct such that ∑_{q=s_Ct}^{f} L_q R(r_q) ≤ NS, setting f_Ct = f. If f = Q, all source elements have been allocated to COPs satisfying the length constraint and their code redundancies have already been jointly optimized, so we are done. Otherwise, the algorithm proceeds in the next step with C_t+1 = C_t + 1 COPs, where Ꮿ_Ct+1 contains all of the remaining source elements Ᏹ_f+1, Ᏹ_f+2, . . ., Ᏹ_Q. Figure 8 illustrates this case.
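The greedy splitting used by both cases can be sketched as follows. This simplified sketch assumes the coded lengths L_q R(r_q) are fixed, so it only ever exercises Case 2; Case 1 arises when the MultiCOP redundancy re-optimization (omitted here) inflates an earlier COP beyond the NS budget.

```python
def split_point(coded_lengths, start, budget):
    """Largest f >= start such that sum_{q=start..f} coded_lengths[q] <= budget.
    Returns start-1 if even the first element exceeds the budget."""
    total, f = 0, start - 1
    for q in range(start, len(coded_lengths)):
        if total + coded_lengths[q] > budget:
            break
        total += coded_lengths[q]
        f = q
    return f

def allocate_cops(coded_lengths, budget):
    """Greedy COP allocation sketch: repeatedly cut the largest prefix of the
    remaining elements that fits in `budget` (= N*S bytes) and push the rest
    into a new final COP, as in Case 2 of the algorithm."""
    cops, start = [], 0
    while start < len(coded_lengths):
        f = split_point(coded_lengths, start, budget)
        if f < start:
            raise ValueError("element %d alone exceeds the COP budget" % start)
        cops.append(list(range(start, f + 1)))
        start = f + 1
    return cops
```

In the full algorithm this splitting alternates with the MultiCOP redundancy assignment, so the cut points can move between steps; the sketch shows only the length-constrained partitioning itself.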
To show that the algorithm must complete after a finite number of steps, observe first that the number of COPs is bounded above by some quantity M ≤ Q. Next, define an integer-valued functional Z_t in terms of n_c^(t), the number of source elements allocated to COP Ꮿ_c at the beginning of step t, in which an element's contribution shrinks by a factor of Q each time it moves to a later COP. This functional has the important property that each step of the allocation algorithm decreases Z_t. Since Z_t is always a positive finite integer, the algorithm must therefore complete in a finite number of steps. To see that each step does indeed decrease Z_t, consider the two cases. If step t falls into Case 1, with Ꮿ_c′ the COP whose contents are reduced, inequality (49) shows that Z_{t+1} < Z_t. If step t falls into Case 2, some of the source elements are moved from Ꮿ_Ct to Ꮿ_Ct+1, where their contribution to Z_t is reduced by a factor of Q, so Z_{t+1} < Z_t.
The key property of our proposed COP allocation algorithm, which ensures its convergence, is that whenever a step does not split the final COP, it necessarily decreases the number of source elements contained in a previous COP Ꮿ_c. The algorithm contains no provision for subsequently reconsidering this decision and moving some or all of these elements back into Ꮿ_c. We claim that there is no need to revisit the decision to move elements out of Ꮿ_c for the following reason. Assuming that we do not alter the contents of any previous COPs (otherwise, the algorithm essentially restarts from that earlier COP boundary), by the time the allocation is completed, all source elements following the last element in Ꮿ_c will be allocated to COPs with indices at least as large as they had in step t. Considering (44), the effective utilities of these source elements will tend to be reduced relative to the effective utilities of the source elements allocated to Ꮿ_1 through Ꮿ_c. Accordingly, one should expect the source elements allocated to Ꮿ_1 through Ꮿ_c to receive a larger share of the overall length budget L_max, meaning that their coded lengths should be at least as large as they were in step t. While this is not a rigorous proof of optimality, it provides a strong justification for the proposed allocation algorithm.
In practice, as the algorithm proceeds, we always observe that the code redundancies assigned to source elements in earlier COPs either remain unchanged or else increase.

Remarks on the effect of packet size
Throughout this paper, we have assumed that P(r_q), the probability of receiving sufficient packets to decode an element Ᏹ_q, depends only on the selection of r_q = N − k_q + 1, where an (N, k_q) channel code is used. The value of N is fixed, but as discussed in Section 3, the value of P(r_q) also depends upon the actual size of each encoded packet. We have taken this to be S, but our code redundancy assignment and COP allocation algorithms use S only as an upper bound for the packet size. If the maximum of NS bytes is not used by a COP, each of its packets may actually be smaller than S. This, in turn, may alter the values of P(r_q), so that our assigned codes are no longer optimal.
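For a maximum-distance-separable (N, k_q) erasure code, recovery succeeds exactly when at least k_q of the N packets arrive. Assuming independent packet losses with probability p, P(r_q) is a binomial tail, which might be computed as follows (a sketch; the paper's channel model in Section 3 is the authority on how p itself is derived):

```python
from math import comb

def recovery_prob(N, k, p):
    """Probability that at least k of N packets survive independent loss with
    probability p, i.e. that an (N, k) MDS erasure code decodes."""
    return sum(comb(N, n) * (1 - p) ** n * p ** (N - n) for n in range(k, N + 1))

def P_of_r(N, r, p):
    """Recovery probability as a function of the redundancy r = N - k + 1:
    larger r means a smaller k and hence a more robust code."""
    return recovery_prob(N, N - r + 1, p)
```

With r = 1 (no redundancy, k = N) every packet must arrive, so P reduces to (1 − p)^N, and P increases monotonically with r.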
Fortunately, if the individual source elements are sufficiently small, the actual size of each COP should be approximately equal to its maximum value of NS, meaning that the actual packet size should be close to its maximum value of S. It is true that allocating more COPs, each with a smaller packet size, can yield higher expected utilities. However, rather than explicitly accounting for the effect of actual packet sizes within our optimization algorithm, various values for S are considered in an "outer optimization loop." In particular, for each value of S, we compute the channel coding characteristic described by P(r_q) and R(r_q) and then invoke our COP allocation and code redundancy optimization algorithms. Section 6 presents expected utility results obtained for various values of S. One potential limitation of this strategy is that the packet size S is essentially forced to take the same value within every COP. We have not considered the possibility of allowing different packet sizes or even different channel code lengths N for each COP.
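The outer optimization loop over S might be sketched as follows; `channel_char` and `allocate_and_optimize` are hypothetical stand-ins for the channel characterization and the COP allocation/redundancy optimization routines described in the text.

```python
def best_packet_size(candidate_sizes, channel_char, allocate_and_optimize):
    """Outer loop over the packet size S: for each candidate S, build the
    channel coding characteristic (P(r), R(r)) and run the full COP
    allocation / redundancy optimization, keeping the S that achieves the
    highest total expected utility."""
    best = (None, float("-inf"))
    for S in candidate_sizes:
        P, R = channel_char(S)              # channel characteristic for this S
        utility = allocate_and_optimize(S, P, R)
        if utility > best[1]:
            best = (S, utility)
    return best
```

Since each candidate S triggers a full inner optimization, the loop's cost scales linearly with the number of candidate sizes considered.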

COMPARATIVE RESULTS
In this section, we compare the total expected utility at the destination of a compressed image whose code redundancies have been determined using the UniCOP and MultiCOP assignment strategies described in Sections 4 and 5. We select a code length N = 100, a maximum transmission length L_max = 1,000,000 bytes, a range of BERs, and a range of packet sizes S. The scalable data source used in these experiments is a 2560 × 2048 JPEG2000 compressed image, decomposed into 6 resolution levels. The image is grayscale, having only one colour component, and we treat the entire image as one tile-component. Each resolution level is divided into a collection of precincts of size 128 × 128 samples, resulting in a total of 429 precincts. Each precinct is further decomposed into 12 quality elements. Treating each quality element and the data stream header as a source element, there are 5149 elements overall. It is necessary to create a large number of source elements so as to minimize the impact of the discrete nature of our optimization problem, which may otherwise produce suboptimal solutions, as discussed in Section 4.1. We arrange the source elements into a linear sequence exhibiting a simple chain of dependencies with a convex utility–length characteristic. For simplicity, we assume that in the event that any part of any element is corrupted, the entire element will be rendered useless, along with all subsequent elements which depend upon it.² The UniCOP results were obtained by using the UniCOP assignment under the assumption that all source elements can be arranged into one COP. The encoded elements are then assigned to multiple COPs whenever this is demanded by the constraint NS. The utility measure used here is negated MSE, and the total expected utility is conveniently expressed in terms of peak signal-to-noise ratio (PSNR).³
An improvement in the expected utility is equivalent to an increase in PSNR. To obtain a reasonable approximation of the total expected utility for each value of the packet size parameter S, the number of experiments which we run to estimate the overall expected utility is adjusted according to the packet loss probability. The MultiCOP results are based on the MultiCOP assignment algorithm, which progressively allocates source elements to COPs and assigns code redundancies to the source elements accordingly. Figures 9, 10, and 11 compare the UniCOP results with those obtained using the MultiCOP assignment strategy.
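A Monte Carlo estimate of this kind might look as follows. The failure model (the first unrecoverable element truncates the usable prefix, matching the dependency assumption above) and all names are illustrative assumptions, not the experimental code actually used.

```python
import random
from math import log10

def expected_psnr(element_mses, p_fail, trials, peak=255.0, seed=0):
    """Monte Carlo estimate of expected PSNR for a chain-dependent stream.
    element_mses[q] is the reconstruction MSE when exactly the first q
    elements are decoded (element_mses[0] = MSE with nothing received);
    p_fail[q] is the probability that element q cannot be recovered.
    A trial truncates the stream at the first failed element."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        survived = len(p_fail)                  # assume everything decodes...
        for q, pf in enumerate(p_fail):
            if rng.random() < pf:               # ...until the first failure
                survived = q
                break
        mse = element_mses[survived]
        total += 10 * log10(peak * peak / mse)  # PSNR of this trial
    return total / trials
```

In practice the number of trials would be increased at high packet loss rates, as described above, to keep the variance of the estimate acceptable.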
If all of the coded source elements are able to fit inside a single COP subject to the constraints determined by N and S, the UniCOP assignment is optimal. Moreover, in this case, the UniCOP and MultiCOP strategies produce identical solutions. Otherwise, source elements must be assigned to multiple COPs, violating some of the assumptions underlying the UniCOP assignment strategy. In particular, the recovery of any element Ᏹ_q no longer guarantees the recovery of all the preceding elements Ᏹ_1, . . ., Ᏹ_q−1 whose code redundancies are at least equal to that of Ᏹ_q. In this case, we would expect the MultiCOP assignment strategy to provide superior performance. Figures 9, 10, and 11 show that both the UniCOP and MultiCOP assignment strategies produce higher PSNR when the packet sizes are small. This is due to the fact that, for a given BER, the packet loss probability decreases as the packet size decreases. A low packet loss probability allows elements to be assigned weaker codes and hence to be encoded with less redundancy, making it possible to transmit more encoded elements without exceeding the maximum length L_max.

² In practice, this assumption is excessively conservative, since JPEG2000 decoders are able to recover well from some types of error.

³ Peak signal-to-noise ratio is defined as 10 log(P²/MSE), where P is the peak-to-peak signal amplitude. In this case, P = 255, since we are working with 8-bit images.
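The relationship between packet size and packet loss probability invoked above can be made concrete. Assuming a memoryless binary symmetric channel and no intra-packet error correction (a simplifying assumption, not necessarily the exact model of Section 3), a packet of S bytes is lost whenever any of its 8S bits is corrupted:

```python
def packet_loss_prob(ber, packet_bytes):
    """Probability that a packet of `packet_bytes` bytes is lost on a
    memoryless channel with bit error rate `ber`, where a single corrupted
    bit renders the whole packet useless."""
    return 1.0 - (1.0 - ber) ** (8 * packet_bytes)
```

For a fixed BER this is strictly increasing in the packet size, which is why smaller packets permit weaker codes in the experiments above.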
For a given packet size and BER, the PSNR values from the MultiCOP assignment strategy are always higher than those from the UniCOP assignment. The MultiCOP assignment process described in Section 5 increases the expected utilities of the elements in the earlier COPs relative to those of the elements in the later COPs. This causes the same or stronger codes to be assigned to the elements in the earlier COPs. As a result, the elements in the earlier COPs are not corrupted as easily by packet loss.
The improvement in PSNR for the MultiCOP assignment also depends on the BER. At high BER, the difference in PSNR can exceed 5 dB, while at low BER the difference is at most 2 dB. The main reason for this is that a high BER, which requires the use of small packet sizes, produces a large number of COPs. The code redundancies produced by the MultiCOP assignment therefore differ substantially from those produced by the UniCOP assignment at high error rates. Accordingly, the MultiCOP assignment strategy is particularly appealing for channels with high error rates, such as wireless channels.
Finally, the results show that for any given BER, the improvement in PSNR diminishes as the packet size decreases. This is because a decrease in packet size reduces the packet loss probability, so fewer packets are lost. In turn, the likelihood of recovering source elements under both the MultiCOP and UniCOP assignment strategies increases, resulting in similar overall expected utilities. Of course, applications do not generally have the freedom to select packet sizes. The use of small packets also increases the amount of packet overhead, a fact which is not taken into account in the results presented here.

CONCLUSIONS
Although PET provides an excellent framework for optimal protection of scalable data sources against erasure, it suffers from the difficulty that all channel codes must span the entire collection of network packets. In many practical applications, the size of the data source is large and packet sizes must be relatively small, leading to the need for long and computationally demanding channel codes. Two solutions to this problem present themselves immediately. Small network packets can be concatenated to form larger packets, thereby reducing the codeword length of the channel codes. Unfortunately, under an erasure channel model, each larger packet must be considered lost if any of its constituent packets is lost. Clearly, this solution is unsuitable for channels with significant packet loss probability.
As an alternative, the code redundancy assignment optimized for the PET framework can be used with shorter channel codes representing smaller COPs. When data must be divided into independently coded COPs with shorter channel codes, the MultiCOP assignment strategy proposed in this paper provides significant improvements in the expected utility (PSNR). Nevertheless, the need to use multiple COPs imposes a penalty of its own. One drawback of the MultiCOP assignment strategy is the amount of computation required to determine optimal code redundancies at the transmitter. This is particularly significant when there is a large number of source elements and/or the COP size is small. It is reasonable to expect exactly these conditions when transmitting a large compressed image over a wireless network. The development of fast algorithms for finding the MultiCOP assignment remains an active topic of investigation.
Including the codeword length as a parameter in the code redundancy assignment problem allows for flexibility in the choice of channel coding complexity.Since the channel decoder is generally more complex than the channel encoder, selecting short codewords will ease the computational burden at the receiver.This is particularly important for wireless mobile devices which have tight power and hence computation constraints.

Figure 1 :
Figure 1: An example of the PET arrangement of source elements into packets. Four elements are arranged into N = 5 packets of size S bytes. Elements Ᏹ_1 to Ᏹ_4 are assigned k = {2, 3, 4, 5}, respectively. The white areas correspond to the elements' content, while the shaded areas contain parity information.

Figure 2 :
Figure 2: Example of dependency structure of scalable sources.

Figure 3 :
Figure 3: Example of P N,k versus R N,k characteristic with N = 50 and p = 0.3.

Figure 8 :
Figure 8: Case 2 of the COP allocation algorithm. At step t, the last COP is divided into two, the first of which, Ꮿ_Ct, satisfies the encoded-length constraint NS.