A Packetized SPIHT Algorithm with Overcomplete Wavelet Coefficients for Increased Robustness

This paper presents a wavelet-based image encoding scheme with error resilience and error concealment suitable for transmission over networks prone to packet losses. The scheme involves partitioning the data into independent descriptions of roughly equal lengths, achieved by a combination of packetization and modifications to the wavelet tree structure without additional redundancy. With a weighted-averaging-based interpolation method, our proposed encoding scheme attains an improvement of about 0.5–1.5 dB in PSNR over other similar methods. We also investigate the use of overcomplete wavelet transform coefficients as side information for our encoding scheme to improve the error resilience when severe packet losses occur. Experiments show that we are able to achieve a high coding performance along with a good perceptual quality for the reconstructed image.


INTRODUCTION
Efficient delivery of images over data communication networks requires maintaining a balance between the available bandwidth and the perceptual quality of the received data, with minimum transmission delay. With the recent increase in the use of wireless communications and multimedia applications, error-resilient capabilities need to be incorporated into image coders with good compression performance. Packet-switched networks are very susceptible to transmission errors, since network congestion and other issues can cause transient channel shutdowns leading to packet delays and losses. Even when protocols such as TCP are used to guarantee the delivery of packets, large delays occur due to the retransmission of lost packets.
The use of multiple description (MD) coding [1] for encoding and transmitting images across networks with packet losses is currently being investigated in the literature. MD coding involves the creation of different descriptions, or packets, of equal importance from the source data, which are then transmitted separately over the network. Each description can be independently decoded, so that the loss of some of the descriptions does not affect the decoding of the properly received ones. In the context of packet-based transmission, MD coding can be described as encoding the source data into N ≥ 2 packets such that the reconstruction obtained from any 0 < k ≤ N packets is also good.
Various MD coding schemes have been used to provide robust transmission of images. The MD scalar quantizer developed by Vaishampayan [2] was applied to wavelet image coding in [3,4]. An unequal forward error correction technique to create multiple descriptions of images was suggested in [5]. These MD coding schemes for images are often applied to wavelet zerotree-based encoders like the EZW [6] and the SPIHT algorithm [7], which are fast, efficient, have low complexity, and provide high-quality images at extremely low bit rates. However, the progressive nature of these schemes results in an embedded data stream that can be easily corrupted by bit errors and packet losses. Such losses can cause severe distortions in the resulting output and make image reconstruction almost impossible in the absence of powerful channel codes or retransmission. Methods to improve the error resilience of zerotree-based image coders include the robust EZW (REZW) [8], the packetized zerotree wavelet (PZW) algorithm [9], and the dispersive packetization (DP) scheme [10]. Appropriate error concealment methods [11] are used in these coding schemes to minimize the error due to packet losses and to recover the missing data.
In this paper, we present a new error-resilient, modified SPIHT wavelet image coding scheme that is suitable for transmission over networks where packet losses occur. Next, we incorporate certain overcomplete wavelet transform coefficients into our coding scheme to improve the robustness and provide a better compensation for packet losses. The additional subbands obtained from the overcomplete representation are considered as side information and add a variable amount of redundancy to the encoded bit stream. The error-resilient coder creates descriptions based on proper packetization and partitioning of the wavelet coefficients. The method by itself introduces no extra redundancy into the signal. Error concealment is achieved by estimating lost wavelet coefficients using an interpolation scheme that takes advantage of the data partitioning to minimize distortion.
The rest of the paper is organized as follows. Section 2 summarizes the main features of SPIHT, explains the new error-resilient coding scheme along with the error concealment, and compares results to related coding schemes from the literature. Section 3 shows how overcomplete representations can be used to advantage with our coding scheme when higher packet loss rates are encountered. Finally, conclusions are drawn in Section 4.

SPIHT coder
The SPIHT algorithm [7] works by testing ordered wavelet coefficients for significance in a decreasing bit plane order, and quantizing only the significant coefficients. The high coding efficiency obtained by this algorithm is due to group testing the coefficients that belong to a wavelet tree, that is, a set of wavelet coefficients across different scales with the same spatial information. The set of detail coefficients of a tree at each scale is referred to as a tile.
The trees are addressed based on the locations of the tree roots, which in turn are the approximation coefficients at the coarsest scale. The tree can be further classified as either horizontal, vertical, or diagonal tree based on the spatial orientation of the frequency information stored in the tree. Approximation coefficients that are not tree roots are called leaves. The combination of the three adjacent trees along with their roots and the adjacent leaf is called a square tree [12]. A square tree contains all the frequency information corresponding to a square block of the image in the pixel domain. The arrangement of wavelet trees, approximate coefficients, and the square tree is shown in Figure 1.
Group testing of the wavelet coefficients exploits the interband correlation that exists between the coefficients belonging to a tree. Based on the zerotree concept [6], if a wavelet coefficient at a given scale is found to be insignificant with respect to a given threshold, the 2 × 2 offspring of that coefficient at the next finer scale are also assumed to be insignificant. Thus, the encoding stops at the scale with the last significant coefficient.
The initial listing that determines the order in which significance tests are done is predetermined for both the approximation coefficients and the trees. Subsequent ordering of the coefficients is based on the partitioning of the sets and is encoded in the algorithm such that it can be reproduced at the decoder. At each bit plane, significant coefficients with respect to the threshold are found and coded. Then the precision of each significant coefficient is enhanced by sending the next bit from the binary representation of the coefficient's value. The refinement allows for successive approximation quantization of the significant coefficients. The synchronized ordering information along with the refinement process leads to a progressive coding scheme, where even a truncated bit stream can be decoded to get a lower-rate image. It is this synchronization between the encoder and the decoder that makes images compressed with SPIHT susceptible to data loss; see Figures 2(a) and 2(b).
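The bit-plane significance and refinement idea can be sketched in a few lines of Python. This fragment is purely illustrative: it omits SPIHT's list management and set partitioning entirely, and the function name and bit layout are our own simplifications, not the actual SPIHT bitstream syntax.

```python
import math

def bitplane_encode(coeffs, num_planes):
    """Illustrative bit-plane coding without set partitioning: per plane,
    a significance decision (plus a sign bit) for each still-insignificant
    coefficient, and a refinement bit for each already significant one."""
    T = 2 ** (num_planes - 1)              # threshold of the MSB plane
    significant = [False] * len(coeffs)
    bits = []
    for _ in range(num_planes):
        plane = int(math.log2(T))
        for i, c in enumerate(coeffs):
            if not significant[i]:
                if abs(c) >= T:            # sorting pass: newly significant
                    significant[i] = True
                    bits.append(1)
                    bits.append(0 if c >= 0 else 1)   # sign bit
                else:
                    bits.append(0)
            else:                          # refinement pass: next magnitude bit
                bits.append((abs(int(c)) >> plane) & 1)
        T //= 2
    return bits
```

Truncating the returned bit list after any plane still yields a coarser quantization of every significant coefficient, which is the progressive property discussed above.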

Packetization
To develop a robust implementation of the SPIHT algorithm, we create N different descriptions from the source data that are then transmitted separately. These descriptions are generated by a combination of packetization and partitioning of the wavelet coefficients so that an effective error concealment scheme can be obtained. We employ a packetization scheme that allocates the bits to each of the N packets such that they contain equally important information and can be independently decoded. To remove the dependencies between bits that exist due to the embedded nature of the coding algorithm, the packets are created so that they contain quantization information pertaining to only a certain subset of wavelet trees. Each packet is assigned an equal number of approximation coefficients and trees in such a manner that they can be identified in the decoder by the packet number. A simple interleaving process ensures that neighboring approximation coefficients are assigned to different packets, thus preserving more neighbors for interpolation in case of a packet loss. The horizontal, vertical, and diagonal trees are each spread evenly among the different packets. This ensures that each packet contains coefficients that are from across the spatial-frequency domain. Figure 3 shows the allocation process, where each gray level corresponds to the underlying coefficients at those locations being assigned to a particular packet. The figure shows the interleaving of the approximate coefficients and the assignment of tiles in the detail subbands (that are a part of the corresponding tree) to different packets. It can be seen that the three (horizontal, vertical, and diagonal) trees belonging to the same square tree are interleaved among themselves, so that no two of them end up in the same packet.
In the figure, we observe 10 gray levels corresponding to N = 10 packets, with a zigzag interleaving scheme and an offset of one between the packet numbers of the horizontal, vertical, and diagonal trees of a square tree. The encoded bits corresponding to the constituent approximation coefficients and the trees in a packet are transmitted as a single independent description. The interleaving process yields packets with a nearly equal number of bits after the encoding process. Each of the packets can be decoded separately, irrespective of the order in which it is received at the decoder. Thus the distortion due to a packet loss is limited only to the data belonging to that packet and hence, depending on the spatial-frequency location of the packet's constituent coefficients, only a few areas of the image are damaged. However, since the packets contain whole wavelet trees, all edge information present in the spatial direction of the trees is lost for the corresponding area of the image. Further partitioning is done to prevent the total loss of edge information, as described in the following subsection.
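The interleaved packet assignment can be illustrated with a small sketch. The mapping (2r + c) mod N below is a hypothetical stand-in for the zigzag scheme of Figure 3, chosen only so that horizontally, vertically, and diagonally adjacent tree roots land in different packets; the offset of one between the three trees of a square tree follows the description above.

```python
def packetize_roots(rows, cols, N=10):
    """Hypothetical interleaving sketch: map each tree root (r, c) to a
    packet so that neighbouring roots get distinct packets, and give the
    three detail trees of a square tree consecutive packet numbers."""
    assignment = {}
    for r in range(rows):
        for c in range(cols):
            p = (2 * r + c) % N        # neighbours differ by 1, 2, or 3 mod N
            assignment[(r, c)] = {
                'root': p,
                'horizontal_tree': p,
                'vertical_tree': (p + 1) % N,
                'diagonal_tree': (p + 2) % N,
            }
    return assignment
```

With N = 10, the offsets 1, 2, and 3 between horizontal, vertical, and diagonal neighbours are all nonzero modulo N, so the loss of one packet never removes two adjacent roots.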

Shifted wavelet trees
To prevent the loss of all horizontal, vertical, or diagonal edge information for a spatial block in the image due to packet loss, we propose a modified way to build wavelet trees. The new wavelet trees are obtained by a process of directional shifting from scale to scale. The shifting modifies the tree structure such that each tile of detail coefficients becomes associated with the offspring of its neighboring tile along the corresponding orientation. Thus a tile belonging to a horizontal, vertical, or diagonal tree would be associated with a set that is shifted from its offspring, to the right, down, or right and down, respectively. By linking the tiles at each scale to a set of coefficients that are shifted from its offspring, we obtain a shifted tree structure. The linking process is shown in Figure 4, where tiles with the same gray level along the horizontal, vertical, and diagonal directions form the corresponding shifted wavelet trees.
The shifting is done in a cyclic manner so that tiles of coefficients at the edge of a subband are rolled over to the other end when the shift is applied to them. At the coarsest scale we do not perform any shift, since the coefficients in those subbands are already interleaved during the packetization process described in Figure 3. The packetization is performed as described in the previous section, with the new shifted wavelet trees replacing the corresponding traditional trees. This results in all the coefficients of a shifted wavelet tree being assigned to the same packet; see Figure 4. All of the shifted wavelet tree coefficients are grouped and tested together for significance during the subsequent SPIHT encoding process. Note that the tiles in our new, shifted wavelet trees no longer contain the same spatial information. Thus, the coding efficiency that is usually gained by exploiting the correlation of detail coefficients across scales is lost to a certain extent. However, the intraband correlation between neighboring tiles in the direction of the frequency information stored in that subband is usually high, and therefore, by replacing a tile by its neighbor, the loss in coding efficiency is kept to a minimum.
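The cyclic tile linking might be sketched as follows. The function and its orientation table are illustrative assumptions: tiles are indexed by row and column within a subband, tile coordinates grow by a factor of two from one scale to the next, and the wrap-around at the subband edges is the cyclic roll-over described above.

```python
def shifted_offspring(tile_r, tile_c, tiles_per_row, tiles_per_col, orientation):
    """Link a tile to the offspring of its cyclic neighbour: shifted right
    for horizontal trees, down for vertical, and right-and-down for
    diagonal, wrapping around at the subband edges."""
    dr, dc = {'horizontal': (0, 1),
              'vertical':   (1, 0),
              'diagonal':   (1, 1)}[orientation]
    nr = (tile_r + dr) % tiles_per_col     # cyclic wrap at subband edges
    nc = (tile_c + dc) % tiles_per_row
    # top-left tile coordinate of the neighbour's offspring at the next
    # finer scale, where the tile grid is twice as dense
    return (2 * nr, 2 * nc)
```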
Similar partitioning schemes of the wavelet coefficients have been proposed in [8,10]. However, in our method the partitioning is applied to tiles of detail coefficients rather than to individual coefficients. Since the coefficients of each tile are encoded in the same packet as a group, such an arrangement minimizes the loss in coding efficiency due to partitioning. Further, our method effectively partitions both the square tree and the individual trees. The partitioning can be seen in Figures 3 and 4, where in both figures all coefficients belonging to a packet are denoted by the same gray level. The frequency information of a square wavelet tree from Figure 1 is dispersed throughout the packets. In case of a packet loss, neighboring information for lost approximate coefficients as well as significant edge information for a tree are still available in the other packets. This allows us to interpolate for missing coefficients based on the available information from the other correctly received packets.

Interpolation and recovery of lost data
The loss of a packet during transmission implies the loss of all the quantization information of its constituent coefficients. The interpolation scheme for the recovery of those missing coefficients proceeds in two steps: first, the lost detail coefficients are estimated; then the lost approximation coefficients are interpolated.
(1) Detail coefficients: the number of detail coefficients lost in a wavelet tree depends on the size of the tile. Due to the shifted wavelet tree partitioning, a missing tile implies that detail coefficients in other scales belonging to the same spatial region of the image are generally available; see Figure 4. This allows us to exploit interband correlation in addition to intraband correlation for estimation of missing detail coefficients. We investigate three different approaches to recover lost detail coefficients.
(i) Set to zero: the simplest approach is to set all lost detail coefficients equal to zero. While we lose edge information in one scale, we still have edge information with the same spatial direction available from the remaining scales, which reduces blurring.
(ii) Intraband interpolation: this is applicable for the tiles lost in the coarsest scale. To estimate the lost 2 × 2 tiles, the least squares estimation with smoothness constraints on neighboring vertical (horizontal) coefficients proposed in [13] is applied to the subbands with vertical (horizontal) frequency information; see [13, Section B.1] for details.
(iii) Interband interpolation: for all but the finest scale, the lost detail coefficients can be approximated by averaging the entries of the 2 × 2 offspring coefficients in the next finer scale.
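Options (i) and (iii) can be sketched in a few lines. `recover_detail_tile` is a hypothetical helper, assuming the offspring tile from the next finer scale, when available, is a NumPy array of twice the lost tile's size; the least-squares option (ii) from [13] is omitted here.

```python
import numpy as np

def recover_detail_tile(tile_shape, offspring=None):
    """Recover a lost detail tile: either zero it out (option i), or, for
    all but the finest scale, estimate each lost coefficient as the mean
    of its 2x2 offspring in the next finer scale (option iii)."""
    if offspring is None:
        return np.zeros(tile_shape)            # option (i): set to zero
    h, w = tile_shape
    est = np.empty(tile_shape)
    for r in range(h):
        for c in range(w):
            # average the 2x2 offspring block of coefficient (r, c)
            est[r, c] = offspring[2 * r:2 * r + 2, 2 * c:2 * c + 2].mean()
    return est
```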
(2) Approximation coefficients: estimation of the lost coefficients in the approximation band is usually done by averaging the available neighboring coefficients. This works fairly well in most cases, since the approximation coefficients are more correlated than those of the other bands. To further improve accuracy, we here propose an interpolation based on a weighted average of the neighboring coefficients. The weights for the neighboring coefficients are assigned based on the values of the significant detail coefficients in the square tree to which the lost coefficient belongs. The available horizontal, vertical, and diagonal neighbors of the lost coefficient are assigned weights that are proportional to the sum of absolute values of the coefficients of the tiles in the respective directions. The detail coefficients contain edge information pertaining to a specific orientation. We therefore interpolate for a lost coefficient by assigning more weight to those neighboring coefficients that lie along an edge than to the other neighbors.
Let $I_{x,y}$ denote the missing approximation coefficient at position $(x, y)$, and let $H$, $V$, and $D$ denote the sets of detail coefficients that belong to the coarsest-scale tiles along the horizontal, vertical, and diagonal directions, respectively. Then the estimate is the weighted average of the available neighbors,
$$\hat{I}_{x,y} = \frac{w_h \sum_{(u,v) \in \mathcal{N}_h} I_{u,v} + w_v \sum_{(u,v) \in \mathcal{N}_v} I_{u,v} + w_d \sum_{(u,v) \in \mathcal{N}_d} I_{u,v}}{w_h \,|\mathcal{N}_h| + w_v \,|\mathcal{N}_v| + w_d \,|\mathcal{N}_d|}, \tag{1}$$
where $\mathcal{N}_h$, $\mathcal{N}_v$, and $\mathcal{N}_d$ are the sets of available horizontal, vertical, and diagonal neighbors of $(x, y)$. The weights along each direction are
$$w_h = \sum_{c \in H} |c|, \qquad w_v = \sum_{c \in V} |c|, \qquad w_d = \sum_{c \in D} |c|. \tag{2}$$
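The weighted-average recovery can be sketched as follows, assuming the approximation band, a received-coefficient mask, and the three coarsest-scale tiles of the square tree are available as NumPy arrays; the function name and signature are our own.

```python
import numpy as np

def interpolate_approx(A, mask, x, y, H, V, D):
    """Weighted average of the available 8-neighbours of the lost
    approximation coefficient A[x, y]. `mask` is True where a coefficient
    was received; H, V, D are the coarsest-scale detail tiles of the
    coefficient's square tree, and the directional weights are their
    summed magnitudes, as described in the text."""
    w = {'h': np.abs(H).sum(),     # weight for horizontal neighbours
         'v': np.abs(V).sum(),     # weight for vertical neighbours
         'd': np.abs(D).sum()}     # weight for diagonal neighbours
    offsets = {'h': [(0, -1), (0, 1)],
               'v': [(-1, 0), (1, 0)],
               'd': [(-1, -1), (-1, 1), (1, -1), (1, 1)]}
    num = den = 0.0
    for direction, offs in offsets.items():
        for dx, dy in offs:
            xx, yy = x + dx, y + dy
            if 0 <= xx < A.shape[0] and 0 <= yy < A.shape[1] and mask[xx, yy]:
                num += w[direction] * A[xx, yy]
                den += w[direction]
    return num / den if den > 0 else 0.0
```

A neighbour along a strong edge direction thus contributes more to the estimate, while directions whose tiles carry little energy are down-weighted.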

Experimental results
A four-level wavelet decomposition with the Daubechies 9/7 biorthogonal wavelet was applied to the 512 × 512, 8 bpp grayscale images used for the experiments. The images were encoded into 20 packets at different bit rates. To obtain the rate-distortion plots for the images under different packet loss and burst error scenarios, we consider a loss model where random packets are dropped independently. A number of packets are dropped out of the total of 20, based on the packet loss rate. Significant wavelet coefficients from the lost packets are estimated using the interpolation schemes described earlier.
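The loss model can be simulated in a few lines; `drop_packets` is an illustrative helper, assuming that in each Monte Carlo trial a fixed number of the 20 packets, determined by the loss rate, is removed uniformly at random.

```python
import random

def drop_packets(num_packets=20, loss_rate=0.1, rng=random):
    """Loss model assumed in the text: a fixed number of packets, chosen
    uniformly at random, is dropped per Monte Carlo trial."""
    k = round(num_packets * loss_rate)
    lost = set(rng.sample(range(num_packets), k))
    received = [p for p in range(num_packets) if p not in lost]
    return received, sorted(lost)
```

Averaging the decoder's PSNR over many calls with independent random states yields the kind of averaged curves reported below.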
Several Monte Carlo runs were performed to obtain average PSNR values. Table 1 shows the PSNRs obtained for the Lena image compressed at 0.21 bpp at different packet loss rates for our shifted wavelet tree packetization scheme with the different interpolation schemes. It is observed that exploiting intraband or interband correlation to estimate lost detail coefficients at the coarsest level does not show any consistent advantage over simply setting the lost detail coefficients to zero. However, applying the weighted averaging scheme, as compared to simple nondirectional averaging, for the approximation coefficients consistently improves the PSNR in all cases.
For the Y-component of the Lena image encoded at 0.21 bpp without any packet loss, a PSNR of 32.2 dB and a loss of 1.2 dB in coding efficiency compared to SPIHT were reported in [9,10]. With our encoding scheme we obtain a PSNR of 33.0 dB, a gain of about 0.8 dB over their methods. Figure 5 shows the PSNR improvements we obtain compared to the results reported in [9,10]. Figure 2(d) shows the decoded Lena image for our encoding scheme with 5% packet loss after interpolation of the lost approximation coefficients.

[Figure 5: PSNR of the Lena image encoded at 0.21 bpp versus packet loss rate (%), using dispersive packetization (DP) [10], the packetized zerotree wavelet algorithm (PZW) [9], and our encoding scheme.]

ROBUST CODING WITH OVERCOMPLETE WAVELET TRANSFORM
While our packetization scheme for SPIHT based on shifted wavelet trees, combined with weighted averaging to estimate lost approximation coefficients, performs well at low and moderate packet loss rates, performance deteriorates as packet losses increase; see Table 1. To further improve error resilience, we perform an overcomplete discrete wavelet transform (DWT) at the coarsest scale and encode the additional data in a parallel SPIHT encoder as redundancy. As opposed to cases where error correction codes would be used, the redundancy yields a certain PSNR gain even when no losses occur and all the packets are received.

Overcomplete discrete wavelet transform
In general, the 1D DWT is performed by passing an input data sequence x(n) through FIR lowpass and highpass filters with impulse responses h_0(n) and h_1(n), respectively, which depend on the choice of the wavelet [14,15]. Each filter is followed by a factor-of-two downsampling operation to yield critical subsampling. The downsampling is performed by retaining either the even-indexed coefficients (the zeroth type-1 polyphase component [16]), denoted by y_00(m) and y_10(m) in Figure 6, or the odd-indexed coefficients (the first type-1 polyphase component), denoted by y_01(m) and y_11(m) in Figure 6. The odd- and even-indexed coefficients have different sets of values since the DWT is shift variant. However, they are redundant in the sense that an inverse DWT (IDWT) can be applied to either one of them to recover the original input sequence. In fact, each set of coefficients can be calculated from the other one [17]. Alternatively, one can also use both sets of coefficients for the IDWT and rescale the reconstructed signal by dividing it by two. The original indexing, however, needs to be known during the IDWT process so that the coefficients can be placed at the proper locations after the upsampling; see Figure 7.
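Keeping both polyphase components can be illustrated as follows. Haar filters stand in for the paper's 9/7 pair to keep the sketch short, and the doubling of total signal energy checked in the test reflects the doubled data rate of the overcomplete (undecimated) representation for an orthonormal filter pair.

```python
import numpy as np

def analysis_both_polyphase(x, h0, h1):
    """One level of 1D DWT that keeps BOTH type-1 polyphase components of
    the undecimated filter outputs: (y00, y01) from the lowpass branch
    and (y10, y11) from the highpass branch."""
    full0 = np.convolve(x, h0)    # undecimated lowpass output
    full1 = np.convolve(x, h1)    # undecimated highpass output
    return full0[::2], full0[1::2], full1[::2], full1[1::2]

# Orthonormal Haar analysis pair, used here in place of the 9/7 filters
s = 1.0 / np.sqrt(2.0)
h0 = np.array([s, s])
h1 = np.array([s, -s])
```

A conventional critically sampled DWT would keep only (y00, y10); the overcomplete scheme retains (y01, y11) as well, which is exactly the redundancy exploited in the next subsection.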
The perfect reconstruction property of the DWT is based on the fact that the aliasing that occurs at decomposition cancels during reconstruction of the signal. However, this only holds true if no signal processing is applied to the subbands, that is, if ŷ_{i,j}(m) in Figure 7 equals y_{i,j}(m), i, j = 0, 1, in Figure 6. Once the wavelet coefficients are quantized, the aliasing of the quantization error no longer cancels out. By eliminating the subsampling and keeping both polyphase components, we can prevent aliasing at the cost of doubling the data rate. This reduces the reconstruction error to quantization noise and improves the quality of the reconstructed signal.

Overcomplete coefficients as side information
In our implementation, we make use of the redundancy obtained by the overcomplete DWT in the following manner. At each level of decomposition of the 2D DWT, we choose the even-indexed coefficients along both the rows and the columns to obtain a regular DWT pyramid. At the highest level of decomposition, an additional set of detail and approximation coefficients is also generated by keeping the odd-indexed coefficients along the rows and columns. This odd-indexed set of wavelet coefficients is encoded and transmitted as redundancy along with the original bitstream.
The number of bits for encoding the odd-indexed coefficients is allotted such that both the odd-indexed set and the four corresponding subbands in the even-indexed set are quantized to the same bit plane. Since the odd-indexed set is obtained at the coarsest scale of the wavelet decomposition, where most of the signal energy is usually concentrated, it consumes a high percentage of the total bit budget for quantization. This percentage increases when images are encoded at low bit rates, where most of the bits are spent encoding the coarse-scale subbands. Table 2 lists the redundancy required for different 512 × 512 images at various bit rates/compression ratios (CR). The Lena image requires a higher percentage of bits because it has a significant amount of low-frequency information compared to the other images.
The odd-indexed coefficients are encoded into N packets using the same packetization scheme described earlier; see Figure 3. Since the four subbands represent a single level of wavelet decomposition, the partitioning that is applied to the finer scales is not required in this case. Packets from the two sets that contain encoded information corresponding to the same spatial regions receive the same number. Each of the N packets from the odd-indexed set is then appended to a packet obtained by encoding the original wavelet decomposition subbands, but with a different number. In case of packet losses, the same interpolation procedure as in the previous section is followed for both sets of subbands. Since the packets belonging to the two sets are combined with an offset, the coefficients corresponding to the same spatial location in the wavelet subbands are lost in just one of the sets. Thus the other set always has a signal that is not interpolated. Using both sets of coefficients for the IDWT and combining the subbands as described earlier and shown in Figure 7 leads to a reconstructed image with reduced distortion compared to similar scenarios where the overcomplete representation is not used.
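The offset pairing of the two packet sets can be sketched as follows. `pair_packets` and the offset value are hypothetical; the sketch only illustrates that a nonzero offset guarantees a single lost transmission unit never carries the same spatial region from both sets.

```python
def pair_packets(N=20, m=7):
    """Hypothetical pairing: odd-set packet (k + m) % N rides along with
    even-set packet k. With 0 < m < N, the two halves of a transmission
    unit always cover different spatial regions."""
    return {k: (k + m) % N for k in range(N)}
```

Because the mapping k -> (k + m) mod N is a permutation with no fixed points for 0 < m < N, each spatial region is always recoverable from at least one of the two sets after a single unit is lost.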

Results
We perform our experiments with the same number of wavelet decomposition steps and the same loss model as described in the previous section. The loss of a packet in this case, however, means that packet number k_o from the odd-indexed set is lost along with packet number k_e of the even-indexed set, where k_o = k_e + m and m is the offset in numbering. Since adding the redundancy effectively increases the bit rate of the encoded image, we compare our results with the case where the image is coded using our scheme without redundancy, but at an increased bit rate. Figure 8 shows the rate-distortion curves for two images with the overcomplete information for different packet loss rates. As can be seen, at lower bit rates and low packet loss rates, preventing aliasing by using the overcomplete information is not enough to equal the gain obtained by using more bits for a finer quantization. However, when packet losses increase, using the overcomplete information as redundancy provides a definite improvement in performance over the case where only the packetization and interpolation are used for error concealment. A comparison with MD coding schemes such as those in [3,18] is difficult due to the different transforms used and the higher redundancy involved in those MD schemes. However, we expect to achieve superior performance by using the overcomplete coefficients in an MD scenario, where different overcomplete sets of coefficients can be considered for constructing the different descriptions.

CONCLUSIONS
An error-resilient wavelet zerotree-based image coding method has been presented. The method is based on effective packetization obtained by partitioning and modifying the wavelet trees and a weighted averaging scheme for recovery of lost approximation coefficients. A high coding efficiency and low distortion for moderate packet loss rates are obtained without introducing any form of extra redundancy. Redundancy in the form of overcomplete wavelet coefficients has been introduced to improve the robustness of the coding method for higher packet loss rates. While maintaining the coding efficiency for low and moderate packet losses, the use of overcomplete coefficients as redundant information greatly improves the performance when higher packet loss rates are encountered.

Experimental results indicate an improvement of 0.5–1.5 dB in PSNR with respect to other coding schemes over a range of packet loss rates. The perceptual quality of the reconstructed image is also suitably maintained, suggesting that our method is well suited for image transmission over lossy packet-switched networks.