Encoder-driven rate control and mode decision for distributed video coding
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 156 (2013)
Abstract
To provide low-complexity encoding for video in unidirectional or offline compression scenarios, this paper proposes an efficient feedback-channel-free distributed video coding architecture featuring a novel encoder-driven rate control scheme in tandem with a dedicated mode selection process. To this end, the encoder features a novel low-complexity motion estimation technique to approximate the side-information (SI) available at the decoder. Then, an SI-dependent correlation channel estimation between the approximated SI and the original frames is used to derive the theoretically required rate for successful Slepian-Wolf (SW) decoding. Based on the expected trade-off between the estimated coding rate and the estimated distortion, a novel encoder-side mode decision module assigns a different coding mode to distinct portions of the coded frames. In this context, skip, intra and SW coding modes are supported. To reduce the effect of rate underestimation, the final SW rate is adjusted upwards using a novel rate formula. Additionally, a successive SI refinement technique is exploited at the decoder to decrease the number of SW decoding failures. Experimental results illustrate the benefit of the different coding options and show similar or superior compression performance with respect to the feedback-based DISCOVER benchmark system. Finally, the low-complexity encoding characteristics of the proposed system are confirmed, as well as the beneficial impact of the proposed scheme on the decoding complexity.
1 Introduction
The fundamental work of Slepian and Wolf [1] proved that separate lossless encoding but joint decoding of two correlated discrete random sources X and Y, each independent and identically distributed (i.i.d.), can be as efficient as joint encoding and joint decoding. This setting is known as Slepian-Wolf (SW) coding or distributed source coding. In a particular case, called asymmetric SW coding, one source, e.g. Y, is compressed to its own entropy H(Y), while the other source, X, is compressed separately to the conditional entropy H(X|Y). At the decoder, source Y is restored first, after which X is decoded in the presence of Y, called the side-information (SI). Extending the asymmetric SW coding setup, Wyner and Ziv [2] established the achievable lower rate bound under a distortion constraint when a single source is independently encoded but decoded in the presence of SI. The Wyner-Ziv (WZ) theorem states that in such a coding scenario, a rate loss generally occurs compared to the setting where the encoder also has access to the SI.
However, this loss in compression performance is acceptable given the benefits of adopting WZ coding. Independent encoding of information sources enables low-complexity encoding architectures, since the removal of inter-source redundancy is no longer an encoder task. Instead, the encoding operation is essentially reduced to quantization followed by asymmetric SW encoding, usually implemented with low-complexity channel encoding. WZ coding has found application in coding data under severe resource constraints [3], e.g. in terms of computational power or energy supply. Specifically, distributed video coding (DVC) [4, 5], essentially WZ coding for video, offers low-complexity encoding architectures. In DVC, complex operations such as motion estimation and compensation are performed at the decoder to create the SI. As a result, DVC targets lightweight multimedia applications [6], e.g. wireless capsule endoscopy [7, 8].
In practical WZ coding of video, rate control poses a major challenge: what SW (channel) rate is required to ensure successful channel decoding in the presence of the SI? Because of the distributed nature of a WZ coding system, the encoder has no access to the SI, since it is generated at the decoder. Hence, a WZ encoder cannot determine the required channel rate exactly, because the conditional entropy H(X|Y) cannot be measured directly. Duplicating the operations performed at the decoder would provide the encoder with an identical copy of the SI. However, this would involve complex SI generation operations at the encoder, which would (1) compromise the low encoding complexity benefit of DVC and (2) rather favour a traditional predictive coding approach from a compression performance point of view.
The majority of high-performance DVC systems use a feedback channel to solve the rate control problem. Such an approach, often referred to as decoder-driven rate control, sends non-systematic information in chunks [5]. Should the channel rate prove insufficient for proper decoding, the decoder requests additional non-systematic information from the encoder and attempts decoding anew. The process is repeated until decoding succeeds. In this way, the feedback channel not only guarantees decoding success but also ensures that it is achieved at a minimal channel rate. However, a feedback-channel-based rate control scheme is clearly incompatible with unidirectional application scenarios. Moreover, decoder-driven rate control couples the encoding and decoding processes. Consequently, feedback-channel-based DVC is unsuitable for offline applications, e.g. storage, and may introduce excessive delay [9].
In feedback-channel-free (alias unidirectional) DVC architectures, the encoder is responsible for determining the channel rate required for successful decoding, which is referred to as encoder-driven rate control. However, estimating the necessary channel rate at the encoder is a delicate problem: underestimation leads to poor decoding of the source, while overestimation wastes rate. Hence, such DVC systems suffer a performance loss with respect to feedback-based schemes. The encoder's main obstacle in determining the rate necessary to guarantee decoding is its lack of access to the SI. Instead, the SI is approximated at the encoder, where special care must be taken so as not to compromise the low-complexity encoding characteristics. Moreover, feedback-channel-free systems may suffer from inflated decoding complexity [4, 10].
This paper introduces a novel feedback-channel-free transform-domain WZ (TDWZ) video coding architecture^a. The core system is an efficient hash-based WZ video codec [11]. To avoid feedback, the proposed system creates an encoder-side approximation of the SI using a novel technique that mimics the SI generation executed by the decoder, without undermining low-complexity encoding. Based on the correlation between the original frame and the estimated SI, a novel encoder-driven rate allocation scheme assigns an appropriate SW rate. To increase compression performance and reduce the effect of SW rate underestimation, the proposed architecture also features a novel encoder-driven mode decision process. If the quality of the corresponding SI is expected to be high, parts of the original frames may not be coded at all but rather skipped and reconstructed from the SI. Alternatively, conventional entropy (intra) coding may be applied when SW decoding is likely to fail or when a failure would result in severe distortion. At the decoder, a successive SI refinement scheme is exploited to minimize the distortion associated with SW rate underestimation. At every SI refinement stage, a higher-quality version of the SI is generated. This creates the opportunity to re-attempt decoding of any SW-coded information that failed to decode properly at the previous refinement stages.
A version of the feedback-free DVC system proposed in this paper, excluding the encoder-driven mode decision process, was presented in [12]. The experimental results presented in this work illustrate the benefit of the different coding modes available to the proposed system and clarify their influence on the compression performance. Additionally, in contrast to [12], the experimental results include an analysis of the impact of the low-complexity SI approximation methods on the overall rate-distortion (RD) performance and on the encoding complexity. The compression performance of the proposed feedback-free system is compared to a collection of alternative feedback-based DVC systems, including the benchmark DISCOVER [13] codec. The experimental results show that the proposed feedback-free architecture achieves similar or superior compression performance with respect to DISCOVER [13], which is noteworthy considering that most feedback-channel-free systems in the literature fall significantly behind DISCOVER [14–16]. A final set of experiments confirms the low-complexity encoding characteristics and the beneficial effect of the proposed scheme on the decoding complexity.
The rest of this paper is organized as follows. Section 2 offers an overview of related work and highlights the novel features included in the proposed architecture. Section 3 details the proposed system broken down in its primary components. Experimental results are provided in Section 4, and finally, Section 5 concludes the paper.
2 Related work and contributions
2.1 Related work
2.1.1 Feedbackfree DVC solutions
The unidirectional pixel-domain DVC codec in [17] used two parallel WZ encoders, where the original image is scattered by interleavers prior to encoding. The architecture was further enhanced [18] with an iterative decoding scheme in which the SI was gradually updated using spatiotemporal predictions. In both schemes, the rate was an input parameter. An alternative encoder-driven rate control scheme for pixel-domain DVC was put forward in [16]. A coarse approximation of the SI is generated at the encoder by averaging the key frames in a group of pictures (GOP) of size 2, after which the correlation noise is modelled by a zero-mean Laplacian distribution. The Laplacian correlation noise model serves as a basis to derive the bitplane error probability, which is mapped to a bitrate using functions trained offline. The probabilities are calculated without taking any previously decoded bitplanes into account. The final rate calculation was modified in [19], where the offline module was replaced by machine learning. Considering a multiuser scenario, the feedback channel was removed from a pixel-domain architecture in [20].
However, the compression performance of pixel-domain DVC lags behind that of TDWZ architectures [5]. In [21], a feedback-channel-free transform-domain architecture was designed, where a coarse version of the SI was generated by averaging the key frames in a GOP of size 2. The SW rate is then derived from the coarse SI based on empirical results obtained in offline experiments.
The first motion estimation algorithm to generate an approximation of the SI at the encoder was proposed in [14] and integrated in a TDWZ architecture. In essence, the algorithm constitutes a low-complexity variant of the motion-compensated interpolation (MCI) method employed by the decoder to generate SI. The technique performs MCI for a limited number of blocks based on the sum of absolute differences (SAD) criterion, while the SI approximation for the other blocks is the average of the co-located blocks in the reference frames. The resulting approximation of the correlation noise instantiates a Laplacian correlation noise model per frequency band, from which a closed-form formula determines the required SW rate. The conditional error probabilities are computed as in [22], where any already decoded bitplanes are taken into account. The scheme was extended in [15], where additional care was taken to reduce the probability of failed channel decoding. When a bitplane is not error-free after the maximum number of decoding runs has been reached, the log-likelihood ratios (LLRs) of the bits that are most likely to be erroneous are flipped, after which channel decoding is attempted anew.
In [23], further tools were introduced to increase the performance. At the encoder, the quantized symbols of the original WZ frames as well as those of the coarse SI frames undergo Gray mapping [24] prior to SW coding. Additionally, an updated form of the closed-form formula in [14] yields the estimated SW rate. At the decoder, the reconstruction [25] of the coefficients was modified to cope with any bitplanes of the quantization indices that failed to decode. The final reconstruction is the weighted sum of the centroids of every individual bin, where the weights are assigned according to the bitplane LLRs after Turbo decoding [26]. Finally, [23] used an SI refinement stage based on overlapped block motion estimation (OBME) after all frequency bands have been decoded. Based on the refined SI, new attempts are made to decode any erroneous bitplanes, and reconstruction is performed again.
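As an illustration of the Gray mapping step, the Python sketch below implements the standard binary-reflected Gray code and its inverse. That [23] uses exactly this variant is an assumption; the function names are hypothetical.

```python
def gray_encode(q: int) -> int:
    """Binary-reflected Gray code of a non-negative quantization index."""
    return q ^ (q >> 1)


def gray_decode(g: int) -> int:
    """Inverse mapping: XOR of all right shifts of the Gray codeword."""
    q = 0
    while g:
        q ^= g
        g >>= 1
    return q
```

With this mapping, the indices of neighbouring quantization bins differ in exactly one bit. For example, indices 3 (011) and 4 (100) map to 010 and 110. A one-bin mismatch between the coarse SI and the source therefore affects only a single bitplane.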
2.1.2 Coding modes and DVC architectures
When integrating multiple coding modes in DVC, a fundamental decision is the locality of the mode decision process, namely, at the encoder or at the decoder. Given the distributed nature of the system, optimized mode selection is challenged by the fact that the original signal is only available to the encoder, while the SI is only present at the decoder.
Regarding decoder-driven mode decision, a pixel-domain DVC architecture that skips bitplanes based on a rate-distortion model was presented in [27]. However, the skip mode only improved performance on low- and medium-motion sequences. The work in [28] includes a feedback-based TDWZ architecture with decoder-driven skip, intra and SW modes. The skip mode is selected based on the trade-off between rate and distortion derived from the SI and the virtual correlation channel. The decision whether to apply intra or SW coding is made by applying both modes to the co-located bitplane in the previously decoded frame, after which the mode yielding the lowest rate is selected. As a result, the decoder complexity increases, since encoding and decoding are duplicated at the decoder.
A decoder-driven block-based mode decision scheme was proposed in [29]. By evaluating the linearity of the motion vectors, blocks with linear motion vectors were skipped, while blocks with highly nonlinear motion vectors were supported with additional hash information to help improve the SI quality. Alternatively, the block-based DVC architecture in [28] allowed individual blocks to be skipped, intra coded or WZ coded, based on the estimated accuracy of the SI. This is achieved by assessing the mean squared error between the past and the future reference blocks. The choice between intra and WZ coding is made on an RD basis, by selecting the mode with the lowest rate at equal distortion.
In all the above codecs, the outcome of the mode selection process must be signalled to the encoder via a feedback channel, rendering them unsuitable in a unidirectional context. An encoder-side mode selection approach was followed in PRISM [4]. A total of 16 different coding modes, or classes, are available for every 8 × 8 block, where each class corresponds to a different distribution of skipped, intra or SW coded bits. PRISM assigns coding modes by thresholding the squared error difference between every block and its co-located block in the previous frame. In [30], blocks were coded either in WZ mode or with a combination of WZ and intra coding. The encoder selects the mode that yields the lowest estimated rate. The hybrid coding mode sends a low-quality intra-coded version of the block, which helps to improve the quality of the SI. In [31], a block-based skip mode was integrated in a TDWZ codec. SW coding, using Turbo codes [26], was used to compress the entire frame, where the Turbo decoder was modified so as to cope with the skipped blocks. Both codecs in [30] and [31] employ a feedback channel for rate control.
2.2 Contributions
In contrast to existing systems, this is the first work to introduce encoder-driven mode decision at the bitplane and frequency band level in a feedback-free hash-based DVC system. To support unidirectional operation, the proposed codec includes several new features at the encoder and decoder.
First, the SI is approximated at the encoder by a low-complexity emulation of the hash-based OBME and compensation (OBMEC) technique used to generate the SI at the decoder. This approach conceptually differs from the fast MCI scheme used in [23], which coarsely mimics the motion-compensated interpolation technique used in [23] to generate the SI at the decoder.
Second, the SI approximation at the encoder is used to compute the theoretically required SW rate, namely the conditional entropy H(X|Y), to represent the quantized source X. This approach differs from existing schemes by relying on an SI-dependent (SID) correlation channel [11, 32] to capture the dependency between the coarse SI and the WZ frame. To limit the likelihood that bitplanes fail to decode due to SW rate underestimation, the final SW rate is adjusted using a novel formula that takes the significance of the bitplane into account.
Third, based on the approximation of the SI and correlation channel model, a novel mode decision process is executed. Three coding modes are supported, namely, skip, intra and SW coding. The skip mode is applied per frequency band whereas intra and SW modes are assigned per bitplane. Since the proposed WZ architecture is feedbackfree, the full responsibility for selecting and signalling the coding modes falls to the encoder.
The proposed architecture also features specific measures at the decoder to reduce the distortion caused by any SW-coded information that fails to decode properly. For this purpose, the principles of successive SI refinement [33] are adopted. The WZ frames are decoded in distinct stages called refinement levels [34], where at each stage, a higher-quality version of the SI is generated. Given the improved SI at every refinement level, SW decoding is re-attempted for all SW-coded information that failed to decode at previous levels, when only a poorer version of the SI was available. The proposed decoding process thereby merges SI refinement, SW decoding and re-attempts at SW decoding. In contrast, the approach in [23] first decodes an entire WZ frame, after which additional SI updates create new opportunities to decode the bitplanes that failed to decode successfully.
Finally, the proposed feedback-free system is thoroughly evaluated. In this context, preliminary results of the proposed feedback-free architecture without any encoder-side mode decision were presented in [12]. The experimental results presented in this work, however, show the benefit of the different coding modes available to the proposed system and clarify their influence on the compression performance. Additionally, the compression performance of the proposed feedback-channel-free DVC architecture is compared to the benchmark systems in DVC, that is, the DISCOVER codec [13] and H.264/AVC Intra [35], as well as to our previous hash-based DVC system with feedback from [7]. In addition, results obtained with the proposed system including the presented mode decision process, but configured with decoder-driven, feedback-channel-based rate allocation, are included as well. The evaluation for GOP sizes of 2, 4 and 8 shows comparable or superior performance compared to the DISCOVER [13] codec. Despite the additional tools, the experimental results confirm that the proposed system maintains low encoding complexity.
3 Proposed feedback-free distributed video coding architecture
The block diagram of the proposed feedback-free WZ codec is presented in Figure 1.
3.1 The encoding procedure
Building on the architecture presented in [34], the encoder divides an input video sequence into GOPs. Every GOP contains a key frame I, which is coded using H.264/AVC Intra [35], and WZ frames X, which are WZ coded in the 4 × 4 discrete cosine transform (DCT) domain. For the latter purpose, the quantization matrices (QMs) from [36, 37] define uniform and double-deadzone quantizers for the DC and AC frequency bands, respectively. The resulting quantization indices are grouped per band and organized into bitplanes, ready for SW coding based on LDPCA codes [38]. Additionally, the encoder creates a hash frame for every WZ frame, according to the technique presented in [11, 34], to enable hash-based SI generation at the decoder.
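The Python sketch below illustrates the transform, band grouping and bitplane organization described above. It assumes an orthonormal floating-point 4 × 4 DCT-II (the codec may use a different, e.g. integer, transform). It also replaces the QM-driven quantizers of [36, 37] with a hypothetical uniform quantizer `quantize_uniform` for non-negative DC coefficients; the double-deadzone AC quantizer is not shown.

```python
import math

# Orthonormal 4-point DCT-II basis: C[k][i] = a(k) * cos((2i + 1) * k * pi / 8).
C = [[(0.5 if k == 0 else math.sqrt(0.5)) * math.cos((2 * i + 1) * k * math.pi / 8)
      for i in range(4)] for k in range(4)]


def dct4x4(block):
    """Separable 2-D DCT of a 4x4 block, computed as C * B * C^T."""
    tmp = [[sum(C[k][i] * block[i][j] for i in range(4)) for j in range(4)]
           for k in range(4)]
    return [[sum(tmp[k][j] * C[l][j] for j in range(4)) for l in range(4)]
            for k in range(4)]


def group_bands(frame):
    """Map each band (u, v) to the list of its coefficients, one per 4x4 block
    in raster order. Frame dimensions are assumed to be multiples of 4."""
    bands = {(u, v): [] for u in range(4) for v in range(4)}
    for r in range(0, len(frame), 4):
        for c in range(0, len(frame[0]), 4):
            coeffs = dct4x4([row[c:c + 4] for row in frame[r:r + 4]])
            for u in range(4):
                for v in range(4):
                    bands[(u, v)].append(coeffs[u][v])
    return bands


def quantize_uniform(values, step, M):
    """Hypothetical uniform quantizer for non-negative (DC) coefficients,
    clipped to M-bit indices; the real step sizes come from the QM."""
    return [min(int(v // step), (1 << M) - 1) for v in values]


def bitplanes(indices, M):
    """Split M-bit indices into M bitplanes, most significant (b^0) first."""
    return [[(q >> (M - 1 - m)) & 1 for q in indices] for m in range(M)]
```

Each returned bitplane is a list with one bit per coefficient of the band, which is the unit handed to the SW (or intra) coder.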
Since no feedback channel is present, the encoder must estimate the channel rate required for successful decoding of every bitplane in every frequency band. For this purpose, the encoder generates a coarse approximation of the SI available to the decoder using a low-complexity SI generation technique that emulates the hash-based OBMEC used at the decoder. Using the approximated SI, the encoder computes the theoretical rate, that is, the conditional entropy given the coarse SI, for every bitplane in every quantized frequency band of the WZ frames. Finally, a rate formula maps the conditional entropy to the final channel rate estimate, in order to compensate for the mismatch between the SI estimated at the encoder and the actual SI generated at the decoder.
Encoder-driven rate control is a sensitive process, since underestimation leads to failed channel decoding and poorly reconstructed samples, while overestimation wastes rate without further reducing the distortion. To counter the effect of over- and underestimation on the overall RD performance, the proposed system includes additional tools.
To mitigate the effect of overestimation, the encoder first applies a band-level mode decision process, referred to as skip mode selection. Skipped DCT bands are not coded at all but are substituted by the corresponding band in the SI at the decoder. The bitplanes of the bands that are not skipped are subjected to a second mode decision process, in which the encoder decides whether a particular bitplane is SW encoded or intra encoded using a binary arithmetic entropy coder. Once a coding mode has been assigned to every bitplane of a non-skipped band, each bitplane is fed to the appropriate encoder. The resulting syndrome bits and binary arithmetic coded data are multiplexed with the hash bitstream and the mode signalling information, and sent to the decoder or stored for offline decoding.
3.1.1 Low-complexity side-information generation
Reference frame averaging is a simple, low-complexity technique to generate a coarse SI signal at the encoder that approximates the true SI. The result may resemble the SI generated at the decoder rather well for low-motion sequences and small GOP sizes, e.g. GOP2. However, since mere averaging of the reference frames cannot capture motion, the SI estimate will deviate significantly from the true SI at the decoder when the motion content or the GOP size increases. Therefore, the proposed system features an alternative option to estimate the SI at the encoder, namely, a coarse approximation of the hash-based SI generation technique employed at the decoder. In other words, the encoder carries out a substantially simplified version of bidirectional OBMEC.
In detail, OBME is carried out on downscaled versions of the frames at the encoder in a hierarchical temporal prediction structure, similar to the prediction structure used in H.264/SVC [39]. Let ξ = 2^k, k ∈ ℕ, be the downscaling factor applied at the encoder side, resulting in frames with dimensions W′ = W/ξ and H′ = H/ξ. High downscaling factors ξ reduce the motion estimation complexity at the cost of reduced accuracy. Next, mimicking the SI generation process at the decoder [11, 34], the encoder divides every downscaled WZ frame into overlapping blocks β of B × B pixels with an overlap step size of ε pixels, 1 ≤ ε < B.
For every such block, the best matching block is found in the reference frames $R_n$, n ∈ {0, 1}, within a search range sr. To this end, the encoder minimizes the Hamming distance calculated from the most significant bits of the pixel values in the blocks. In other words, the best matching block has the maximum number of co-located pixels whose most significant bits are equal. To reduce the number of block matching operations per motion-estimated block, the search range sr is kept low. Since the downscaling process does not include filtering, actual downsampling is not required from an implementation point of view; instead, the block matching process need only take into account the samples located at the preserved row and column positions.
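The Python sketch below shows the MSB-based block matching step. It assumes 8-bit integer samples, a full search over a square window and first-found tie-breaking, none of which is specified above. The helper names `msb`, `block_cost` and `best_match` are hypothetical. Downscaled samples are addressed by striding through the full-resolution frames, in line with the remark that no actual downsampling is needed.

```python
def msb(p):
    """Most significant bit of an 8-bit sample."""
    return (p >> 7) & 1


def block_cost(cur, ref, y, x, ry, rx, B, xi):
    """MSB Hamming distance between the B x B block at (y, x) of the downscaled
    WZ frame and the block at (ry, rx) of the downscaled reference. Downscaled
    sample (i, j) is read as frame[i * xi][j * xi]; no downsampled copy is built."""
    return sum(msb(cur[(y + i) * xi][(x + j) * xi]) ^ msb(ref[(ry + i) * xi][(rx + j) * xi])
               for i in range(B) for j in range(B))


def best_match(cur, ref, y, x, B, xi, sr):
    """Full search in a +/- sr window (downscaled pixels). Returns (dy, dx, cost)."""
    h_d, w_d = len(cur) // xi, len(cur[0]) // xi
    best = (0, 0, block_cost(cur, ref, y, x, y, x, B, xi))
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            ry, rx = y + dy, x + dx
            # Only consider candidate blocks that lie fully inside the downscaled frame.
            if 0 <= ry <= h_d - B and 0 <= rx <= w_d - B:
                cost = block_cost(cur, ref, y, x, ry, rx, B, xi)
                if cost < best[2]:
                    best = (dy, dx, cost)
    return best
```

Because only one bit per sample is compared, the per-candidate cost reduces to XOR-and-count operations, which is in keeping with the low-complexity goal.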
After OBME, every motion vector per overlapping block is upscaled by a factor ξ, after which the upscaled motion field is used to motion-compensate blocks ξβ of size ξB × ξB from the original reference frames. Since these blocks overlap as well, every pixel position in the predicted frame belongs to a number of overlapping blocks ξβ, each of which is linked to its best matching blocks in each of the reference frames. The pixels in these best matching blocks act as temporal predictors for the co-located pixels in the predicted block. In this way, every pixel in the predicted frame is linked to a set of candidate predictor pixels. Finally, the pixel values in the predicted WZ frame are calculated as the average of the candidate temporal predictors at every pixel position.
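The compensation step can be sketched in Python as follows. The border clamping and the helper `positions` are assumptions made for illustration; `positions` appends a final block position so that every pixel is covered. Both function names are hypothetical.

```python
def positions(n, B, eps):
    """Top-left offsets of overlapping blocks along one dimension (step eps),
    appending a last position so the border is covered."""
    p = list(range(0, n - B + 1, eps))
    if p[-1] != n - B:
        p.append(n - B)
    return p


def obmc_predict(refs, blocks, B, xi):
    """Average the candidate predictors of all overlapping blocks.

    refs:   [R0, R1] full-resolution reference frames (lists of rows).
    blocks: list of (y, x, mvs): top-left (y, x) of an overlapping block on the
            downscaled grid and mvs = [(dy0, dx0), (dy1, dx1)], its downscaled
            motion vectors towards R0 and R1.
    """
    h, w = len(refs[0]), len(refs[0][0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    size = xi * B                                    # upscaled block size xi * B
    for y, x, mvs in blocks:
        top, left = xi * y, xi * x
        for ref, (dy, dx) in zip(refs, mvs):
            uy, ux = xi * dy, xi * dx                # upscaled motion vector
            for i in range(size):
                for j in range(size):
                    r, c = top + i, left + j
                    rr = min(max(r + uy, 0), h - 1)  # clamp at frame borders
                    cc = min(max(c + ux, 0), w - 1)
                    acc[r][c] += ref[rr][cc]         # one more candidate predictor
                    cnt[r][c] += 1
    return [[acc[r][c] / cnt[r][c] if cnt[r][c] else 0.0 for c in range(w)]
            for r in range(h)]
```

Accumulating sums and counts per pixel implements the final averaging over all candidate temporal predictors in a single pass over the blocks.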
To reduce the computational complexity, several measures are taken: (1) motion estimation is carried out on downscaled versions of the original frames, (2) the size of the overlapping blocks and the overlap step size are chosen large so as to substantially reduce the number of motion-estimated blocks, (3) the motion search range is kept small and (4) overlapping blocks with low-motion characteristics are skipped and replaced by the average of the co-located blocks in the reference frames.
Measure (4) is implemented as a block skip function. Prior to motion estimation of a particular overlapping block, the Hamming distance between the co-located blocks in the reference frames is checked. If this Hamming distance is smaller than a threshold $T_H$, motion estimation is skipped and zero motion vectors are used instead. In this way, the parameter $T_H$ controls the number of skipped blocks and thereby the motion estimation complexity.
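The block skip test might look as follows in Python. This sketch assumes that, as in the block matching step, the Hamming distance is computed on the MSB planes of the downscaled co-located reference blocks. The function names and the return convention are hypothetical.

```python
def colocated_msb_distance(r0, r1, y, x, B, xi):
    """MSB Hamming distance between co-located B x B blocks of the downscaled
    references (sample (i, j) read as frame[i * xi][j * xi])."""
    return sum(((r0[(y + i) * xi][(x + j) * xi] >> 7) & 1)
               ^ ((r1[(y + i) * xi][(x + j) * xi] >> 7) & 1)
               for i in range(B) for j in range(B))


def predict_or_skip(r0, r1, y, x, B, xi, t_h):
    """Return (skipped, block). If the distance is below t_h, motion estimation is
    skipped, zero motion vectors are used and the predictor is the average of the
    co-located full-resolution blocks; otherwise (False, None) is returned."""
    if colocated_msb_distance(r0, r1, y, x, B, xi) >= t_h:
        return False, None                     # caller runs motion estimation
    top, left, size = xi * y, xi * x, xi * B
    avg = [[(r0[top + i][left + j] + r1[top + i][left + j]) / 2 for j in range(size)]
           for i in range(size)]
    return True, avg
```

Raising `t_h` skips more blocks and lowers the encoding cost, at the risk of using zero motion for blocks that actually move.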
3.1.2 Determining the conditional entropy
Let X and Ỹ denote the random variables representing the transform-domain samples, that is, the transform coefficients, in the original WZ frame and the coarse SI frame, respectively. Duplicating the rationale at the decoder, the correlation between the transformed source X and the coarse SI Ỹ is expressed as an additive noise channel $X = \tilde{Y} + \tilde{Z}$, where $\tilde{Z}$ is the random variable representing the samples of the estimated correlation noise.
Since both X and Ỹ are available at the encoder, the correlation noise $\tilde{Z}$ can be computed directly and characterized by its histogram. Similar to the noise model established by the decoder, the SID correlation channel concept [11] is adopted. Specifically, the channel output X is modelled by a Laplacian distribution centred on the particular realization ỹ of the SI, with a standard deviation σ(ỹ) that depends on ỹ. The different standard deviations are obtained using the offline SID correlation channel estimation (CCE) procedure described in [11].
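As an illustration, the probability mass that the SID Laplacian model assigns to an interval can be computed from the Laplacian CDF, using the scale b = σ/√2 for a standard deviation σ. In the Python sketch below, the piecewise-constant lookup `sid_sigma` is a hypothetical stand-in for the offline-trained σ(ỹ) values of [11], whose actual form is not given here.

```python
import math


def laplace_cdf(x, mu, sigma):
    """CDF of a Laplacian with mean mu and standard deviation sigma."""
    b = sigma / math.sqrt(2.0)               # Laplacian scale parameter
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def sid_sigma(y_tilde, table, step):
    """Hypothetical SI-dependent sigma lookup: table[k] holds sigma for |SI| values
    in [k * step, (k + 1) * step); larger values reuse the last entry."""
    k = min(int(abs(y_tilde) // step), len(table) - 1)
    return table[k]


def interval_prob(lo, hi, y_tilde, table, step):
    """P(lo <= X < hi | coarse SI = y_tilde) under the SID Laplacian channel."""
    s = sid_sigma(y_tilde, table, step)
    return laplace_cdf(hi, y_tilde, s) - laplace_cdf(lo, y_tilde, s)
```

Letting σ depend on ỹ allows the model to be more confident in some SI ranges than in others, which a single per-band σ cannot express.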
Once the correlation channel has been modelled, the conditional entropy of every bitplane in every frequency band that is coded according to the QM is determined. For simplicity, the following presentation is narrowed down to the coefficients X and Ỹ that belong to a specific frequency band β. To avoid overloading the notation, the subscript β is omitted. Let M be the total number of bitplanes used to represent the coefficients in band β. Denote by $x_n$ and $\tilde{y}_n$ the n-th coefficient in the band of the WZ and SI frame, respectively. Also, denote by $q_n$ the M-bit quantization index corresponding to $x_n$. Finally, let $b_n^0, b_n^1, \ldots, b_n^{M-1}$ represent the bits composing the binary representation of index $q_n$, where $b_n^0$ is the most significant bit. With this notation, the conditional probability $p_n^m$ of bit m of the n-th quantized coefficient in X is calculated according to [22] as

$$p_n^m = p\left(b_n^m \mid b_n^0, \ldots, b_n^{m-1}, \tilde{y}_n\right) = \frac{p\left(b_n^0, b_n^1, \ldots, b_n^{m-1}, b_n^m \mid \tilde{y}_n\right)}{p\left(b_n^0, b_n^1, \ldots, b_n^{m-1} \mid \tilde{y}_n\right)}, \tag{1}$$
where $p(b_n^0, b_n^1, \ldots, b_n^{m-1}, b_n^m \mid \tilde{y}_n)$ and $p(b_n^0, b_n^1, \ldots, b_n^{m-1} \mid \tilde{y}_n)$ are evaluated using the SID correlation channel model estimated at the encoder. Then, the conditional entropy $H^m_{X|\tilde{Y}}$ of the entire bitplane m, given the transform-domain coarse SI Ỹ, is computed as [22]

$$H^m_{X|\tilde{Y}} = -\frac{1}{N}\sum_{n=1}^{N}\left[p_n^m \log_2 p_n^m + \left(1 - p_n^m\right)\log_2\left(1 - p_n^m\right)\right], \tag{2}$$

where N is the total number of coefficients in frequency band β.
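The Python sketch below computes the bitplane conditional entropy from the conditional probabilities $p_n^m$ under three assumptions. First, indices use natural binary code, so every bit prefix corresponds to a contiguous range of indices. Second, a hypothetical uniform quantizer maps index q to the bin [q·step, (q+1)·step). Third, the per-bitplane rate takes the binary-entropy form averaged over the N coefficients of the band, so the result is in bits per coefficient. Function names are hypothetical.

```python
import math


def laplace_cdf(x, mu, sigma):
    """CDF of a Laplacian with mean mu and standard deviation sigma."""
    b = sigma / math.sqrt(2.0)
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def binary_entropy(p):
    """Entropy in bits of a binary source with P(one outcome) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def prefix_prob(q, m, M, y, sigma, step):
    """p(b^0..b^m | y): model mass of the index range sharing the first m + 1
    MSBs of q. m = -1 denotes the empty prefix (the full M-bit index range)."""
    shift = M - 1 - m
    lo_idx = (q >> shift) << shift
    hi_idx = lo_idx + (1 << shift)
    return laplace_cdf(hi_idx * step, y, sigma) - laplace_cdf(lo_idx * step, y, sigma)


def bitplane_entropy(indices, si, sigmas, m, M, step):
    """Average binary entropy of p_n^m over the band (bits per coefficient)."""
    total = 0.0
    for q, y, s in zip(indices, si, sigmas):
        den = prefix_prob(q, m - 1, M, y, s, step)
        # If the model gives the known prefix zero mass, fall back to 1 bit.
        p = prefix_prob(q, m, M, y, s, step) / den if den > 0 else 0.5
        total += binary_entropy(p)
    return total / len(indices)
```

Conditioning on the already known, more significant bits makes each bitplane's estimate shrink as the SI becomes more reliable. When ỹ falls deep inside the correct bin, the entropy of every bitplane approaches zero.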
3.1.3 Proposed coding mode selection process
Concerning mode decision, similar to [28], the skip mode is selected on a frequency band basis, in which case entire frequency bands from the SI are substituted in the reconstructed WZ frames. This approach is advantageous because high-frequency components are often less important and can be replaced by the corresponding SI components at a relatively small distortion penalty, while no rate is spent. Moreover, skipping entire frequency bands yields more consistent reconstructed coefficients than, for instance, a bitplane-based skip, where potentially erroneous bits are introduced. Such erroneous bits undermine the successful decoding of any less significant SW-coded bitplanes, since every already decoded bitplane is assumed to be correct during the creation of the soft-input information. What is more, even when the decoding of subsequent bitplanes succeeds, be it intra or SW, any errors in more significant skipped bitplanes would push the reconstructed coefficient value into the wrong quantization bin, which increases the incurred distortion. In addition, the rate spent on any subsequent less significant bitplanes is then used suboptimally. On the other hand, both intra and SW modes are assigned on a bitplane basis. Intra coding is an attractive alternative when SW coding is expected to be inefficient due to poor SI, since SW decoding is then at risk of failing. In this context, intra coding is favoured for bitplanes of higher significance, so as to further reduce the risk of distortion due to significant SW-coded bitplanes that fail to decode.
The proposed mode selection is performed on the fly at the encoder, and the selected modes are signalled to the decoder. The mode signalling information (MSI) is compiled per frame and organized into a binary map. For every frequency band considered in the relevant QM, the MSI indicates whether the band is skipped, while for every bitplane in a coded band, the MSI signals whether intra or SW coding is applied. The binary MSI string is compressed using binary arithmetic coding, and the resulting bitstream is multiplexed with the intra, SW and hash bitstreams.
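A possible layout of the MSI binary map before arithmetic coding is sketched below in Python. The flag conventions (1 = skipped band, 1 = intra bitplane), the band order and the function names are assumptions. The text only states that one skip flag per band and one intra/SW flag per bitplane of a coded band are signalled.

```python
def build_msi(band_modes):
    """band_modes: one entry per band in QM order, either "skip" or a list of
    per-bitplane modes ("intra" or "sw"), most significant bitplane first.
    Returns the uncompressed MSI bit list (before binary arithmetic coding)."""
    bits = []
    for mode in band_modes:
        if mode == "skip":
            bits.append(1)                     # hypothetical: 1 = band skipped
        else:
            bits.append(0)                     # band coded; one flag per bitplane
            bits.extend(1 if m == "intra" else 0 for m in mode)
    return bits


def parse_msi(bits, planes_per_band):
    """Decoder-side inverse; planes_per_band gives M for each band (known from the QM)."""
    modes, pos = [], 0
    for M in planes_per_band:
        if bits[pos] == 1:
            modes.append("skip")
            pos += 1
        else:
            modes.append(["intra" if b else "sw" for b in bits[pos + 1:pos + 1 + M]])
            pos += 1 + M
    return modes
```

Because the decoder knows the number of bitplanes per band from the QM, no length fields are needed, and the map stays parseable from the flags alone.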
3.2 Skip mode selection
For clarity of the ensuing discussion, the source data is confined to the transform coefficients of a single frequency band β. The problem to solve is whether the coefficients in band β should be skipped or not. On the one hand, skipping a band spends no rate, at the cost of the distortion incurred by using the SI as reconstruction. On the other hand, coding a band consumes rate, with the benefit of reduced distortion. This balance can be expressed as a Lagrangian cost. The Lagrangian cost function $C_{\text{Skip}}$ when skipping band β is given by

$$C_{\text{Skip}} = D_{\text{Skip}} + \lambda R_{\text{Skip}}, \tag{3}$$
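where $R_{\text{Skip}}$ and $D_{\text{Skip}}$ are, respectively, the required rate and the incurred distortion. In a complementary manner, the cost function $C_{\text{NoSkip}}$ when frequency band β is coded is given by

$$C_{\text{NoSkip}} = D_{\text{NoSkip}} + \lambda R_{\text{NoSkip}}, \tag{4}$$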
where $R_{\text{NoSkip}}$ is the rate for coding the bitplanes of the band and $D_{\text{NoSkip}}$ corresponds to the quantization distortion, under the assumption that all bitplanes are correctly decoded. The Lagrange multiplier $\lambda$ in Equations (3) and (4) controls the relative importance of rate versus distortion in the total cost.
In case the frequency band is skipped, no data is actually coded; hence, trivially, $R_{\text{Skip}} = 0$. When the band is not skipped, the rate is approximated by the sum of the theoretically required SW rates for coding the bitplanes that compose the quantization indices of the quantized transform coefficients in the band. The conditional bitplane entropy $H^{m}_{X|\tilde{Y}}$ of bitplane $m$, given the coarse SI $\tilde{Y}$, is given in Equation (1). Hence, the estimated total rate is the sum of the conditional entropies over all $M$ bitplanes, that is,
$$R_{\text{NoSkip}} = \sum_{m=0}^{M-1} H^{m}_{X|\tilde{Y}}. \qquad (5)$$
Regarding the distortion terms in the Lagrangian cost functions, the distortion suffered from skipping the band is due to reconstruction at the co-located SI values. However, the true SI $Y$ is not available at the encoder, where only its coarse approximation $\tilde{Y}$ is present. Hence, the mean square error (MSE) distortion $D_{\text{Skip}}$ is estimated using the coarse SI $\tilde{Y}$ as:
$$D_{\text{Skip}} = \frac{1}{N}\sum_{n=1}^{N}\left(x_n - \tilde{y}_n\right)^2, \qquad (6)$$
where $N$ is the number of coefficient samples in the frequency band and $x_n$, $\tilde{y}_n$ are the $n$th sample values of the original coefficients $X$ and of $\tilde{Y}$, respectively.
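In case the frequency band is not skipped, the expected MSE distortion $D_{\text{NoSkip}}$ between the original coefficients $X$ and their reconstruction $\widehat{X}$ is expressed by:
$$D_{\text{NoSkip}} = \frac{1}{N}\sum_{n=1}^{N}\left(x_n - \widehat{x}_n\right)^2, \qquad (7)$$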
where $\widehat{x}_n$ is the reconstruction of the $n$th sample value $x_n$. Mimicking the decoder operation under the assumption that all bitplanes representing $X$ are properly decoded, the reconstruction points at the encoder are derived from the coarse SI values $\tilde{y}_n$ and the encoder-side correlation channel statistics $f_{X|\tilde{Y}}(x \mid \tilde{Y} = \tilde{y}_n)$. In particular, the reconstruction of the $n$th sample is approximated by the centroid:
$$\widehat{x}_n = \frac{\int_{l(q_n)}^{u(q_n)} x\, f_{X|\tilde{Y}}(x \mid \tilde{y}_n)\, dx}{\int_{l(q_n)}^{u(q_n)} f_{X|\tilde{Y}}(x \mid \tilde{y}_n)\, dx}, \qquad (8)$$
where $u(q_n)$ and $l(q_n)$ are the upper and lower bounds, respectively, of the quantization interval defined by $q_n$.
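Evaluating the centroid of Equation (8) requires a model for $f_{X|\tilde{Y}}$. The sketch below assumes a Laplacian correlation model with a hypothetical scale parameter `alpha` (a common choice in transform-domain WZ coding, but not specified in this section) and approximates the ratio of integrals with the midpoint rule; `centroid_laplacian` is an illustrative name.

```python
import math


def centroid_laplacian(lower: float, upper: float, y: float, alpha: float,
                       steps: int = 2000) -> float:
    """Approximate the centroid of [lower, upper) for one coefficient.

    Assumes f(x | y) is proportional to exp(-alpha * |x - y|); the Laplacian
    normalisation alpha / 2 cancels in the ratio of the two integrals.
    """
    if upper <= lower:
        raise ValueError("empty quantization interval")
    # Shift the exponent by the distance from y to the interval so the
    # weights do not underflow when y lies far outside [lower, upper).
    d_min = max(lower - y, y - upper, 0.0)
    h = (upper - lower) / steps
    num = den = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h  # midpoint rule
        w = math.exp(-alpha * (abs(x - y) - d_min))
        num += x * w
        den += w
    return num / den


# SI value inside the bin: the centroid is pulled towards y.
print(centroid_laplacian(0.0, 10.0, 2.0, 0.5))
# SI value far below the bin: the centroid approaches the lower edge.
print(centroid_laplacian(0.0, 10.0, -1e4, 1.0))
```

At the encoder, `y` would be the coarse SI value $\tilde{y}_n$ and `alpha` the encoder-side correlation estimate, so the resulting $\widehat{x}_n$ feed directly into $D_{\text{NoSkip}}$.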
An appropriate $\lambda$ configuration was obtained through offline experimentation (1) on a set of medium- and high-motion sequences different from the ones reported in Section 4 and (2) over the entire rate range per sequence. The $\lambda$ parameter is calculated according to the form:
where $Q_m$ is the index, ranging from 1 (lowest quality) to 8 (highest quality), of the QM [36, 37] used for quantizing the WZ frames, and $\lambda_1$, $\lambda_2$ are the model parameters. The final decision whether to skip frequency band $\beta$ is then made by comparing the Lagrangian costs $C_{\text{Skip}}$ and $C_{\text{NoSkip}}$, the smaller of the two indicating the selected coding mode.
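The skip decision can be sketched as below. Since the exact expressions for the costs (Equations (3) and (4)) and for $\lambda$ (Equation (9)) are not reproduced here, this sketch assumes the conventional Lagrangian form $C = D + \lambda R$, takes $\lambda$ as a given input, and resolves ties in favour of skipping; the function names are hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean square error between two equally long coefficient lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equally long")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def skip_band(x: Sequence[float], y_coarse: Sequence[float],
              x_hat: Sequence[float], cond_entropies: Sequence[float],
              lam: float) -> bool:
    """Return True when band beta should be skipped.

    Assumes the Lagrangian form C = D + lam * R (illustrative); ties favour skip.
    x_hat holds the encoder-side reconstructions of Equation (8).
    """
    d_skip = mse(x, y_coarse)        # Equation (6): reconstruct at the coarse SI
    r_skip = 0.0                     # nothing is coded for a skipped band
    d_noskip = mse(x, x_hat)         # Equation (7)
    r_noskip = sum(cond_entropies)   # Equation (5), bits per coefficient
    return d_skip + lam * r_skip <= d_noskip + lam * r_noskip


x = [10.0, 12.0, 8.0, 11.0]          # original coefficients of the band
y = [10.5, 11.5, 8.5, 10.5]          # coarse SI
x_hat = [10.2, 11.8, 8.1, 10.9]      # encoder-side reconstructions
print(skip_band(x, y, x_hat, [0.4, 0.3], lam=1.0))  # rate weighs heavily: skip
print(skip_band(x, y, x_hat, [0.4, 0.3], lam=0.1))  # rate is cheap: code the band
```

Only the relative size of $\lambda$ matters here: a larger multiplier makes the rate term dominate and pushes the decision towards skipping.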
3.3 Intra mode decision
From a compression point of view, intra coding is theoretically less efficient, given that the entropy $H_X$, the theoretical lower bound for intra coding, is always higher than or equal to the conditional entropy $H_{X|\tilde{Y}}$, the theoretical limit for the SW mode. Nevertheless, an intra coding mode is very attractive in the proposed feedback-free WZ architecture as a means to reduce the suffered distortion. Indeed, intra decoding success is independent of the quality of the SI and does not depend on any encoder-side rate estimation.
As before, the binary representation of every quantization index $q$ is composed of $M$ bits, $b^0, b^1, \ldots, b^{M-1}$, with $b^0$ the most significant bit. Then, the entropy $H^{m}_{X}$ of bitplane $m = 0, 1, \ldots, M-1$ is
$$H^{m}_{X} = -\sum_{b \in \{0,1\}} p\left(b^m = b\right)\log_2 p\left(b^m = b\right), \qquad (10)$$
where the bit probabilities $p(b^m)$ are obtained directly from the histogram of the quantization indices $q$ composing frequency band $\beta$.
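A short sketch of the per-bitplane entropy computation from the histogram of quantization indices follows. It assumes non-negative indices that fit in $M$ bits, with bitplane $m = 0$ being the most significant bit as in the text; the function name is illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List


def bitplane_entropies(indices: Iterable[int], num_bitplanes: int) -> List[float]:
    """Entropy H_X^m (bits) of every bitplane m = 0..M-1 of a band, m = 0 being the MSB.

    The bit probabilities p(b^m) come from the histogram of the quantization indices.
    """
    hist = Counter(indices)
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty band")
    if any(q < 0 or q >= 2 ** num_bitplanes for q in hist):
        raise ValueError("index does not fit in num_bitplanes bits")
    entropies = []
    for m in range(num_bitplanes):
        shift = num_bitplanes - 1 - m  # bit position of bitplane m
        p1 = sum(c for q, c in hist.items() if (q >> shift) & 1) / total
        h = 0.0
        for p in (p1, 1.0 - p1):
            if p > 0.0:  # 0 * log2(0) is taken as 0
                h -= p * math.log2(p)
        entropies.append(h)
    return entropies


print(bitplane_entropies([0, 0, 1, 1, 2, 3, 3, 3], 2))
```

Working from the histogram rather than the individual coefficients keeps the cost proportional to the number of distinct indices, which suits a low-complexity encoder.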
The decision whether to apply SW or intra coding to bitplane $m$ is based on a comparison between the bitplane entropy $H^{m}_{X}$ and the conditional bitplane entropy $H^{m}_{X|\tilde{Y}}$, specifically,
with $\mu(m) \geq 1$ and of the form,
where $\mu_1$ and $\mu_2$ are the model parameters. If Equation (11) holds, bitplane $m$ is SW coded; otherwise, intra coding is applied.
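The per-bitplane decision can be sketched as follows. Because Equation (11) is not reproduced here, the sketch assumes the reading most consistent with $\mu(m) \geq 1$: SW coding is kept only when $H^{m}_{X} \geq \mu(m)\, H^{m}_{X|\tilde{Y}}$, i.e. when SW coding beats intra coding by a margin. The $\mu(m)$ values in the usage line are made up, since Equation (12) is not reproduced either.

```python
from typing import List, Sequence


def assign_bitplane_modes(h_x: Sequence[float], h_x_given_y: Sequence[float],
                          mu: Sequence[float]) -> List[str]:
    """Per-bitplane intra/SW choice, most significant bitplane first.

    Assumed reading of Equation (11): SW coding when h_x[m] >= mu[m] * h_x_given_y[m],
    intra coding otherwise, with mu[m] >= 1.
    """
    if not (len(h_x) == len(h_x_given_y) == len(mu)):
        raise ValueError("length mismatch")
    modes = []
    for hx, hxy, mu_m in zip(h_x, h_x_given_y, mu):
        if mu_m < 1.0:
            raise ValueError("mu(m) must be >= 1")
        modes.append("sw" if hx >= mu_m * hxy else "intra")
    return modes


# Identical entropies per bitplane, but a margin mu(m) that decreases towards the LSB
# (hypothetical values): only the most significant bitplane falls back to intra.
print(assign_bitplane_modes([0.9, 0.9, 0.9], [0.5, 0.5, 0.5], [2.0, 1.5, 1.0]))
```

The usage line illustrates the effect described in the following paragraphs: a larger margin at the significant bitplanes concentrates intra coding where a decoding failure would hurt most.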
Significant bitplanes that are SW coded but fail to decode due to channel rate underestimation introduce large reconstruction errors, even when subsequent bitplanes of lesser significance are decoded successfully. Additionally, the generation of the soft-input information that initializes the LDPCA decoder for a particular bitplane presumes that all previous bitplanes have been correctly restored. Erroneously decoded bits distort the soft-input information, that is, the log-likelihood ratios (LLRs) are calculated from erroneous data, which could cause a decoding failure even though the assigned channel rate would have sufficed had the soft-input information been derived under error-free conditions.
Gradually decreasing $\mu(m)$ from the most to the least significant bitplane concentrates the likelihood of intra coding at the more significant bitplanes. As a result, the chance of reconstructing coefficients with large distortion at the decoder is reduced, since intra decoding is guaranteed to succeed. Moreover, channel decoding of later SW-coded bitplanes is less disrupted by soft-input information derived from already decoded bitplanes that contain errors.
3.3.1 Finalizing the Slepian-Wolf rate
For those bitplanes $m \in \{0, \ldots, M-1\}$ that are SW coded, the theoretically required channel rate for successful decoding given the coarse SI $\tilde{Y}$, i.e. the conditional entropy $H^{m}_{X|\tilde{Y}}$, is adjusted. This compensates for the mismatch between the SI $\tilde{Y}$ approximated at the encoder and the SI $Y$ generated at the decoder, as well as for the fact that the estimated virtual correlation channel, identified by $f_{X|\tilde{Y}}(x|\tilde{y})$, is not identical to the one actually used at the decoder, governed by $f_{X|Y}(x|y)$.
Based on $H^{m}_{X|\tilde{Y}}$, where $0 \leq H^{m}_{X|\tilde{Y}} \leq 1$, the following simple yet effective rate formula is used to calculate the final rate $R^{m}_{\text{SW}}$ for the SW-coded bitplane $m$ in band $\beta$,
$$R^{m}_{\text{SW}} = \left(H^{m}_{X|\tilde{Y}}\right)^{g(m)}, \qquad (13)$$
where $g(m)$ is a linearly increasing function given by $g(m) = b + \frac{a-b}{M}\,(m-1)$, in which $a, b \in [0, 1]$, $a > b$, are the model parameters and $M$ is the total number of bitplanes in $\beta$. Under these conditions, $0 < g(m) \leq 1$, and thereby $R^{m}_{\text{SW}} \geq H^{m}_{X|\tilde{Y}}$ holds for every bitplane $m$. The rationale behind Equation (13) is the following. The distortion incurred by a decoding failure is higher for more significant bitplanes. Therefore, the exponent in Equation (13) increases with $m$: since $b^0$ is the most significant bit, the more significant bitplanes (small $m$) receive a smaller exponent and hence a larger rate increase. Finally, after the rate has been adjusted according to Equation (13), the supported syndrome for bitplane $m$ whose length is closest to $\lceil R^{m}_{\text{SW}} N \rceil$ is sent to the decoder.
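The rate finalization can be sketched as below, using the exponent $g(m)$ exactly as stated and the parameter values $a = 1.0$, $b = 0.4$ reported in Section 4.1 as defaults. With the literal formula, $g(0) = b - (a-b)/M$, so positivity depends on $M$; the sketch raises an error rather than guessing a different indexing. The list of supported syndrome lengths is hypothetical, as it depends on the LDPCA code in use.

```python
import math
from typing import Sequence


def g(m: int, num_bitplanes: int, a: float, b: float) -> float:
    """Exponent g(m) = b + (a - b) / M * (m - 1), as given in the text."""
    return b + (a - b) / num_bitplanes * (m - 1)


def final_sw_rate(h_cond: float, m: int, num_bitplanes: int,
                  a: float = 1.0, b: float = 0.4) -> float:
    """R_SW^m = (H_{X|Y~}^m) ** g(m), which is >= H for H in [0, 1] and 0 < g <= 1."""
    if not 0.0 <= h_cond <= 1.0:
        raise ValueError("conditional bitplane entropy must lie in [0, 1]")
    exponent = g(m, num_bitplanes, a, b)
    if not 0.0 < exponent <= 1.0:
        raise ValueError(f"g({m}) = {exponent:.3f} lies outside (0, 1]")
    return h_cond ** exponent


def closest_syndrome_length(rate: float, n: int, supported: Sequence[int]) -> int:
    """Supported syndrome length closest to ceil(rate * n); ties go to the first listed."""
    target = math.ceil(rate * n)
    return min(supported, key=lambda s: abs(s - target))


# Same estimated entropy, M = 4 bitplanes: more significant bitplanes (small m) get more rate.
rates = [final_sw_rate(0.25, m, 4) for m in range(4)]
print([round(r, 3) for r in rates])
print(closest_syndrome_length(rates[0], 100, [16, 32, 48, 64, 80]))
```

Raising $H$ to a power below one inflates small entropy estimates the most in relative terms, which is where an underestimated rate is most likely to cause a decoding failure.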
3.4 The decoding procedure
In the decoder, the key frames are decoded, reconstructed and stored in a buffer to serve as reference frames for motion estimation. The hash is decoded as well. Then, every WZ frame is decoded in distinct stages, called SI refinement levels (SIRLs); after each stage, the decoder generates a higher-quality version of the SI. Every SIRL is built around frequency bands of the 4×4 DCT aggregated along the diagonal, as introduced in our previous work [34]. The proposed feedback-channel-free DVC architecture takes advantage of this SI refinement scheme to reduce the distortion incurred at the decoder.
Figure 2 shows an overview of the proposed method. Suppose there is a total number of $L$ refinement levels SIRL$_l$, $l = 0, 1, \ldots, L-1$. At the first level, SIRL$_0$, the decoder creates the initial SI $Y_0$ using the designated hash-based OBMEC with subsampled matching (OBMEC/SSM) technique from [11]. The 4×4 DCT is applied, and the MSI for the bitplanes in the frequency bands that belong to SIRL$_0$, that is, the DC band, is addressed.
When the skip mode was selected for the DC frequency band, no decoding, and consequently no CCE or reconstruction, takes place; the coefficients in the DC band of the current version of the SI are copied into the partially decoded frame in the transform domain. When the band was not skipped, every bitplane composing the band is passed to the appropriate decoder, as dictated by the MSI, with the understanding that SW-coded bitplanes might fail to decode, whereas intra-coded bitplanes are guaranteed to decode successfully. The success of SW decoding is determined as in [40].
As more bitplanes of a band are processed, the bitplane-by-bitplane progressively refined CCE algorithm of [11] simultaneously updates the correlation channel estimate for that frequency band. For decoding SW bitplanes, the latest update of the correlation channel estimate serves as the basis for generating the soft input of the LDPCA decoder. When a bitplane fails to decode, the erroneous bitplane is still used to update the correlation channel estimate. For bitplanes that require binary arithmetic decoding, CCE is irrelevant; after decoding, however, these bitplanes are valuable to the CCE algorithm to further refine the estimate.
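When all bitplanes have been decoded, the coefficients are reconstructed at the centroid calculated over all quantization bins that match the correctly decoded bitplanes, using the available SI and the CCE result. Specifically, with $y_n$ the co-located value of the current SI, the $n$th coefficient in a decoded band is reconstructed as:
$$\widehat{x}_n = \frac{\sum_{q \in I_n}\int_{l(q)}^{u(q)} x\, f_{X|Y}(x \mid y_n)\, dx}{\sum_{q \in I_n}\int_{l(q)}^{u(q)} f_{X|Y}(x \mid y_n)\, dx}, \qquad (14)$$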
where $I_n$ is the set of quantization indices $q$ that agree with the successfully decoded bitplanes in the band, and $u(q)$, $l(q)$ are the respective upper and lower edges of the interval designated by index $q$.
The frequency bands that have not yet been processed are substituted by their co-located counterparts in the SI, and the application of the IDCT yields the partially reconstructed frame $\widehat{X}_0$ in the spatial domain. The applied reconstruction technique is optimal in the MSE sense given the SI, even when SW-coded bitplanes failed to decode and no unique quantization bin in which to reconstruct is available.
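The role of the set $I_n$ can be made concrete with a small sketch. Enumerating all $M$-bit indices that agree with the decoded bitplanes shows why a failed most significant bitplane leads to disjoint candidate bins, while a failed least significant bitplane merely widens one bin. For the reconstruction step, the sketch assumes two hypothetical choices not specified here: uniform bins $[q\Delta, (q+1)\Delta)$ with step `delta`, and a Laplacian correlation model with scale `alpha`.

```python
import math
from typing import List, Optional, Sequence


def consistent_indices(decoded_bits: Sequence[Optional[int]]) -> List[int]:
    """Set I_n: all M-bit indices q (MSB first) that agree with every
    successfully decoded bitplane; None marks a bitplane that failed to decode."""
    num_bitplanes = len(decoded_bits)
    result = []
    for q in range(2 ** num_bitplanes):
        if all(bit is None or ((q >> (num_bitplanes - 1 - m)) & 1) == bit
               for m, bit in enumerate(decoded_bits)):
            result.append(q)
    return result


def reconstruct(decoded_bits: Sequence[Optional[int]], y: float, alpha: float,
                delta: float, steps_per_bin: int = 500) -> float:
    """Centroid over the union of bins in I_n, assuming uniform bins of width delta
    and f(x | y) proportional to exp(-alpha * |x - y|).
    No underflow guard: assumes y lies reasonably close to the candidate bins."""
    num = den = 0.0
    h = delta / steps_per_bin
    for q in consistent_indices(decoded_bits):
        lower = q * delta
        for k in range(steps_per_bin):
            x = lower + (k + 0.5) * h  # midpoint rule inside bin q
            w = math.exp(-alpha * abs(x - y))
            num += x * w
            den += w
    return num / den


# MSB failed, the two LSBs decoded as 0 and 1: I_n = {1, 5}, two disjoint bins.
print(consistent_indices([None, 0, 1]))
print(reconstruct([None, 0, 1], y=3.0, alpha=0.8, delta=1.0))
```

Because the SI weighting decides between the disjoint bins, the reconstruction still lands near the more likely bin even though no unique quantization index is known.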
The primary objective, however, should be to minimize the total number of SW decoding failures. To this end, the proposed WZ system exploits the successive SI refinement loop at the decoder by reattempting to decode, with the updated SI available at the current refinement level, any bitplanes that failed to decode at previous SIRLs. At the same time, the distortion due to skipped frequency bands can be mitigated by substituting the co-located frequency bands of the latest SI version into the partially decoded frame at every successive SIRL.
In detail, for every SIRL$_i$, $i > 0$, the partially decoded frame $\widehat{X}_{i-1}$ created at the previous level is used in another round of SI generation by means of OBMEC [34], where the sum of absolute differences (SAD) serves as the error metric during block matching, since the hash frame is no longer involved in the motion estimation. The resulting motion-compensated frame serves as a new version of the SI and is converted to the DCT domain. Then, a new correlation channel estimate is computed for all the coded bands belonging to any previous SIRL$_j$, $j < i$. The result is used, together with the SI $Y_i$, to create the soft-input information for any bitplanes of SIRL$_j$, $j < i$, that failed to decode, and SW decoding is reattempted given the fixed number of received syndrome bits. These bands are indicated by the light grey cells in Figure 3 for every level in a configuration using six SIRLs. Although successful decoding is still not assured, any bitplanes that are decoded successfully reduce the distortion at no additional rate. Next, the frequency bands that actually belong to SIRL$_i$ are handled (i.e. the dark grey cells in Figure 3).
The mode selection information determines whether a band is skipped and, if not, whether each bitplane is passed to the LDPCA or to the binary arithmetic decoder. For every bitplane, a CCE is performed to generate soft-input information for SW decoding or simply to update the CCE algorithm for the next bitplane. When all bitplanes have been processed, the coefficients in the band are reconstructed given the current SI $Y_i$. Because the CCE at every refinement level uses the updated SI, the reconstruction of the coefficients in the already completed SIRLs is further improved as well. Finally, the coefficients of the bands of the processed SIRLs are assembled with the SI coefficients belonging to the not-yet-decoded bands, which after the IDCT yields the partially decoded WZ frame $\widehat{X}_i$. To sum up, the proposed decoding process merges SI refinement, CCE updating, SW decoding and reattempting to decode any SW-coded bitplanes that failed at previous refinement levels. This contrasts with the approach proposed in [23], where an entire WZ frame is first decoded completely, after which additional SI updating and CCE runs enable new attempts to decode bitplanes that failed to SW decode.
Yet, when all SIRLs have been completed, the proposed decoder architecture still does not guarantee that all SW-coded bitplanes have been decoded properly. Therefore, similar to [23], additional SIRLs are added. These supplemental SIRLs, that is, SIRL$_i$, $i > 5$, in the configuration depicted in Figure 3, solely consist of OBMEC-based SI generation, CCE, reattempting to decode any failed bitplanes and reconstructing the coefficients in all frequency bands given the updated SI and correlation channel model. The number of these additional refinement levels is controlled by a fixed parameter, and they are skipped when all bitplanes happen to decode properly.
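The control flow of the refinement loop, including reattempts and the supplemental levels, can be sketched as follows. SI regeneration, CCE and reconstruction are abstracted into a hypothetical `try_decode(bitplane, level)` callback, and only SW-coded bitplanes are tracked, since intra-coded bitplanes always decode. The toy success model in the usage lines is invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label for one SW-coded bitplane: (band name, bitplane index).
Bitplane = Tuple[str, int]


def run_sirls(levels: List[List[Bitplane]], extra_levels: int,
              try_decode: Callable[[Bitplane, int], bool]) -> Dict[Bitplane, int]:
    """Return the SIRL at which each SW bitplane first decoded (-1 if never).

    levels[i] lists the SW bitplanes of the bands that belong to SIRL_i; the
    callback reports whether decoding succeeds with the SI available at level i.
    """
    decoded_at: Dict[Bitplane, int] = {}
    failed: List[Bitplane] = []
    for i in range(len(levels) + extra_levels):
        if i >= len(levels) and not failed:
            break  # supplemental levels are skipped once everything decoded
        # 1) Reattempt bitplanes that failed at earlier levels with the updated SI.
        still_failed: List[Bitplane] = []
        for bp in failed:
            if try_decode(bp, i):
                decoded_at[bp] = i
            else:
                still_failed.append(bp)
        failed = still_failed
        # 2) Decode the bands of this level (supplemental levels have none).
        if i < len(levels):
            for bp in levels[i]:
                if try_decode(bp, i):
                    decoded_at[bp] = i
                else:
                    failed.append(bp)
    for bp in failed:
        decoded_at[bp] = -1
    return decoded_at


# Toy model: a bitplane decodes once the SI has been refined to its required level.
required = {("DC", 0): 0, ("DC", 1): 2, ("AC1", 0): 1, ("AC1", 1): 9}
levels = [[("DC", 0), ("DC", 1)], [("AC1", 0), ("AC1", 1)]]
print(run_sirls(levels, extra_levels=2, try_decode=lambda bp, i: i >= required[bp]))
```

Reattempts are issued before the new bands of a level are handled, matching the order described above (light grey cells first, then dark grey cells).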
4 Experimental results
4.1 Experimental setup and codec configuration
The proposed feedback-channel-free WZ codec is configured using the following settings. The parameters governing the hash formation process, as well as the hash-based SI generation method used to create the initial version of the SI at the decoder, are identical to the configuration used in [11]. A total of seven SIRLs is considered. During the first five levels, new frequency bands are decoded, while the last two SIRLs only serve to reduce the number of SW-coded bitplanes that fail to decode.
Regarding the configuration of the encoder-side components for rate control and coding mode selection, the following parameters are used. The coarse SI generation module uses a downscaling factor $\xi = 4$. The size of the overlapping blocks is $B = 8$, with an overlap step size of $\varepsilon = 4$. The search range is set to $sr = 4$ pixels. The resulting motion vectors are upscaled by a factor $\xi = 4$ to motion-compensate overlapping blocks of size $\xi B \times \xi B$ pixels, i.e. 32×32 pixels, from the original-size reference frames. Additionally, the threshold $T_H$ that controls whether a block is skipped during motion estimation is set to $T_H = 12$. Namely, a block is skipped if the number of unequal bits at the same position in the two co-located blocks in the reference frame is lower than 12, which is equivalent to a pixel error ratio of $12/(8 \times 8) = 0.1875$. Concerning the mode selection modules, the parameters that control the Lagrange multiplier $\lambda$ in Equation (9) are fixed to $\lambda_1 = 0.03$ and $\lambda_2 = 0.5$. Similarly, the parameters $\mu_1$, $\mu_2$ in Equation (12), from which $\mu(m)$ governing the intra mode decision process is derived, are set to $\mu_1 = 0.5$ and $\mu_2 = 2.0$. Finally, the model parameters of the exponent $g(m)$ in the rate formula of Equation (13) are set to $b = 0.4$ and $a = 1.0$.
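The reported settings can be collected into a single configuration object, which also makes the derived quantities explicit (full-resolution block size, skip error ratio, number of supplemental SIRLs). The class and field names below are illustrative, not taken from the codec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSettings:
    """Settings reported in Section 4.1; the field names are illustrative."""
    downscale: int = 4          # xi
    block: int = 8              # B, block size on the downscaled grid
    overlap_step: int = 4       # epsilon
    search_range: int = 4       # sr, in pixels
    skip_threshold: int = 12    # T_H, unequal bits per block
    lambda1: float = 0.03       # Equation (9)
    lambda2: float = 0.5
    mu1: float = 0.5            # Equation (12)
    mu2: float = 2.0
    a: float = 1.0              # exponent g(m), Equation (13)
    b: float = 0.4
    total_sirls: int = 7
    new_band_sirls: int = 5

    def __post_init__(self) -> None:
        if not 0.0 <= self.b < self.a <= 1.0:
            raise ValueError("expected 0 <= b < a <= 1")

    @property
    def full_res_block(self) -> int:
        """Side of the motion-compensated block at full resolution (xi * B)."""
        return self.downscale * self.block

    @property
    def skip_error_ratio(self) -> float:
        """T_H expressed as a fraction of the B x B bits in a block."""
        return self.skip_threshold / (self.block * self.block)

    @property
    def supplemental_sirls(self) -> int:
        return self.total_sirls - self.new_band_sirls

    def skip_block(self, unequal_bits: int) -> bool:
        """A block is skipped when fewer than T_H co-located bits differ."""
        return unequal_bits < self.skip_threshold


cfg = EncoderSettings()
print(cfg.full_res_block, cfg.skip_error_ratio, cfg.supplemental_sirls)  # 32 0.1875 2
```

Keeping the profile in one immutable object makes it straightforward to define alternative low- or high-motion profiles, as discussed in the following paragraphs.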
The parameters were derived heuristically through offline experimentation on a training set that excludes the sequences reported in the experimental results. The values were selected to achieve good RD performance for various degrees of motion without compromising the encoder complexity. In this context, the RD performance could be further optimized for different motion profiles. For instance, for high-motion sequences, the SI approximation module could be configured with a less strict, that is, lower, threshold $T_H$. Skipping fewer blocks during coarse motion estimation would increase the accuracy of the resulting SI approximation, in particular when the motion content is high. Additionally, a smaller overlap step size $\varepsilon$ and/or a smaller downscaling factor $\xi$ would increase the accuracy of the temporal prediction. However, all these measures have to be applied carefully, since they increase the SI approximation complexity at the encoder. Conversely, for low-motion sequences, more overlapping blocks could be skipped during the SI approximation without significantly undermining the temporal prediction accuracy. Analogously, the number of overlapping blocks may be decreased. In this regard, a low-motion profile would impose less complexity on the encoder.
Further room for optimization lies in tuning the parameters controlling the mode decision processes. Indeed, a high-motion parameter profile may put less stress on skipping frequency bands and more emphasis on intra-coded bitplanes. Conversely, a low-motion profile should favour skipping frequency bands while penalizing the intra coding mode. However, the single parameter profile presented in this work was determined to (1) achieve good RD performance across all motion profiles, while (2) containing the additional complexity imposed on the encoder such that the low-complexity encoding characteristics are not compromised.
4.2 Mode selection evaluation
In the first set of experiments, the influence of the different coding modes is illustrated. To this end, the compression performance of four versions of the proposed system is assessed. The first version only supports the SW coding mode and essentially corresponds to our previous system presented in [12]. The second and third versions support an additional skip (SW+Skip) or intra (SW+Intra) coding mode, respectively. The final version of the system features all three coding modes (SW+Skip+Intra). Figures 4 and 5 show the compression performance of the proposed feedback-free DVC system in the four configurations on Foreman and Soccer (QCIF, 15 Hz) for GOP sizes 2, 4 and 8. For the system featuring all three coding modes, Table 1 reports the percentage of bitplanes assigned to each mode for every GOP size at each considered RD point. Table 2 zooms in on the skip mode and shows which frequency bands are skipped at the lowest and the highest RD point, corresponding to QM 1 and 8, respectively. Table 2 presents the frequency bands in the layout of a QM, where frequency bands that are not coded according to the QM are marked not applicable (na).
Regarding the results obtained on the Foreman sequence, a medium-motion sequence with complex facial expressions, SW coding plus skip outperforms SW coding with the intra mode option in a GOP of 2. The quality of the SI in the skipped frequency bands is sufficient not to have a negative impact on the distortion, at no rate cost. For a GOP size of 4, the quality of the SI drops, which results in comparable performance whether SW coding is supplemented with the skip or the intra coding mode. When the GOP size increases to 8 and the SI quality declines further, adding the intra coding mode as an alternative to SW coding results in superior performance. Although the influence of adding coding modes is notable, the performance of the proposed encoder-driven rate control by itself, that is, the version with SW coding only, is lower but still respectable. The configuration with all three coding modes delivers the best performance, taking advantage of the effects of both supplemental coding modes, namely, (1) an increase in the quality of the reconstructed WZ frames due to the guaranteed successful decoding of intra-coded bitplanes at the expense of rate, which is compensated by (2) a reduction in rate by skipping bitplanes without notably compromising the distortion.
When all three coding modes are enabled, the coding mode statistics for Foreman in Table 1 show that skip is dominant at the lower RD points. For instance, Table 1 reports that 84% of the bitplanes are skipped at the lowest RD point in a GOP of 2, while at the second RD point in a GOP of 2 the percentage of skipped bitplanes drops to 57%. In the higher rate region, that is, QM 7 and 8, SW coding dominates, accounting for around 50% of the bitplanes. The remaining bitplanes are more or less evenly distributed between the skip and intra modes. Regarding the distribution of the frequency bands affected by the skip mode, Table 2 reports a fairly uniform distribution over the three frequency bands relevant to the lowest RD point for all GOPs. At the highest rate point (QM 8), however, the majority of skipped bitplanes clearly belong to high-frequency bands. This is in line with traditional compression logic, where high-frequency information can often be discarded without an excessive increase in distortion.
The influence of the skip and intra coding options becomes more prominent in the case of Soccer, a high-motion sequence where SI generation is strained. Considering the system with SW coding only, Figure 5 shows that the RD performance saturates at the higher rate points, in particular for the larger GOPs. The encoder has difficulty creating an accurate approximation of the SI, and thereby accurate SW rate estimates. Adding only the skip mode does not increase the performance, since for such a high-motion sequence the quality of the SI at the decoder is lower. However, adding the intra coding mode drastically increases the compression performance and completely removes the performance saturation. Since intra coding is completely independent of the SI, it boosts the performance when the SI quality is low. Under such conditions, replacing SW coding with intra coding decreases the distortion by reducing the number of bitplanes, in particular significant ones, that fail to decode. When all modes are turned on, the system takes full advantage of the effects of the skip and intra modes.
As expected, due to the increased motion content in Soccer, the skip mode is selected less often than for Foreman, in particular in the lower rate region (see Table 1). Towards the higher rates, the percentage of skipped bitplanes approaches that of Foreman. According to Table 2, the distribution of the skip mode over the frequency bands is similar to that of Foreman. The most important observation on Soccer is the increased occurrence of the intra coding mode, which is imperative to maintain competitive compression performance for high-motion sequences, as seen in Figure 5.
To sum up, the skip mode targets rate reduction without introducing a degree of distortion that undermines the RD performance. This is particularly useful at low rates or for low- to medium-motion sequences. The intra coding mode targets bitplanes that are difficult or inefficient to code using SW principles. The SW coding mode holds the middle ground, offering good compression when the SI is of sufficient quality.
Since the entire mode decision process is encoder-driven, signalling the selected coding modes incurs a rate overhead. As mentioned in Section 3, the MSI is compiled per frame and organized into a binary map, which is entropy coded using binary arithmetic coding. For every frequency band actually considered in the applied QM, the MSI indicates whether the band is skipped, while for every bitplane in a band that is not skipped, it signals whether intra or SW coding has been assigned. Table 3 reports the mode signalling rate overhead (kbps) at all RD points and GOP sizes for Foreman and Soccer. As expected, the rate required to code the MSI is largest for QM 8, since both the number of frequency bands and the number of bitplanes assigned to each band are the largest. This setting corresponds to the largest number of mode decisions at the frequency-band level, in the case of the skip mode, as well as at the bitplane level, in the case of the intra or SW mode. Similarly, the rate overhead increases with the GOP size, due to the larger number of WZ frames for which mode decision is executed. Nevertheless, the MSI rate overhead remains small for all RD points and all GOP sizes, the maximum rate reported in Table 3 being 1.29 kbps for the highest RD point on Soccer with a GOP of 8.
Rate points are determined by the quality of the intra frames ($\text{QP}_{\text{Intra}}$) and the employed QM.
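The growth of the MSI overhead with the GOP size can be checked with simple arithmetic. The sketch assumes a GOP of size $g$ holds one key frame and $g-1$ WZ frames, and that the reported kbps figure is averaged over the sequence duration; under these assumptions, the 1.29 kbps reported for Soccer, GOP 8, corresponds to roughly 98 MSI bits per WZ frame. The function names are illustrative.

```python
def wz_frame_rate(fps: float, gop: int) -> float:
    """WZ frames per second when every GOP holds one key frame and gop - 1 WZ frames."""
    if gop < 2:
        raise ValueError("a GOP needs at least one WZ frame")
    return fps * (gop - 1) / gop


def msi_bits_per_wz_frame(overhead_kbps: float, fps: float, gop: int) -> float:
    """Average MSI bits per WZ frame implied by a reported overhead in kbps."""
    return overhead_kbps * 1000.0 / wz_frame_rate(fps, gop)


for gop in (2, 4, 8):
    print(gop, wz_frame_rate(15.0, gop))   # 7.5, 11.25, 13.125 WZ frames/s
print(round(msi_bits_per_wz_frame(1.29, 15.0, 8), 1))  # about 98.3 bits per WZ frame
```

At 15 Hz, the WZ frame rate rises from 7.5 to 13.125 frames/s between GOP 2 and GOP 8, so the same per-frame signalling cost already yields a 1.75 times higher overhead in kbps.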
4.3 The effect of side-information approximation at the encoder
To create an approximation of the SI, the encoder of the proposed WZ codec features a low-complexity SI generation technique. However, the proposed motion-estimation-based SI approximation is inevitably more complex than reference frame averaging. It is therefore necessary to examine the performance gain the proposed technique brings at the cost of increased complexity.
The compression performance on Foreman and Soccer, shown in Figure 6a,b, respectively, illustrates the effect of both encoder-side SI approximation techniques on the performance of the proposed feedback-free DVC system. The performance of the proposed low-complexity technique (marked ME) and of reference frame averaging (marked RA) nearly coincides for Foreman, GOP 2. Consequently, the proposed encoder-side SI estimation technique is not warranted in this case, since the limited gain in RD performance does not compensate for the increase in complexity. When the GOP size increases, however, the proposed technique clearly outperforms reference frame averaging. On Soccer, the proposed motion-estimation-based method is consistently the better of the two at all GOP sizes.
Overall, the proposed feedback-free system configured with the low-complexity OBMEC method outperforms the system in which the SI is approximated by reference frame averaging. However, the difference in compression performance is not always large. For example, when the motion activity is low and the GOP size is small, reference frame averaging seems a good option given its lower computational cost. In general, the choice of the SI approximation method should be evaluated in an application context to establish whether the increased compression performance outweighs the added encoding complexity.
4.4 Performance comparisons against benchmark techniques
Compression results are presented on the Foreman, Carphone, Soccer and Football sequences, all at QCIF, 15 Hz. All sequences are organized into GOPs of size 2, 4 and 8. The compression performance of the proposed feedback-free architecture with all three coding modes activated is benchmarked against the feedback-channel-based DISCOVER [13] codec, as well as H.264/AVC Intra [35]. In addition, compression results obtained with our previous hash-based DVC architecture from [7] are included. That system generates SI by means of hash-based OBMEC, where the original WZ frames are downscaled and subsequently coded using H.264/AVC Intra at a low bitrate to create the hash information. Similar to DISCOVER [13], its rate allocation is implemented as a request-and-decode approach via a feedback channel. Finally, the compression performance of the proposed feedback-free architecture is compared to that of the same system, including all three coding modes, when rate allocation is implemented using the feedback mechanism as in DISCOVER [13]. In this context, it is important to emphasize that systems with decoder-driven, feedback-based rate allocation enjoy an advantage in terms of RD performance, since the feedback channel not only guarantees decoding success but also ensures that decoding is achieved at a minimal rate. On the other hand, such systems cannot support unidirectional application scenarios, and they couple the encoding and decoding processes, which may lead to excessive delays.
A summary of the compression performance comparison of the proposed feedback-free architecture with respect to DISCOVER [13], our previous hash-based DVC system from [7] and the proposed system with feedback, expressed as Bjøntegaard [41] rate reduction (%) and peak signal-to-noise ratio (PSNR) improvement (dB), is given in Table 4.
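For readers who want to reproduce such figures from their own RD curves, the sketch below implements the commonly used Bjøntegaard procedure: cubic fits of log-rate versus PSNR (and PSNR versus log-rate), averaged over the overlapping interval. It requires at least four RD points per curve. The sign convention (negative BD-rate means the test codec needs less rate) is an assumption and may not match the presentation of Table 4; the RD points in the usage lines are invented.

```python
import numpy as np


def _avg_over(poly, lo, hi):
    """Average value of a polynomial over [lo, hi] via its antiderivative."""
    integral = np.polyint(poly)
    return (np.polyval(integral, hi) - np.polyval(integral, lo)) / (hi - lo)


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of the test codec relative to the reference.

    Fits log10(rate) as a cubic in PSNR for each codec and compares the averages
    over the overlapping PSNR range; negative values mean the test codec needs less rate.
    """
    p_ref = np.polyfit(psnr_ref, np.log10(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    diff = _avg_over(p_test, lo, hi) - _avg_over(p_ref, lo, hi)
    return (10.0 ** diff - 1.0) * 100.0


def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta PSNR (dB): PSNR fitted as a cubic in log10(rate)."""
    lr_ref, lr_test = np.log10(rate_ref), np.log10(rate_test)
    p_ref = np.polyfit(lr_ref, psnr_ref, 3)
    p_test = np.polyfit(lr_test, psnr_test, 3)
    lo = max(lr_ref.min(), lr_test.min())
    hi = min(lr_ref.max(), lr_test.max())
    return _avg_over(p_test, lo, hi) - _avg_over(p_ref, lo, hi)


psnr = [30.0, 32.0, 34.0, 36.0]          # hypothetical RD points
rate = [100.0, 150.0, 230.0, 350.0]      # kbps
print(bd_rate(rate, psnr, [0.9 * r for r in rate], psnr))   # about -10 %
print(bd_psnr(rate, psnr, rate, [p + 0.5 for p in psnr]))   # about 0.5 dB
```

Averaging in the log-rate domain makes the metric a relative (percentage) rate difference, which is why it is comparable across sequences with very different absolute bitrates.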
Regarding the performance on Foreman, shown in Figure 7, the compression performance of the proposed feedback-channel-free architecture is roughly comparable to that of DISCOVER, H.264/AVC Intra and the hash-based system from [7] for a GOP of 2. Amongst these, the DVC systems slightly outperform H.264/AVC Intra at low rates but lag somewhat behind at the highest rates. However, the proposed DVC architecture using all three coding modes outperforms all other systems when rate allocation is performed at the decoder via a feedback channel. At GOP sizes 4 and 8, the proposed feedback-free system still outperforms DISCOVER at low to medium rates, steadily losing ground towards the highest rate point. Table 4 reports a Bjøntegaard [41] rate gain of DISCOVER with respect to the proposed unidirectional system of 4.21% for a GOP of 2, which turns into a rate loss of 1.60% and 10.33% for GOP 4 and GOP 8, respectively. Our previous hash-based system with feedback [7] consistently outperforms both DISCOVER and the proposed feedback-free solution. When feedback is turned on in the proposed DVC architecture, the compression performance of the resulting system is boosted, regularly surpassing that of H.264/AVC Intra.
The results on Carphone, shown in Figure 8, are broadly similar to those obtained on Foreman. The DVC systems have comparable performance at lower rates, superior to H.264/AVC Intra. In the higher rate region, H.264/AVC Intra becomes dominant. At the same time, DISCOVER outperforms the proposed feedback-free WZ architecture. Comparing these WZ architectures, Table 4 reports Bjøntegaard [41] rate gains of DISCOVER over the proposed system from 4.07% to 9.80%. The two remaining feedback-based DVC architectures outperform both DISCOVER and the feedback-free solution, with the DVC system supporting the proposed mode decision process consistently on top.
The complex motion content in Soccer strongly favours H.264/AVC Intra. Indeed, Figure 9 shows that the performance of H.264/AVC Intra is far better than that of the WZ codecs, which are hard-pressed to estimate the motion at the decoder. However, the proposed feedback-free system outperforms the feedback-based DISCOVER codec at all GOP sizes and RD points, save the highest rate point, where a slight performance loss is incurred. Overall, DISCOVER is outperformed by the proposed system, with Bjøntegaard [41] rate losses ranging from 9.80% in a GOP of 2 up to 15.94% in a GOP of 8. Similarly, Football exhibits a very high degree of motion and complex camera movements. Again, these conditions favour H.264/AVC Intra, as supported by the results in Figure 10. Overall, DISCOVER is outperformed by the proposed feedback-free WZ architecture at all GOP sizes, with Bjøntegaard [41] rate losses from 2.82% to 1.28% in a GOP of 2 and 8, respectively. As shown in Figure 5, the intra mode safeguards the compression performance of the proposed codec when high-quality SI generation is strained. Regarding our previous hash-based system from [7] and the proposed system configured with feedback-based rate allocation, both significantly outperform DISCOVER and the proposed feedback-free architecture. This is expected, since hash-based architectures combined with decoder-driven rate allocation via feedback are known to perform particularly well compared to alternative DVC systems when the motion content is complex.
The reported average compression improvements over DISCOVER are an important achievement, given that the proposed system does not employ a feedback channel. Where rate losses versus DISCOVER are observed, they are not dramatic.
4.5 Complexity assessment
Since low encoding complexity is the prime motivation for DVC, an evaluation of the complexity characteristics of the proposed feedback-channel-free architecture is essential. To this end, we adhere to the methodology used in [13, 42]; that is, the encoding and decoding complexity are evaluated using execution-time measurements under regulated conditions^{b}.
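Execution-time measurements are sensitive to caches, background load and one-off initialization. A minimal timing helper, assuming an untimed warm-up run and the median over repeated runs (neither of which is specified by the methodology cited above), could look as follows; `median_runtime` is a hypothetical name.

```python
import statistics
import time
from typing import Callable


def median_runtime(task: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Median wall-clock time in seconds of `task` over `runs` timed executions."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        task()  # untimed warm-up (file caches, lazy initialization)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to sporadic background load.
    return statistics.median(samples)
```

An external encoder binary can be timed by passing a callable that wraps `subprocess.run([...], check=True)` with the appropriate command line.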
In addition to the regular TDWZ coding steps, that is, block-based DCT, quantization and SW encoding, the proposed feedback-free DVC encoder also performs (1) coarse SI approximation, (2) bitplane entropy estimation and (3) mode decision. Moreover, the MSI, as well as the intra-coded bitplanes, undergo (4) binary arithmetic coding. On the other hand, bitplanes belonging to skipped frequency bands are not coded at all, which has a beneficial impact on the encoding complexity. All things considered, the encoding complexity of the proposed feedback-free system is expected to be higher than that of a comparable architecture with feedback.
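To make steps (2) and (3) concrete, the following sketch shows a heavily simplified bitplane mode decision. It is not the paper's algorithm: it swaps the SI-dependent correlation model and the rate-distortion trade-off for a binary symmetric channel between each original bitplane and the corresponding SI bitplane, so that the ideal SW rate is the binary entropy h(p) of the crossover probability p, and it uses the zeroth-order entropy of the bitplane as a proxy for the intra (arithmetic coding) rate. The helper names, the skip threshold `max_crossover` and the upward margin `sw_margin` are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def crossover(original: list[int], si: list[int]) -> float:
    """Fraction of positions where the original bit differs from the SI bit."""
    if not original or len(original) != len(si):
        raise ValueError("bitplanes must be non-empty and of equal length")
    return sum(o != s for o, s in zip(original, si)) / len(original)


def skip_band(original_planes, si_planes, max_crossover: float = 0.01) -> bool:
    """Hypothetical skip test: every bitplane of the band (almost) matches its SI."""
    return all(crossover(o, s) <= max_crossover for o, s in zip(original_planes, si_planes))


def bitplane_mode(original: list[int], si: list[int], sw_margin: float = 0.1) -> str:
    """Return 'intra' or 'sw' for one bitplane of a non-skipped band."""
    # Ideal SW rate H(X|Y) = h(p) bits per bit under the BSC model, raised by a
    # hypothetical margin against underestimation and capped at 1 bit per bit.
    sw_rate = min(1.0, binary_entropy(crossover(original, si)) + sw_margin)
    # Intra rate proxy: zeroth-order entropy of the bitplane itself.
    intra_rate = binary_entropy(sum(original) / len(original))
    return "intra" if intra_rate <= sw_rate else "sw"
```

A band for which `skip_band` returns True is not coded at all; each bitplane of the remaining bands goes to `bitplane_mode`. Comparing rates alone ignores distortion, which is why a practical encoder weighs both, as the proposed mode decision does.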
Conversely, the pursued strategy is expected to benefit the decoding complexity. A decoder-driven rate control scheme relies on repeated 'request-and-decode' operations until decoding succeeds. Although decoder-driven rate control guarantees decoding at a minimal SW rate, the series of decoding attempts severely increases the decoding complexity. In contrast, encoder-driven rate control assigns a fixed, 'take-it-or-leave-it' amount of non-systematic information, and only one decoding attempt is made. Hence, the decoding complexity of the proposed feedback-free system is expected to be significantly lower than that of the feedback-based systems.
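The difference in decoding effort between the two strategies can be illustrated with a toy model in which rate is counted in syndrome chunks and a decoding run succeeds exactly when the received chunks reach a required amount. This success rule is an idealization assumed for illustration (a real LDPCA decoder only reveals success or failure after running), and the function names are hypothetical.

```python
def decoder_driven(required_chunks: int, start_chunks: int = 1) -> tuple[int, int]:
    """Request-and-decode: returns (decoding runs, syndrome chunks spent)."""
    attempts, chunks = 0, start_chunks
    while True:
        attempts += 1  # one message-passing decoding run
        if chunks >= required_chunks:  # idealized success test
            return attempts, chunks
        chunks += 1  # feedback request for one more chunk


def encoder_driven(required_chunks: int, sent_chunks: int) -> tuple[int, bool]:
    """Take-it-or-leave-it: a single decoding run; returns (runs, success)."""
    return 1, sent_chunks >= required_chunks
```

In this model, a bitplane needing 12 chunks costs 12 decoding runs under request-and-decode when starting from one chunk, but a single run under encoder-driven control. The price of the latter is that an underestimated rate leads to a decoding failure, which is why the encoder adjusts the SW rate upwards.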
4.5.1 Encoding complexity
Table 5 compares the execution times (s) for encoding the entire Foreman and Soccer sequences with H.264/AVC Intra and with the proposed feedback-free WZ codec. Results are presented for GOPs of 2 and 8 at every considered RD point, determined by the quantization parameter (QP) for the key frames and the QM used for the WZ frames.
The total execution time $C_{\mathrm{FBF}}$ of the proposed encoder is split into separate components. The part required to code the key frames is denoted by $C_{\mathrm{Key}}$. The component $C_{\mathrm{WZ}}$ encompasses the block-based DCT and quantization, as well as the rate control algorithm, the mode decision process and the intra coding of the MSI. It also covers the outcome of the mode decision process, that is, SW or intra coding of the bitplanes depending on the selected mode. The component $C_{\mathrm{SI}}$ corresponds to the time required to generate the coarse approximation of the SI at the encoder, while $C_{\mathrm{Hash}}$ represents the time to form and code the hash frames.
Table 5 shows that the encoding complexity $C_{\mathrm{FBF}}$ of the proposed feedback-free system is still significantly lower than $C_{\mathrm{Intra}}$, the encoding complexity of H.264/AVC Intra. Specifically, the complexity ratio $C_{\mathrm{FBF}}/C_{\mathrm{Intra}}$ fluctuates between 65% and 69% for all RD points of Foreman and Soccer in a GOP of 2. The larger the GOP size, the larger the complexity reduction of DVC over conventional coding solutions, as more frames are coded using the WZ principle. In a GOP of 8, the ratio $C_{\mathrm{FBF}}/C_{\mathrm{Intra}}$ ranges from 42% down to 35% for Foreman and from 48% down to 38% for Soccer.
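The bookkeeping behind these complexity ratios fits in a few lines. The sketch assumes that the components of Table 5 are disjoint, so that $C_{\mathrm{FBF}} = C_{\mathrm{Key}} + C_{\mathrm{WZ}} + C_{\mathrm{SI}} + C_{\mathrm{Hash}}$; the class and function names are hypothetical, and any numbers passed in are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderTimes:
    """Encoder execution times in seconds, one field per component of Table 5."""
    key: float          # C_Key: coding of the key frames
    wz: float           # C_WZ: DCT, quantization, rate control, mode decision, MSI and bitplane coding
    si: float           # C_SI: coarse SI approximation at the encoder
    hash_frames: float  # C_Hash: forming and coding the hash frames

    @property
    def total(self) -> float:
        # C_FBF, assuming the components do not overlap in time
        return self.key + self.wz + self.si + self.hash_frames


def complexity_ratio(times: EncoderTimes, intra_time: float) -> float:
    """C_FBF / C_Intra expressed as a percentage."""
    if intra_time <= 0:
        raise ValueError("intra_time must be positive")
    return 100.0 * times.total / intra_time
```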
To illustrate the complexity added by the proposed hash-based motion estimation used to approximate the SI at the encoder, Table 5 also presents the encoding-time ratio between the proposed feedback-free system using reference frame averaging and H.264/AVC Intra, $C_{\mathrm{FBF+RA}}/C_{\mathrm{Intra}}$. As expected, the encoding complexity is lower when the SI is approximated by the average of the corresponding reference frames than with the proposed method, which includes motion estimation. The resulting reduction in encoding complexity, expressed relative to H.264/AVC Intra, ranges from 4% to 13%.
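For comparison, reference frame averaging replaces motion estimation by a pixel-wise mean of the two reference frames of the current WZ frame. A minimal sketch on 8-bit luma samples stored as nested lists follows; the round-half-up convention and the function name are assumptions.

```python
def reference_average(prev: list[list[int]], nxt: list[list[int]]) -> list[list[int]]:
    """Pixel-wise average of two reference frames, rounded half up."""
    if len(prev) != len(nxt) or any(len(a) != len(b) for a, b in zip(prev, nxt)):
        raise ValueError("reference frames must have the same dimensions")
    # (a + b + 1) // 2 rounds the mean of two integers half up.
    return [[(a + b + 1) // 2 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev, nxt)]
```

No motion search is performed, which explains the lower encoding time, but the approximation degrades when objects move between the reference frames.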
4.5.2 Impact of encoder-driven rate control and mode decision on the decoding complexity
As explained above, the complexity associated with SW decoding is drastically reduced in the proposed system owing to the smaller number of soft-decoding runs. The proposed feedback-free architecture also features a second source of SW decoding complexity reduction, namely the skip and intra coding modes. The proposed system implements SW coding by means of LDPCA [38] channel codes, which are decoded with the message-passing algorithm, a process vastly more complex than the binary arithmetic decoding applied to intra-coded bitplanes. Skipped bands are simply replaced by the corresponding bands in the SI, further reducing the complexity of the decoder.
To quantify this complexity reduction, three different WZ architectures are compared. The first is the proposed feedback-free DVC system, including all modes. The second is identical to the first, except that the encoder-driven rate control has been replaced by a decoder-driven scheme using feedback; comparing the two isolates the effect of suppressing the feedback channel on the SW decoding complexity. The third system performs decoder-driven rate control, but without the skip and intra modes; it provides insight into the effect of the coding modes on the decoding complexity.
Table 6 summarizes, for all three WZ codecs, the execution time for decoding all bitplanes of the WZ frames of Foreman and Soccer in GOPs of 2 and 8 at every considered RD point. As expected, the feedback-free version requires a little under 10% of the decoding time of the feedback-based version with all coding modes, and even under 5% of that of the feedback-based system with SW coding only. This holds for both Foreman and Soccer, in GOPs of 2 and 8.
5 Conclusions
This work presented a novel encoder-driven rate control solution for unidirectional DVC. The proposed scheme first creates an encoder-side approximation of the SI available to the decoder. As an alternative to reference frame averaging, a novel low-complexity SI estimation technique is proposed that imitates the hash-based SI generation technique used at the decoder. Next, given the estimated SI, the encoder determines the theoretical lower bound on the rate needed to represent the transformed and quantized original WZ frames. To this end, an SI-dependent (SID) correlation channel is estimated between the original frame and the SI, from which the conditional entropy of every bitplane is derived. To increase the performance, the proposed system features multiple coding modes. The original WZ frames, together with the coarse SI estimate and the derived correlation model, form the basis of the mode selection process. At the frequency-band level, bands for which the SI is expected to be of high quality are skipped. At the bitplane level, intra coding is applied to bitplanes for which the SI is believed to be of low quality. The remaining bitplanes are coded using SW coding principles. In this context, the final SW rate is adjusted using a novel formula so as to limit the number of bitplanes that fail to decode. At the decoder, an efficient SI refinement strategy is exploited: at every SI refinement stage, an SW-coded bitplane that failed to decode at any previous level is decoded anew, given the updated, higher-quality SI available at the current refinement level. Experimental results illustrated the effect of the proposed SI approximation technique, of the proposed selection of distinct coding modes and of the mode decision process. From an RD perspective, the proposed feedback-free WZ architecture clearly outperforms the feedback-based benchmark DISCOVER codec on high-motion sequences.
When the motion content is medium or low, the proposed system outperforms DISCOVER at low rates but is outperformed at medium and, in particular, high rates, albeit not by a large margin. Finally, the encoding complexity of the proposed system remains significantly lower than that of H.264/AVC Intra coding. Moreover, the complexity of the bitplane decoding part is significantly reduced, both by the encoder-driven rate control scheme, with its 'take-it-or-leave-it' approach instead of the request-and-decode rounds of feedback-based decoder-driven rate control, and by the inclusion of the skip and intra coding modes.
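The successive SI refinement strategy summarized in these conclusions can be sketched as a loop over refinement levels in which bitplanes that failed to decode are carried over and retried with the improved SI of the next level. In this sketch, `decode` is a hypothetical stand-in for the LDPCA decoder, the SI is an opaque object, and retrying failed bitplanes before the newly scheduled ones is our assumption rather than a documented ordering.

```python
from typing import Any, Callable, Iterable, Sequence


def decode_with_refinement(
    schedule: Sequence[tuple[Any, Iterable[Any]]],
    decode: Callable[[Any, Any], bool],
) -> list[Any]:
    """Decode bitplanes level by level, retrying failures with refined SI.

    schedule: per refinement level, a pair (SI at that level, bitplanes first decoded there)
    decode(bitplane, si) -> bool: hypothetical SW decoder reporting success
    Returns the bitplanes that are still undecoded after the last level.
    """
    failed: list[Any] = []
    for si, new_bitplanes in schedule:
        # Previously failed bitplanes get another attempt with the better SI.
        candidates = failed + list(new_bitplanes)
        failed = [bp for bp in candidates if not decode(bp, si)]
    return failed
```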
Endnotes
^{a}This paper has been presented in part in the Proceedings of SPIE, 2012 [12]. ^{b}The execution-time tests, which used the executables of the JM implementation of H.264/AVC and of the proposed system, were conducted under the same hardware and software conditions. The employed hardware was a personal computer with an Intel® Core™ i7 at 2.2 GHz and 16 GB of RAM. The executables were obtained using the Visual Studio C++ v8.0 compiler in release mode and run under the Windows 7 operating system.
Abbreviations
AC: Alternating current
AVC: Advanced video coding
CCE: Correlation channel estimation
DC: Direct current
DCT: Discrete cosine transform
DISCOVER: Distributed coding for video services
DVC: Distributed video coding
GOP: Group of pictures
LDPCA: Low-density parity-check accumulate
LLR: Log-likelihood ratio
MCI: Motion-compensated interpolation
ME: Motion estimation
MSE: Mean square error
MSI: Mode signalling information
OBME: Overlapped block motion estimation
PDF: Probability density function
PRISM: Power-efficient, robust, high-compression, syndrome-based multimedia coding
PSNR: Peak signal-to-noise ratio
QCIF: Quarter common intermediate format
QM: Quantization matrix
QP: Quantization parameter
RA: Reference frame averaging
RD: Rate-distortion
SAD: Sum of the absolute differences
SI: Side-information
SIRL: Side-information refinement level
SW: Slepian-Wolf
TDWZ: Transform-domain Wyner-Ziv
WZ: Wyner-Ziv
References
[1] Slepian D, Wolf JK: Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19: 471–480. 10.1109/TIT.1973.1055037
[2] Wyner AD, Ziv J: The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22: 1–10. 10.1109/TIT.1976.1055508
[3] Xiong Z, Liveris A, Cheng S: Distributed source coding for sensor networks. IEEE Signal Process. Mag. 2004, 21: 80–94. 10.1109/MSP.2004.1328091
[4] Puri R, Majumdar A, Ramchandran K: PRISM: a video coding paradigm with motion estimation at the decoder. IEEE Trans. Image Process. 2007, 16: 2436–2448.
[5] Girod B, Aaron A, Rane S, Rebollo-Monedero D: Distributed video coding. Proc. IEEE 2005, 93: 71–83.
[6] Pereira F, Torres L, Guillemot C, Ebrahimi T, Leonardi R, Klomp S: Distributed video coding: selecting the most promising application scenarios. Signal Process. Image Commun. 2008, 23: 339–352. 10.1016/j.image.2008.04.002
[7] Deligiannis N, Verbist F, Iossifides A, Slowack J, Van de Walle R, Schelkens P, Munteanu A: Wyner-Ziv video coding for wireless lightweight multimedia applications. EURASIP Journal on Wireless Communications and Networking, Special Issue on Recent Advances in Mobile Lightweight Wireless Systems; 2012.
[8] Deligiannis N, Verbist F, Barbarien J, Slowack J, Van de Walle R, Schelkens P, Munteanu A: Distributed coding of endoscopic video, in IEEE International Conference on Image Processing (ICIP). Brussels, 11–14 September 2011.
[9] Brites C, Ascenso J, Pedro JQ, Pereira F: Evaluating a feedback channel based transform domain Wyner-Ziv video codec. Signal Process. Image Commun. 2008, 23: 269–297. 10.1016/j.image.2008.03.002
[10] Stankovic L, Stankovic V, Wang S, Cheng S: Correlation estimation with particle-based belief propagation for distributed video coding, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Prague, 22–27 May 2011: 1505–1508.
[11] Deligiannis N, Barbarien J, Jacobs M, Munteanu A, Skodras A, Schelkens P: Side-information-dependent correlation channel estimation in hash-based distributed video coding. IEEE Trans. Image Process. 2012, 21: 1934–1949.
[12] Verbist F, Deligiannis N, Satti SM, Munteanu A, Schelkens P: Iterative Wyner-Ziv decoding and successive side-information refinement in feedback channel-free hash-based distributed video coding, in Proceedings of SPIE 8499, Applications of Digital Image Processing XXXV, 84990O. San Diego, CA; 2012.
[13] Artigas X, Ascenso J, Dalai M, Klomp S, Kubasov D, Ouaret M: The DISCOVER codec: architecture, techniques and evaluation, in Picture Coding Symposium (PCS). Lisboa, 7–9 November 2007.
[14] Brites C, Pereira F: Encoder rate control for transform domain Wyner-Ziv video coding, in IEEE International Conference on Image Processing (ICIP). San Antonio, TX, 16–19 September 2007: 5–8.
[15] Brites C, Pereira F: Probability updating for decoder and encoder rate control turbo based Wyner-Ziv video coding, in IEEE International Conference on Image Processing (ICIP). Hong Kong, 26–29 September 2010: 3737–3740.
[16] Morbee M, Prades-Nebot J, Pizurica A, Philips W: Rate allocation for pixel-domain distributed video coding without feedback channel, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 15–20 April 2007: 521–524.
[17] Adikari ABB, Fernando WAC, Weerakkody WARJ: Iterative WZ decoding for unidirectional distributed video coding. IEE Electronics Letters 2007, 43: 93–95. 10.1049/el:20073675
[18] Weerakkody WARJ, Fernando WAC, Adikari ABB: Unidirectional distributed video coding for low cost video encoding. IEEE Trans. Consum. Electron. 2007, 53: 788–795.
[19] Martinez JL, Fernandez-Escribano G, Kalva H, Weerakkody WARJ, Fernando WAC, Garrido A: Feedback free DVC architecture using machine learning, in IEEE International Conference on Image Processing (ICIP). San Diego, CA, 12–15 October 2008: 1140–1143.
[20] Yaacoub C, Farah J, Pesquet-Popescu B: Feedback channel suppression in distributed video coding with adaptive rate allocation and quantization for multiuser applications. EURASIP Journal on Wireless Communications and Networking 2008, 2008: 1–13.
[21] Artigas X, Torres L: Improved signal reconstruction and return channel suppression in distributed video coding systems, in 47th International Symposium ELMAR. Zadar, 8–10 June 2005: 53–56.
[22] Cheng S, Xiong Z: Successive refinement for the Wyner-Ziv problem and layered code design. IEEE Trans. Signal Process. 2005, 53: 3269–3281.
[23] Brites C, Pereira F: An efficient encoder rate control solution for transform domain Wyner-Ziv video coding. IEEE Transactions on Circuits and Systems for Video Technology 2011, 21: 1278–1292.
[24] Chen D, Varodayan D, Flierl M, Girod B: Wyner-Ziv coding of multiview images with unsupervised learning of disparity and Gray code, in IEEE International Conference on Image Processing (ICIP). San Diego, CA, 12–15 October 2008: 1112–1115.
[25] Kubasov D, Nayak J, Guillemot C: Optimal reconstruction in Wyner-Ziv video coding with multiple side information, in IEEE Multimedia Signal Processing Workshop (MMSP). Chania, 1–3 October 2007: 251–254.
[26] Berrou C, Glavieux A, Thitimajshima P: Near Shannon limit error-correcting coding and decoding: Turbo codes, in IEEE International Conference on Communications (ICC). Geneva, 23–26 May 1993: 1064–1070.
[27] Chien WJ, Karam LJ: BLAST-DVC: bitplane selective distributed video coding. Multimedia Tools and Applications 2010, 48: 437–456. 10.1007/s11042-009-0314-8
[28] Mys S, Slowack J, Škorupa J, Deligiannis N, Lambert P, Munteanu A, Van de Walle R: Decoder-driven mode decision in a block-based distributed video codec. Multimedia Tools and Applications 2012, 58: 239–266. 10.1007/s11042-010-0718-5
[29] Do T, Shim HJ, Jeon B: Motion linearity based skip decision for Wyner-Ziv coding, in IEEE International Conference on Computer Science and Information Technology. Beijing, 8–11 August 2009: 410–413.
[30] Ascenso J, Pereira F: Low complexity intra mode selection for efficient distributed video coding, in IEEE International Conference on Multimedia and Expo (ICME). Cancun, 28 June–3 July 2009: 101–104.
[31] Mys S, Slowack J, Škorupa J, Lambert P, Van de Walle R: Introducing skip mode in distributed video coding. Signal Process. Image Commun. 2009, 24: 200–213. 10.1016/j.image.2008.12.004
[32] Deligiannis N, Munteanu A, Clerckx T, Cornelis J, Schelkens P: On the side-information dependency of the temporal correlation in Wyner-Ziv video coding, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Taipei, 19–24 April 2009: 709–712.
[33] Steinberg Y, Merhav N: On successive refinement for the Wyner-Ziv problem. IEEE Trans. Inf. Theory 2004, 50: 1636–1654. 10.1109/TIT.2004.831781
[34] Deligiannis N, Verbist F, Slowack J, Van de Walle R, Schelkens P, Munteanu A: Joint successive correlation estimation and side information refinement in distributed video coding, in 20th European Signal Processing Conference (EUSIPCO). Bucharest, 27–31 August 2012: 569–573.
[35] Wiegand T, Sullivan GJ, Bjøntegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13: 560–576.
[36] Aaron A, Rane S, Setton E, Girod B: Transform-domain Wyner-Ziv codec for video, in SPIE Visual Communications and Image Processing Conference (VCIP). San Jose, CA, 20–22 January 2004: 520–528.
[37] Brites C, Ascenso J, Pereira F: Improving transform domain Wyner-Ziv video coding performance, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toulouse, 14–19 May 2006.
[38] Varodayan D, Aaron A, Girod B: Rate-adaptive codes for distributed source coding. Signal Process. 2006, 86: 3123–3130. 10.1016/j.sigpro.2006.03.012
[39] Schwarz H, Marpe D, Wiegand T: Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology 2007, 17: 1103–1120.
[40] Škorupa J, Slowack J, Mys S, Lambert P, Van de Walle R, Grecos C: Stopping criterions for turbo coding in a Wyner-Ziv video codec, in Picture Coding Symposium (PCS). Chicago, IL, 6–8 May 2009: 1–4.
[41] Bjøntegaard G: Calculation of average PSNR differences between RD-curves. ITU-T Video Coding Experts Group (VCEG), Document VCEG-M33, Austin, TX; April 2001.
[42] Pereira F, Ascenso J, Brites C: Studying the GOP size impact on the performance of a feedback channel-based Wyner-Ziv video codec, in IEEE Pacific Rim Symposium on Image Video and Technology. Santiago, 17–19 December 2007: 801–815.
Acknowledgements
This work is supported by the Fund for Scientific Research-Flanders (projects G004712N and G014610N) and the iMinds Institute (ISBO Project Smartcam and the ICON project 'Little Sister: low cost monitoring for care and retail').
Additional information
Competing interests
Parts of the research presented in this article have been filed by IBBT under patent applications EP07120604.9 (T Clerckx, A Munteanu, Motion estimation and compensation process and device, November 2007) and PCT/EP2011/071296 (N Deligiannis, A Munteanu, J Barbarien, Method and device for correlation channel estimation, November 2011).
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Verbist, F., Deligiannis, N., Satti, S.M. et al. Encoder-driven rate control and mode decision for distributed video coding. EURASIP J. Adv. Signal Process. 2013, 156 (2013). https://doi.org/10.1186/1687-6180-2013-156
DOI: https://doi.org/10.1186/1687-6180-2013-156