- Research
- Open Access
Suppressing feedback in a distributed video coding system by employing real field codes
- Daniel J Louw1Email author and
- Haruhiko Kaneko1
https://doi.org/10.1186/1687-6180-2013-181
© Louw and Kaneko; licensee Springer. 2013
- Received: 15 April 2013
- Accepted: 19 November 2013
- Published: 5 December 2013
Abstract
Single-view distributed video coding (DVC) is a video compression method that allows for the computational complexity of the system to be shifted from the encoder to the decoder. The reduced encoding complexity makes DVC attractive for use in systems where processing power or energy use at the encoder is constrained, for example, in wireless devices and surveillance systems. One of the biggest challenges in implementing DVC systems is that the required rate must be known at the encoder. The conventional approach is to use a feedback channel from the decoder to control the rate. Feedback channels introduce their own difficulties such as increased latency and buffering requirements, which makes the resultant system unsuitable for some applications. Alternative approaches, which do not employ feedback, suffer from either increased encoder complexity due to performing motion estimation at the encoder, or an inaccurate rate estimate. Inaccurate rate estimates can result in a reduced average rate-distortion performance, as well as unpleasant visual artifacts. In this paper, the authors propose a single-view DVC system that does not require a feedback channel. The consequences of inaccuracies in the rate estimate are addressed by using codes defined over the real field and a decoder employing successive refinement. The result is a codec with performance that is comparable to that of a feedback-based system at low rates without the use of motion estimation at the encoder or a feedback path. The disadvantage of the approach is a reduction in average rate-distortion performance in the high-rate regime for sequences with significant motion.
Keywords
- Distributed video coding
- Feedback suppression
- Real field coding
- Compressive sensing
- Multi-hypothesis
1 Introduction
In a standard transform-domain distributed video coding (DVC) system, frames are split into key frames (which are similar to I-frames in H.264) and Wyner-Ziv (WZ) frames. The key frames are intra-coded, and the WZ frames are discrete cosine-transformed, quantized and encoded using a systematic error correction code (ECC). The parity symbols from the encoding are then transmitted. At the decoder, advanced motion compensation methods are used to produce an estimate of the frame, also known as the side information (SI), from the intra-coded key frames. Errors in the SI are corrected using the received parity symbols.
The encoder must operate at a high enough rate (transmit enough parity symbols) to ensure that the errors can be corrected. However, calculating this rate at the encoder is fundamentally impossible without calculating the SI used at the decoder. Doing this at the encoder would defeat the purpose of DVC as it would increase the computational complexity of the encoder to the same level as that of conventional video coding. The conventional method used to solve this problem is to introduce a feedback path from the decoder [1]. If the rate estimate is too low, more bits are requested. This requires the video sequence to be decoded in real time and renders the codec unsuitable for any application where the compressed sequence needs to be stored for decompression at a later time. It also introduces latency in the process which can become severe if the decoder is far away from the encoder. Constraining the use of the feedback channel has been shown to alleviate some of these problems [2], though real-time decoding is still required.
The second method used to determine the rate is called suppressed feedback DVC, where a low complexity estimate of the SI is created at the encoder and used to estimate the rate. If the rate estimate is too low, then decoding failures can lead to significant distortion in the decoded frame. Conversely, if the rate is overestimated, any extra bits are wasted. Typically, the rate estimate is good for low motion sequences and poor for high or complex motion sequences. In order to achieve robust performance, extra bits may be transmitted to increase the likelihood of successful decoding. However, a finite probability of a decoding failure remains. Increasing the number of extra bits to reduce this probability implies that a larger percentage of transmitted bits will be redundant, yielding reduced rate-distortion (R-D) performance.
In this paper, the authors propose to solve this problem by changing the codec structure so that the output distortion will be a smoother function of the rate. As a result, any extra bits will continue to improve the distortion, and if the rate is underestimated, the image quality will degrade gradually. The rate-distortion characteristics are modified by changing the nature of the quantization region that the Wyner-Ziv code imposes on the signal space. This change is implemented by reversing the order of the quantizer and the low-density parity check (LDPC) encoder and by defining the code over the real field instead of over a finite field.
The results will show that the system reduces the variance of the distortion and improves the perceptual video quality, in comparison with more conventional feedback-free methods without increasing the encoding complexity. The trade-off introduced by the proposed method is a reduction in the average R-D performance as compared to systems with perfect rate knowledge.
This paper is structured as follows: Section 2 provides a short review of the relevant literature. Section 3 discusses real field coding and describes how it alleviates the effects of an imperfect rate estimation. Section 4 describes the overall design of the system. Specific aspects of the encoder and its subsystems are described in Section 5, while the decoder with its subsystems are discussed in Section 6. The complexity of the proposed system is analysed in Section 7 and the performance of the system is analysed in Section 8, with the conclusion following in Section 9.
2 Overview of the previous work
DVC is based on the Wyner-Ziv [3] and Slepian-Wolf (SW)[4] coding theorems. There have been several DVC systems proposed in literature along with many improvements to individual subsystems. The first practical systems were the Stanford system [5] and PRISM (Berkeley) [6]. Since then many improvements have been introduced. Notable milestones include the DISCOVER [1] and VISNET [7] codecs, both of which are based on the architecture of the Stanford system. The authors have also presented a codec based on the Stanford architecture [8, 9], of which the SI creation method is used in this paper.
The majority of competitive systems require a feedback path. Methods to remove feedback are based on estimating the required rate at the encoder and attempting to mitigate the visual effects of decoding failures at the decoder. Estimating the rate requires estimating the SI at the encoder, which increases the encoder complexity. Some estimate must be made and thus the increased complexity at the encoder becomes a trade-off point for R-D performance.
There have been some notable proposals for feedback suppressed systems. A pixel-domain feedback suppressed system was developed in [10]. However, the simulation results did not consider the key frames in the R-D performance, making comparisons difficult. In [11] and [12] there were attempts to learn the required rate from the SI estimate using machine learning and neural networks, respectively. In [13] a pixel-domain system, exploiting spatial and temporal correlations along with iterative decoding was presented. A system with a structure not relying on conventional Wyner-Ziv techniques and exploiting overlapped block motion estimation with probabilistic compensation (OBMEPC) and SI dependent correlation channel estimation (SID-CE) was presented in [14]. This system produced excellent results at a reduced encoding complexity. A system similar to [10] but for transform-domain DVC was presented in [15], where a multi-mode SI method was used at the encoder. More advanced and complete attempts at creating and analysing a feedback-free DVC system were presented in [16] and [17]. These systems rely on performing reduced complexity motion estimation at the encoder. Sophisticated techniques related to the correlation noise modelling, improved side information generation, and mitigation of decoding errors allowed authors in [16] to achieve R-D performance approaching that of the DISCOVER codec. A useful addition in [17] is the use of a hash to improve the motion estimation at the decoder. Future research may consider the complexity vs. rate-distortion trade-off of including low complexity motion estimation at the encoder, but in this paper, it is assumed that motion estimation is not desired at the encoder.
Encoding over the real field, quantizing the result and decoding with side information can be shown to be mathematically similar to compressive sensing (CS), if the error signal is assumed to be sparse or compressible. The effects of quantization and the R-D performance of quantized CS were analysed in [18]. In the proposed system, we show how the use a binning quantizer [19] can improve the R-D performance when the rate estimate is accurate. CS has been applied to video compression previously, for example in [20]. However, previous systems are different to the one presented here, since in these systems CS is used to reduce the sampling requirements in the pixel domain and, as such, the discrete cosine transform (DCT) is not used.
3 Real field coding
The parity symbols identify a specific subspace with (N- M) dimensions, in where x must lie. For example, if N = 3 and M = 2, then s will describe a specific line along which x lies. Decoding will find the most likely point on this line. In finite field coding, the same is true over the finite-field-based signal space, but when the correct codewords are mapped back to the signal space, they map to non-contiguous quantization regions.
If the noise is not Gaussian, the decoded variance can be reduced even further. For example, if the noise is K-sparse, which means ||e||0 = K,K ≪ N, then with M ≥ K + 1, perfect reconstruction is possible, assuming an l 0-minimization decoder, though the problem is NP-complete [22]. If the noise is Laplace distributed, as is typically the case for DVC, or more generally compressible, then the performance lies between these two cases. Thus, decoding reduces the error variance in all cases and catastrophic decoding failures do not occur. Therefore, while there is no guarantee of a specific R-D performance due to a lack of knowledge about the channel, the method will always reduce the distortion in the SI, and there will be no catastrophic decoding failures due to noise variance underestimation.
The quantized parity symbols describe a cell in that is N -dimensional, but bounded in M of those dimensions. The size of the bounded dimensions depends on the quantization bin size. There will thus be an additional quantization noise component added to the distortion as described in (6). Quantizer design is considered in Sections 5.3 and 5.4. From this point, we will refer to P T as H to more closely align with notation used in the Wyner-Ziv coding literature.
4 Description of the proposed system
System diagram.
4.1 Encoder
where Q b (·) is the quantizer for the b th subband. If q b represents the number of bits assigned to Q b (·), then will have elements in {0,1,…,L b - 1}, where . Each quantized subband is then sent to the decoder along with the intra-coded key frames. Due to the lack of a feedback path, frames do not need to be buffered after encoding. The parameters required to describe the quantizer, as discussed in Section 5.3, must also be transmitted.
4.2 Decoder
we calculate an estimate . The method used to calculate the estimated error will differ for each hypothesis and depend on the method used to produce the hypothesis. From the error estimates, noise parameters are calculated at a coefficient level using the method developed in [23]. In order to improve the relative quality of the noise estimation for the different hypotheses, a motion-adaptive scaling factor is used to scale the Laplace noise parameters of the key frame and motion-based hypotheses as described in [8].
After decoding, the process is repeated in a method similar to successive refinement [24–27]. Using the decoded frame , a more accurate motion estimation and compensation algorithm is used to produce higher quality SI. New correlation noise estimates must then be produced for the SI, after which the GBP algorithm is used to decode the improved SI. This process may be iterated a number of times, though two iterations are typically sufficient. If the original decoding introduced an error, then the iterative process may worsen performance as the motion-compensated interpolation (MCI) process may produce worse quality SI. We will now discuss the subsections in more detail.
5 Encoder
In this section we discuss the specifics of the different parts of the system. Both the time and subbands indices have been dropped from most of the equations, since the same process is applied to all the subbands for each of the WZ coded frames.
5.1 Real field code design
Coding over the real field as opposed to a finite field has not been considered much due to the prevalence of digital systems. Some early analog codes were based on Bose-Chaudhuri-Hocquenghem (BCH) codes and the discrete Fourier transform (DFT) [28]. Large codes, analogous to finite field LDPC codes, were considered recently in the context of CS [29]. While exact design strategies are still being developed, it has been shown that codes that work well as binary LDPC codes also work well over the real field.
LDPC code design can be described as designing the decoding graph and choosing the connection weights. In this paper, we use the quasi-cyclic (QC) codes designed in [30] to design the decoding graph (the parity check matrix structure). QC codes were chosen because of their fast encoding algorithms and reduced storage requirements at the encoder, as compared with random codes. For a code with an M × N parity matrix H, the connection weights (non-zero elements) were selected at random from a Gaussian distribution such that the expected total energy of the encoded signal is the same as that of the original signal .
M corresponds to the number of transmitted symbols. These values are sufficient for the video resolution considered in this paper, where N = 1,584. If the higher resolution frames are encoded, then N and M might increase. Codes should be designed with the required size in mind. We perform decoding using a version of GBP described in Section 6.1.
Plot of ratio of the noise variance after decoding to the initial noise variance. Plot of ratio of the noise variance after decoding to the initial noise variance as a function of M for several SNR values. The signal and the noise are both Laplace distributed and ten decoding iterations were performed.
5.2 Rate estimation
The proposed system operates at an estimate of the theoretical Wyner-Ziv rate. This rate is calculated assuming a conventional system with a mid-rise symmetrical quantizer for the DC band and a dead-zone uniform quantizer for the AC bands. The bits assigned to each band for each rate-distortion point, are taken from the quantization profiles in [23]. While these quantizers are used to calculate the bit budget assigned to each subband, a different quantizer is used after encoding. This is because we are quantizing the encoded signal and not the original signal. The code size M b and number of quantization bits q b are the design parameters that affect performance where is the bit budget.
where H(X Q|Y Q) is the conditional entropy of quantized source, X Q, given the quantized side information Y Q. The final bit budget per subband will be .
5.2.1 Correlation channel model
5.2.2 Conventional bit plane based estimates
with WZ j representing all the symbols in the j th subband. This method requires N entropy calculations for each bit plane of each subband.
5.2.3 Symbol-wise conditional entropy estimation
which allows the required rate to be calculated as a function of the entropies of three single variables, each of which is simpler to calculate than the conditional entropy. This is a high-rate assumption which will typically be inaccurate for large Δ values.
These closed-form equations all assume that the quantizer is defined by the bin size and that the range of the variable is theoretically infinite. In reality, there will be a limited number of levels considered. We can thus also calculate H(X Q) by evaluating (37) and (39) over a finite number of levels. However, the difference between these is small enough to be ignored and we choose to use the closed-form solutions provided by the infinite quantizer instead.
5.3 Quantizer
The scaling parameter β allows for the quantizer to clip some measurements. The divisor D is a parameter that allows binning of the measurements to improve the rate performance. For D = 1, the quantizer is a standard uniform quantizer. The Δ, q b and D parameters for each subband are transmitted along with the encoded sequence. Δ can be represented with 8 bits, q b with 3 bits and D with 3 bits as well. As a result, the overhead, from transmitting these parameters, is negligible.
5.4 Bit allocation
The bit allocation should be chosen to minimise the expected distortion. In general, for a bounded quantization cell, the area in the cell and the resultant distortion can be halved by adding an additional quantization bit per symbol or by doubling the number of constraints (M b ). This would indicate that the number of quantization bits per symbol (q b ) should be maximised. However, this is only true once the quantization region is bounded. This indicates that the allocation decision function should be piecewise defined. The number of constraints should be increased until the region is bounded at which point the number of bits per constraint should be increased.
We also require that q eff ≥ 2. After adjusting q b to meet the above criteria, M b is recalculated.
6 Decoder
6.1 Belief propagation over real fields
Belief propagation (BP) is a decoding algorithm that calculates an estimate of a marginal probability by exploiting factorization to reduce computational complexity. BP can be performed over any arbitrary field. The update equations require the computation of the product and the convolution of the marginal probability density functions (pdfs) being passed in the graph. When the coding is performed over the binary field, the pdf becomes a probability mass function (PMF) and can be represented using a single number. In the case of the real field, a full pdf is required. Calculating the convolution of a set of arbitrary functions is difficult. Thus, we use a version of relaxed or Gaussian BP [36], which assumes that the messages internal to the graph are Gaussian distributed. This allows for simple closed-form solutions of the update equations that rely only on the mean and the variance of the messages, reducing computational complexity. It also means that only two values need to be passed along each edge of the graph, reducing memory requirements.
where s l , l ∈ {1,D} are the centres of the D bins corresponding to the received . Now, f S (s j ) is a uniform pdf over the range corresponding to the most likely bin.
6.2 Side information creation
6.2.1 Initial decoding
A multi-hypothesis SI creation method using four hypotheses as developed in [8] is used. The hypotheses are the two reference frames, and , a MCI-based frame using small blocks, , as well as a MCI-based frame using big blocks, . The method used to perform MCI is a variant of the one described in [37]. First the key frames are mean filtered. The filtered key frames are then used for bidirectional pixel level accuracy block matching (BM) using large blocks and the modified sum of absolute difference (SAD) metric in [37]. These vectors are used as starting points for a BM algorithm using smaller blocks with a fixed search range. The motion vector (MV) field generated with large blocks is quadrupled in density (oversampled by a factor of two in both directions), before being median filtered. By increasing the density, we allow the median filter to produce smoother edges and more precise vectors. The MV field generated with smaller blocks is also quadrupled in density before being median filtered. These two MV fields are then used to create two MCI hypotheses by interpolating from the original key frames.
6.2.2 Successive refinement
After BP decoding, the subbands are recombined and a pixel domain version of the WZ frame is produced. Using this decoded frame, the motion estimation process is repeated to produce better quality side information. This is similar to [24–27] where successive refinement was used to improve reconstruction distortion. Due to the use of real field coding, the effect is slightly different from conventional DVC as improved side information can result in decreased distortion from the GBP decoding process, whereas for finite field coding, there can be no further gain (other than improved reconstruction) after successful decoding of the bit planes.
Bidirectional motion estimation. The gray boxes in frames f[t - 1] and f[t + 1] indicate the search range.
where L,M and R represent the subblocks in the left key frame, the reconstructed frame and the right key frame, respectively. λ M is a scaling factor that may be optimised. For this paper we used λ M = 0.25.
Single directional motion estimation. The gray boxes in frames f[t - 1] and f[t + 1] indicate the search range.
The metrics are all adjusted by including a motion penalising term [37], which helps to ensure less noisy motion vectors.
6.3 Correlation noise modeling
The system uses a coefficient level noise model [23] which models the correlation noise for a single coefficient of a subband of a hypothesis as being Laplace distributed. The method for calculating this noise parameter (α) requires an estimate of the error in the hypothesis. This error is termed as the residual frame (r). For the multi-hypothesis system, a residual frame estimate is required for each hypothesis.
6.3.1 Residual estimation for the first decoding
before being passed to the GBP algorithm for decoding.
6.3.2 Residual estimation for later iterations
Here, and indicate the motion-compensated key frames used to create , M is the previous decoded frame, and the λ s are tunable parameters. While they may be optimised in the future, we used .
where λ is a tunable variable, that we set to one half.
7 Complexity analysis
In this section, we will discuss the complexity of the proposed system relative to conventional DVC as well as H.264 intra coding.
7.1 Encoder complexity
The overall complexity is a function of the GOP size, since it will be a combination of the key frame encoding complexity as well as the WZ encoding complexity. Initially, we consider only the complexity of the WZ frames, since the key frames have the same complexity in H.264 (intra) and in conventional DVC methods.
The complexity of the encoder can be described as the sum of the complexity of the subblocks. The main points where differences between the proposed method and conventional DVC may occur is in the conditional entropy estimation block, the quantizer and the real field encoder. We do not consider the complexity of motion estimation at the encoder and do not compare against systems that employ motion estimation. The complexity of the DCT operation will be the same for all transform domain systems.
Real field encoding compared to finite field coding will have the same number of operations (assuming the same code size), but the type of operation will differ: Real field (floating point) multiplication and addition versus Galois field multiplication and addition. The complexity of the different operations will be platform and implementation specific. Typically one would expect finite field operations to be faster than the real field equivalent, however many modern processors are optimised to perform floating point operations and can perform them at high speed.
A similar point can be made for bit-plane-based systems with binary codes. These codes will have a larger number of operations compared to symbol-based coding, but the operations will be simple binary XOR and AND operations which are much faster.
The proposed conditional entropy estimator is faster than the conventional method (see Section 5.2), O(1) vs. O(q b N) per subband, but the complexity of creating the side information estimate is also at least O(N), so this does not represent a very large saving. The quantizers will also be similar in complexity though the proposed system only quantizes (M b < N) values compared to the N values of a conventional system. Overall, we expect the encoding complexity of the proposed system to be similar to that of conventional DVC without motion estimation at the encoder. Exact comparisons will be implementation specific.
To experimentally evaluate the complexity we performed execution-time experiments. Simulation-based comparisons are difficult and always imperfect, but may still provide some insight into the expected performance of the systems. For these simulations we used the reference implementation of the H.264 intra codec. The proposed codec was implemented in C++. We also compare with the DISCOVER codec.
Encoder run-time simulation
Foreman | Coastguard | Soccer | Hall monitor | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
RD | H.264 | Prop | Disc | H.264 | Prop | Disc | H.264 | Prop | Disc | H.264 | Prop | Disc |
1 | 112 | 14 | 21 | 120 | 10 | 21 | 104 | 23 | 21 | 121 | 8 | 21 |
2 | 114 | 16 | 21 | 124 | 13 | 21 | 104 | 24 | 21 | 124 | 9 | 21 |
3 | 116 | 16 | 21 | 127 | 13 | 21 | 107 | 24 | 22 | 126 | 9 | 22 |
4 | 128 | 26 | 21 | 136 | 25 | 22 | 114 | 38 | 22 | 133 | 10 | 22 |
5 | 131 | 26 | 22 | 141 | 25 | 21 | 116 | 39 | 22 | 136 | 10 | 23 |
6 | 140 | 35 | 24 | 152 | 39 | 22 | 122 | 48 | 22 | 143 | 11 | 24 |
7 | 148 | 49 | 23 | 157 | 52 | 23 | 132 | 69 | 24 | 147 | 20 | 24 |
8 | 172 | 86 | 24 | 184 | 98 | 25 | 159 | 110 | 25 | 170 | 43 | 26 |
The table shows the average encoding time for a WZ frame as measured when N G = 2. The overall encoding time for any GOP size can be estimated by combining the running time for the key frames and the WZ frames. From the table, one can see that the proposed system has a reduced run time when measured against the H.264 intra codec. The running time for the proposed system increases as the rate increases (larger RD points), since more subbands are encoded. A rate increase for a given subband also increases the size of the encoding matrix, which leads to a run-time increase.
The proposed system has approximately the same run time as the DISCOVER codec in the low-rate region. However, for the higher RD points, the proposed system has a longer run time than the DISCOVER codec. It should be mentioned that no attempt was made to optimise the encoder implementation, and it is expected that the run time can be significantly reduced in the future.
7.2 Decoder complexity
Decoding complexity is generally not considered a limitation for DVC and is not often considered when evaluating the performance in DVC systems. However, we will briefly discuss the effect of real field coding on the decoder complexity. The reduced complexity GBP algorithm used for decoding is similar to a binary BP decoding algorithm in terms of memory requirements and computational complexity. In GBP each edge requires two values to be transmitted while for binary BP one value is required. The update rules are also simple equations that scale with the check and variable node degree distributions. However, binary-coded systems decode each bit plane independently, thus it has to perform more than one decoding for each subband. The GBP also converges in fewer than ten iterations, which is less than most binary BP algorithms. GBP is thus expected to compare favourably to binary BP in terms of decoding speed. Non-binary BP is typically slower than binary BP and requires more memory. Each edge must transmit the entire PMF (L values). The update equations at the check node are also slow. A faster fast Fourier transform belief propagation (FFT-BP) algorithm still requires two FFT operations for every edge. Typically, FFT-BP is considered to be O(q b) times slower than binary BP. There are lower complexity non-binary BP algorithms in the literature, but discussing their relative merits is beyond the scope of this article. In general, GBP is expected to be faster than non-binary BP.
Decoder run-time simulation
Foreman | Coastguard | Soccer | Hall monitor | |||||
---|---|---|---|---|---|---|---|---|
RD | Prop | Disc | Prop | Disc | Prop | Disc | Prop | Disc |
1 | 2.133 | 6.528 | 1.830 | 4.835 | 2.910 | 8.323 | 1.580 | 4.984 |
2 | 2.164 | 7.541 | 1.830 | 5.780 | 2.940 | 8.955 | 1.590 | 5.780 |
3 | 2.180 | 8.533 | 1.850 | 6.291 | 2.940 | 10.869 | 1.790 | 6.394 |
4 | 4.990 | 12.933 | 5.400 | 8.927 | 6.430 | 15.586 | 2.950 | 7.964 |
5 | 5.080 | 13.909 | 5.480 | 9.138 | 6.560 | 16.249 | 4.070 | 8.018 |
6 | 7.000 | 17.825 | 8.180 | 11.821 | 8.950 | 20.301 | 6.120 | 9.820 |
7 | 10.040 | 21.682 | 11.190 | 14.349 | 12.570 | 23.737 | 6.450 | 11.251 |
8 | 17.200 | 31.127 | 18.910 | 21.704 | 20.130 | 29.926 | 10.880 | 14.061 |
The table shows the average decoding time for a WZ frame as measured when N G = 2. From the table one can see that the proposed system runs faster than the DISCOVER codec in all simulations. While this is not proof that the system will always be faster, it does indicate that real field decoding does not represent a complexity increase.
8 Simulation results and discussion
The system presented in this paper is tested on four standard test sequences: foreman, coastguard, soccer and hall monitor. The sequences are all QCIF format with a frame rate of 15 Hz. All the rate-distortion curves show results for the average bit rate and the distortion as calculated over all the frames in the entire sequence (key and WZ frames). All sequences consist of 150 frames.
The system will be compared against the H.264 intra codec, the DISCOVER codec [1], which is a feedback-based benchmark system in DVC literature, and a system previously developed by the authors [8], adapted for this paper, which will be called the ‘FF system’ in this section. All the results for the proposed system were obtained without binning, thus D=1, to ensure that there are no decoding failures.
Key frame quantization parameters for the RD points
RD number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Hall monitor | 37 | 36 | 35 | 33 | 32 | 30 | 29 | 24 |
Coastguard | 38 | 37 | 36 | 34 | 33 | 31 | 30 | 26 |
Foreman | 40 | 39 | 38 | 34 | 33 | 31 | 29 | 25 |
Soccer | 43 | 42 | 40 | 36 | 35 | 33 | 30 | 25 |
Comparison of the performance of the proposed system on the foreman sequence. Performance of the proposed system on the foreman sequence compared to the DISCOVER codec, the H.264 intra codec, and the FF system.
Comparison of the performance of the proposed system on the coastguard sequence. Performance of the proposed system on the coastguard sequence compared to the DISCOVER codec, the H.264 intra codec, and the FF system.
Comparison of the performance of the proposed system on the soccer sequence. Performance of the proposed system on the soccer sequence compared to the DISCOVER codec, the H.264 intra codec, and the FF system.
Comparison of the performance of the proposed system on the hall monitor sequence. Performance of the proposed system on the hall monitor sequence compared to the DISCOVER codec, the H.264 intra codec, and the FF system.
Bjontegaard metric results
GOP size | 2 | 4 | 8 | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Sequence | Type | Disc | Prop | FF | Disc | Prop | FF | Disc | Prop | FF | |
Foreman | Full | Rate | 10.40 | 13.80 | 26.05 | 42.23 | 106.74 | 1567.70 | 91.31 | 206.78 | - |
DSNR | -0.40 | -0.89 | -1.47 | -1.55 | -3.60 | -5.35 | -2.76 | -5.70 | -8.74 | ||
Mid | Rate | 5.13 | 8.37 | 18.45 | 33.87 | 99.27 | -48.09 | 78.63 | 277.07 | - | |
DSNR | -0.25 | -0.50 | -0.91 | -1.43 | -3.16 | -4.49 | -2.65 | -5.45 | -8.00 | ||
Coastguard | Full | Rate | -17.88 | -9.88 | -3.41 | -9.04 | 37.46 | 77.84 | 26.65 | 193.56 | 378.17 |
DSNR | 0.98 | 0.14 | -0.08 | 0.41 | -1.61 | -2.03 | -0.90 | -3.89 | -5.20 | ||
Mid | Rate | -19.33 | -12.71 | -7.42 | -12.93 | 30.60 | 48.49 | 18.68 | 48.83 | -91.21 | |
DSNR | 1.07 | 0.50 | 0.27 | 0.58 | -1.14 | -1.47 | -0.75 | -3.44 | -4.45 | ||
Soccer | Full | Rate | 90.04 | 115.72 | 310.00 | 167.53 | 279.95 | - | 233.58 | 449.35 | - |
DSNR | -2.86 | -3.74 | -5.89 | -4.35 | 6.36 | -9.99 | -5.24 | -8.06 | -12.79 | ||
Mid | Rate | 85.63 | 123.33 | 321.26 | 164.38 | 304.88 | - | 230.12 | 571.79 | - | |
DSNR | -2.69 | -3.61 | -5.65 | -4.07 | -6.20 | -9.41 | -4.93 | -7.95 | -12.10 | ||
Hall | Full | Rate | -28.44 | -36.57 | -35.37 | -42.01 | -40.34 | -41.54 | -41.67 | -15.18 | -30.27 |
Monitor | DSNR | 2.64 | 3.04 | 2.77 | 3.94 | 2.26 | 1.98 | 3.96 | 0.63 | 0.66 | |
Mid | Rate | -30.52 | -39.49 | -38.42 | -44.77 | -45.54 | -45.96 | -44.75 | -28.28 | -24.17 | |
DSNR | 2.83 | 3.68 | 3.20 | 4.58 | 4.24 | 2.75 | 5.27 | 3.23 | -0.47 |
The same trends are apparent in all the sequences. The proposed system performs comparably with the DISCOVER codec in the low-rate regime, but then deviates and performs less well in the high-rate regime. The FF-based system and the proposed system perform similarly in the coastguard sequence for the GOP size equal to 2 and 4. In all other cases, the proposed system has better performance. The performance gain when the GOP size is equal to 2 is small, but increases at higher rates. The performance gain is also larger for GOP sizes equal to 4 and 8. In general, the performance loss of the proposed system as measured against the DISCOVER codec increases with the GOP size.
Comparison of the performance of the proposed system on all four sequences. Performance of the proposed system on all four sequences compared to the FF system using the SSIM metric. The results are shown for GOP sizes of 2 and 4.
Standard deviation for all sequences, GOP = 2. Standard deviation of the PSNR over the four test sequences for both the proposed system as well as the FF system as a function of the profile number (rate). The same rate is used for both systems and the GOP size is 2.
Standard deviation for all sequences, GOP = 4. Standard deviation of the PSNR over the four test sequences for both the proposed system as well as the FF system as a function of the profile number (rate). The same rate is used for both systems and the GOP size is 4.
Standard deviation for all sequences, GOP = 8. Standard deviation of the PSNR over the four test sequences for both the proposed system as well as the FF system as a function of the profile number (rate). The same rate is used for both systems and the GOP size is 8.
Comparison of the same frame encoded and decoded using FF system and proposed system. Both frames used the same rate.
Though the results are not plotted, the systems presented in [16] and [17] both have better average R-D performance than the proposed system and achieve similar average R-D performance as the DISCOVER codec. Using low complexity motion estimation at the encoder, they are able to achieve accurate rate estimation. The addition of hashes further improves the performance of these systems. The exact complexity and run-time losses incurred by the motion estimation are not available. The SSIM and PSNR variance results for these systems are also not available.
For the proposed system, it is expected that employing a similar low complexity motion estimation algorithm at the encoder will improve the accuracy of the rate estimation and by extension the average R-D performance of the proposed system as well. Furthermore, accurate rate estimation will allow for binning as described in Section 5.4, which can further improve the average rate-distortion performance. Hashing and more advanced SI creation methods could also be employed.
9 Conclusion
This paper proposed a new approach to feedback suppression in DVC systems that relies on codes defined over the real field. Despite the removal of the feedback path, the encoder complexity was not significantly increased, since no motion estimation was performed at the encoder. The system showed average R-D performance comparable to that of a feedback-based system at low rates. At high rates, there was a reduction in performance compared to feedback-based systems. However, compared to conventional finite-field-based feedback-free systems, the variance in the distortion was reduced. This resulted in improved perceptual visual quality.
Declarations
Authors’ Affiliations
References
- Artigas X, Ascenso J, Dalai M, Klomp S, Ouaret M: The DISCOVER codec: architecture, techniques and evaluation. Picture Coding Symposium (PCS) (Lisbon,7–9 November 2007)Google Scholar
- Slowack J, Skorupa J, Deligiannis N, Lambert P, Munteanu A, Van De Walle R: Distributed video coding with feedback channel constraints. Circuits Syst. Video Technol., IEEE Trans 2012, 22(7):1014-1026.View ArticleGoogle Scholar
- Wyner A, Ziv J: The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22: 1-10. 10.1109/TIT.1976.1055508MathSciNetView ArticleMATHGoogle Scholar
- Slepian D, Wolf J: Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19(4):471-480. 10.1109/TIT.1973.1055037MathSciNetView ArticleMATHGoogle Scholar
- Aaron A, Zhang R, Girod B: Wyner-Ziv coding of motion video. Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, 3–6 November 2002, Volume 1 (2002), pp 240–244Google Scholar
- Puri R, Ramchandran K: PRISM: a new robust video coding architecture based on distributed compression principle. Annual Allerton Conference on Communication, Control, and Computing (ACCC) (Urbana, 2–4 October 2002)Google Scholar
- Ascenso J, Brites C, Dufaux F, Fernando A, Ebrahim T, Pereira F, Tubaro S: The VISNET II DVC codec: architecture, tools and performance. European Signal Processing Conference (EUSIPCO) (Aalborg, 23–27 August 2010)Google Scholar
- Louw D, Kaneko H: A multi-hypothesis non-binary LDPC code based distributed video coding system. Proceedings of the 13th IASTED International Conference on Signal and Image Processing, Volume 74, Dallas, 14–16 December 2011Google Scholar
- Louw D, Kaneko H: A system combining extrapolated and interpolated side information for single view multi-hypothesis distributed video coding. Proceedings International Symposium on Information Theory and its Applications (ISITA), Honolulu, 28–31 October 2012, pp. 779–783Google Scholar
- Morbée M, Prades-Nebot J, Roca A, Pižurica A, Philips W: Improved pixel-based rate allocation for pixel-domain distributed video coders without feedback channel. In Proceedings of the 9th international conference on Advanced concepts for intelligent vision systems, ACIVS’07.. Springer, Berlin, Heidelberg; 2007:663-674.View ArticleGoogle Scholar
- Martinez J, Fernandez-Escribano G, Kalva WARJWeerakkody H, Fernando WAC, Garrido A: Feedback free DVC architecture using machine learning. 15th IEEE International Conference on Image Processing, 2008. ICIP 2008, San Diego, 12–15 October 2008, pp. 1140–1143Google Scholar
- Nickaein I, Rahmati M, Ghidary S, Zohrabi A: Feedback-free and hybrid distributed video coding using neural networks. 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), Montreal 3–5 July 2012, pp. 528–532Google Scholar
- Weerakkody WARJ, Fernando WAC, Adikari ABB: Unidirectional distributed video coding for low cost video encoding. IEEE Trans. Consumer Electron 2007, 53(2):788-795.View ArticleGoogle Scholar
- Deligiannis N, Munteanu A, Clerckx T, Cornelis J, Schelkens P: Overlapped block motion estimation and probabilistic compensation with application in distributed video coding. IEEE Signal Process. Lett 2009, 16(9):743-746.View ArticleGoogle Scholar
- Sheng T, Zhu X, Hua G, Guo H, Zhou J, Chen C: Feedback-free rate-allocation scheme for transform domain Wyner–Ziv video coding. Multimedia Syst 2010, 16(2):127-137. 10.1007/s00530-009-0179-8View ArticleGoogle Scholar
- Brites C, Pereira F: An efficient encoder rate control solution for transform domain Wyner-Ziv video coding. IEEE Trans. Circuits Syst. Video Technol 2011, 21(9):1278-1292.View ArticleGoogle Scholar
- Verbist F, Deligiannis N, Satti SM, Munteanu A, Schelkens P: Iterative Wyner-Ziv decoding and successive side-information refinement in feedback channel-free hash-based distributed video coding. Proceedings of SPIE 8499, Applications of Digital Image Processing XXXV, 84990O–1–84990O–10 (2012)Google Scholar
- Goyal V, Fletcher A, Rangan S: Compressive sampling and lossy compression. Signal Process. Mag. IEEE 2008, 25(2):48-56.View ArticleGoogle Scholar
- Kamilov U, Goyal V, Rangan S: Message-passing de-quantization with applications to compressed sensing. IEEE Trans. Signal Process 2012, 60(12):6270-6281.MathSciNetView ArticleGoogle Scholar
- Baig Y, Lai E, Punchihewa A: Distributed video coding based on compressed sensing. Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Melbourne, 9–13 July 2012, pp. 325–330Google Scholar
- Liu Z, Cheng S, Liveris A, Xiong Z: Slepian-wolf coded nested lattice quantization for Wyner-Ziv coding: high-rate performance analysis and code design. IEEE Trans. Inf. Theory 2006, 52(10):4358-4379.MathSciNetView ArticleMATHGoogle Scholar
- Donoho D: Compressed sensing. IEEE Trans. Inf. Theory 2006, 52(4):1289-1306.MathSciNetView ArticleMATHGoogle Scholar
- Brites C, Pereira F: Correlation noise modeling for efficient pixel and transform domain Wyner-Ziv Video coding. IEEE Trans. Circuits Syst. Video Technol 2008, 18(9):1177-1190.View ArticleGoogle Scholar
- Martins R, Brites C, Ascenso J, Pereira F: Refining side information for improved transform domain Wyner-Ziv video coding. IEEE Trans. Circuits Syst. Video Technol 2009, 19(9):1327-1341.View ArticleGoogle Scholar
- Ailah AAE, Petrazzuoli G, Farah J, Cagnazzo M, Pesquet-Popescu B, Dufaux F: Side information improvement in transform-domain distributed video coding. Proceedings of the SPIE - Applications of Digital Image Processing XXXVII, San Diego, 17–21 August 2012Google Scholar
- Deligiannis N, Verbist F, Slowack J, Van De Walle R, Schelkens P, Munteanu A: Joint successive correlation estimation and side information refinement in distributed video coding. 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, 27–31 August 2012, pp. 569–573Google Scholar
- Ye S, Ouaret M, Dufaux F, Ebrahimi T: Improved side information generation for distributed video coding by exploiting spatial and temporal correlations. EURASIP J. Image Video Process 2009, 2009: 683510.View ArticleGoogle Scholar
- Marshall T Jr: Coding of real-number sequences for error correction: a digital signal processing problem. IEEE J. Selected Areas Commun 1984, 2(2):381-392. 10.1109/JSAC.1984.1146063View ArticleGoogle Scholar
- Dimakis A, Smarandache R, Vontobel P: LDPC codes for compressed sensing. IEEE Trans. Inf. Theory 2012, 58(5):3093-3114.MathSciNetView ArticleGoogle Scholar
- Song S, Zhou B, Lin S, K Abdel-Ghaffar K: A unified approach to the construction of binary and nonbinary quasi-cyclic LDPC codes based on finite fields. IEEE Trans. Commun. 2009, 57: 84-93.View ArticleGoogle Scholar
- Lam E, Goodman J: A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. Image Process 2000, 9(10):1661-1666. 10.1109/83.869177View ArticleMATHGoogle Scholar
- Fu C, Kim J: Encoder rate control for block-based distributed video coding. 2010 IEEE International Workshop on Multimedia Signal Processing (MMSP), Saint Malo, 4–6 October 2010, pp. 333–338Google Scholar
- Cheng S, Xiong Z: Successive refinement for the Wyner-Ziv problem and layered code design. IEEE Trans. Signal Process 2005, 53(8):3269-3281.MathSciNetView ArticleGoogle Scholar
- Yaacoub C, Farah J, B Pesquet-Popescu B: Feedback channel suppression in distributed video coding with adaptive rate allocation and quantization for multiuser applications. EURASIP J. Wireless Commun. Netw 2008., 2008: doi:10.1155/2008/427247Google Scholar
- Louw D, Kaneko H: Efficient conditional entropy estimation for distributed video coding. Proceedings of the 30th Picture Coding Symposium (PCS), San Jose, 8–11 December 2013Google Scholar
- Rangan S: Estimation with random linear mixing, belief propagation and compressed sensing. 2010 44th Annual Conference on Information Sciences and Systems (CISS), Princeton, 17–19 March 2010, pp. 1–6Google Scholar
- Ascenso J, Brites C, Pereira F: Content adaptive Wyner-ZIV video coding driven by motion activity. IEEE International Conference on Image Processing, Atlanta, 8–11 October 2006, pp. 605–608Google Scholar
- Pereira F, Ascenso J, Brites C: Studying the GOP size impact on the performance of a feedback channel-based Wyner-Ziv video codec. In Advances in Image and Video Technology, Volume 4872 of Lecture Notes in Computer Science. Ed. by D Mery, L Rueda. Springer, Berlin, Heidelberg; 2007:801-815.Google Scholar
- Bjontegaard G: Calculation of average PSNR differences between RD-curves. the 13th VCEG-M33 Meeting, Austin, April 2001Google Scholar
- Sullivan G, Bjontegaard G: Recommended simulation common conditions for H.26L coding efficiency experiments on low-resolution progressive-scan source material. VCEG-N81, 14th Meeting, Santa Barbara, September 2001Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.