# Distributed video coding with block mode decision to reduce temporal flickering

- Chang-Ming Lee^{1, 2} (Email author)
- Zhi-Heng Chiang^{1}
- Dung-Chan Tsai^{2}
- Wen-Nung Lie^{2, 3}

**2013**:177

https://doi.org/10.1186/1687-6180-2013-177

© Lee et al.; licensee Springer. 2013

**Received:** 20 July 2013

**Accepted:** 13 November 2013

**Published:** 1 December 2013

## Abstract

The common distributed video coding (DVC) systems treat input video frames group by group. In each group of pictures (GOP), usually the first frame, called key frame, is intra-coded and the others are Wyner-Ziv (WZ)-coded. This GOP coding structure presents a fixed and inefficient coding mode switch (key or WZ) at group boundaries, thus preventing a DVC system from adapting the rate-distortion (R-D) performance to varying video content. In this work, a structure of temporal group of blocks (TGOB) with dynamic coding mode (key/WZ) decision is presented. This dynamic TGOB coding architecture determines each image block to be a key block or a WZ block based on spatiotemporal image analysis, resulting in a mode switch of fine granularity in both the spatial and temporal domains. As a consequence, not only the overall coding efficiency is improved, but also the temporal flickering artifacts for the reconstructed video are reduced substantially. Experimental results show that our proposed DVC scheme with block mode decision achieves up to 2.85 dB of quality gain and also up to 51% of temporal flickering reduction, compared to the well-known DISCOVER system.

### Keywords

Distributed video coding; Wyner-Ziv theorem; Low-density parity-check accumulate codes; Flickering

## 1 Introduction

In the past, video coding techniques were developed mainly for broadcasting applications (i.e., one transmitter and a plurality of receivers). Hence, a fast and low-cost design of the video decoder at the receiver was emphasized. This implies that the encoder at the broadcast transmitter, which is in charge of the coding efficiency, may bear a high computational complexity. However, this heavy-encoder, light-decoder architecture is inappropriate for emerging applications, such as video surveillance systems or wireless sensor networks, where distributed video sensors collect video information from their surroundings and transfer it to a centralized controller. In those cases, the requirement of a low-complexity video encoder and a high-complexity video decoder reverses the traditional concept. This encourages the development of distributed video coding (DVC) [1], which is based on the famous Slepian-Wolf (SW) [2] and Wyner-Ziv (WZ) [3] theorems. The principal idea behind DVC is to encode correlated sources *X* and *Y* separately but to decode them jointly: one decoded source (denoted as *Ŷ*) is processed and then used as the side information (denoted as *X*′) in decoding the other source (i.e., *X*). The decoder for the source *X* differs from a traditional one (e.g., an H.264/AVC video decoder) in that *X* is channel-coded and the parity bit information of *X* is transmitted to the receiver for error-correcting decoding with the side information *X*′. In this way, the dominant computational load is shifted from the encoder to the decoder. Although many applications of distributed source coding (DSC) seem practical, most DVC systems suffer from degraded coding efficiency compared to H.264/AVC. Currently, much research is devoted to improving the coding performance of DVC systems.
For example, to fulfill WZ decoding, a statistical correlation noise modeling (CNM) [4], also considered as a virtual channel, is proposed to describe the dependency between a source and its side information in WZ coding. A DVC decoder usually contains a function module to generate side information of *X* from the other decoded source *Ŷ*. Basically, the efficiency of WZ coding is dominated by the accuracy of both the generated side information and the CNM. The more accurate these two estimates are, the fewer parity bits are needed to reconstruct *X* and better rate-distortion (R-D) performance can be achieved. Many research works are dedicated to generate accurate side information. The motion-compensated temporal interpolation (MCTI) [5, 6] is a sophisticated technique to produce the side information at the DVC decoder, and it would result in a reduced amount of parity information requested by the decoder through a feedback channel. Moreover, an improved fusion of global and local side information at the decoder side is also proposed in [7]. The review of several important side information generation methods can be found in [8]. Additionally, new applications of DVC in [9, 10] are inspired with these investigations.

Typically, WZ coding is realized by channel codes with capacity near the Shannon limit, such as turbo codes [11, 12] and low-density parity-check (LDPC) codes [13]. In a practical DVC codec, the structure of the channel codes needs to be adapted to the varying statistics in the CNM. The first work on LDPC-based rate-adaptive codes usable in DSC is the construction of low-density parity-check accumulate (LDPCA) codes [14], in which a number of accumulated bits are sent to the decoder to accomplish the WZ decoding according to varying correlation noise. The corresponding LDPC-based systems outperform the turbo-based schemes in terms of compression performance. However, several techniques that consider the flexibility and granularity of coding units (e.g., a hybrid block-frequency architecture [15]) implement WZ coding with block-based turbo codes. This work adopts the LDPCA code, which has been used in many frame-based DVC schemes, but determines coding modes on a block basis (instead of a frame basis). This is advantageous for maintaining good coding performance on the one hand and reducing flickering artifacts on the other, as discussed later in this paper.

Generally, the encoder lacks *a priori* knowledge about the number of parity bits required at the decoder to reconstruct *X*. Hence, parity bits are generated in batches, partitioned into packets, and stored in encoder buffers. Based on the side information *X*′ generated from *Ŷ*, the decoder requests parity bits packet by packet via a feedback channel until the success of channel decoding of *X*. This indicates a delay, and channel decoding is performed iteratively with newly requested parity bits. Such procedures inevitably result in a huge computing load at the decoder. Several works suggest realizing bit rate estimation at the encoder [16–19]. A rate estimation based on the residue between the current frame and the side information is proposed [16], where the parity bits are allocated dynamically according to the motion activity. An entropy-based encoder rate estimation in transform domain is also proposed [17], where the Laplacian correlation model is used to estimate the entropy. Robustness of the rate estimation is improved by the use of an error detection method based on cyclic redundancy checksum (CRC).

Temporal smoothness plays an important role for the human visual system in evaluating visual quality of a video. A decoded video sequence with lower flicker is more preferable, whether for distributed or conventional video coding systems. Existing DVC systems [18, 19] mostly divide the whole video sequence into predefined groups of pictures (GOPs) for encoding, where the GOP size is fixed and frames in a GOP are organized as one intra-coded key frame (denoted as KF) followed by others which are WZ-coded. In general, KFs are intra-encoded based on a MPEG- or H.264/AVC-like standard, and WZ frames (denoted as WZFs) are based on channel coding techniques (e.g., LDPCA or turbo codes). To reconstruct WZFs at the decoder, decoded KFs act as the sources for side information generation (via, e.g., interpolation techniques). In general, temporal correlation between consecutive frames may be substantially high for static scenes and low for dynamic ones. Fixed and improper selection of the coding modes (KF or WZF) in a sequence incurs not only temporal flickering artifacts, but also degradation of the coding efficiency. Thus, the frame-based DVC schemes usually suffer from noticeable temporal fluctuation due to image quality unbalance caused by different coding methods between KFs and WZFs. Traditional KF/WZF coding schemes also cannot adapt to frames of varying content. Therefore, the fixed GOP structure is not nimble for varying video content, and an adaptive framework to overcome the above shortcoming is imperative. To deal with this situation, block-based, instead of traditional frame-based, WZ video coding schemes are proposed [20, 21]. In our preliminary work on a pixel-domain DVC system [21], each block in a frame is categorized into a key block (denoted as KB) or a WZ block (denoted as WZB) after taking into consideration its temporal and spatial statistics. 
A better coding performance compared to conventional frame-based DVC schemes can be achieved, owing to the dynamically spatial key block decision adapting to varying content in each frame. Another advantage is to provide enhanced temporal consistency owing to the dynamically temporal key block decision adapting to varying content along the sequence. That is, finer granularities in both the spatial and temporal directions can be achieved in determining the coding modes. In this paper, the prior DVC scheme with block mode decision is further extended to the transform domain, which ensures better temporal consistency and hence further improves visual experience during the playback of the reconstructed video. Some comparisons with the distributed coding for video services (DISCOVER) codec [22], as well as a more complete and rigorous performance analysis of the proposed scheme, are addressed in this paper.

The rest of this paper is organized as follows. Section 2 overviews DVC systems based on flexible GOP structures in the literature. Section 3 provides details about the proposed framework which is based on the concept of temporal group of blocks (TGOB). Section 4 presents our experiment results with comparisons to the DISCOVER codec [22] in R-D performance and resulting temporal flickering artifacts. Finally, Section 5 draws a conclusion and future work.

## 2 DVC system with flexible GOP structure

In conventional video codecs (e.g., MPEG-X/H.26X/H.264), a GOP structure is predefined to consist of an intra-coded frame and several inter-coded frames that follow. The main advantages of such a GOP structure are to avoid the propagation of channel errors due to motion compensation and to offer random access of each distinct GOP. However, this kind of fixed GOP structure is not suitable for DVC systems that aim to provide a simple encoder, where exhaustive exploration of temporal redundancy is not allowed.

### 2.1 Conventional frame-based DVC scheme

### 2.2 Block-based DVC scheme

The frame-based WZ video codec with fixed GOP size assumes spatially stationary correlation noise statistics between a WZF and its side information generated by exploiting nearby KFs. This assumption however degrades accuracy of the side information for regions of unpredictable instances, like active motions and dis-occlusions, hence resulting in more parity bits required for decoding WZF.

In [23], the GOP coding structure is retained, but intra-coding may be performed for some highly dynamic blocks in non-key frames. Hence, block-based encoding with mixed modes (intra/WZ) is conducted in that work. On the other hand, Puri et al. [20] classify blocks by estimating their temporal correlation noise and enable different levels of channel coding accordingly. In contrast to traditional frame-based coding schemes, the above two methods perform WZ coding (based on, e.g., turbo codes) locally on a block basis. Although they solve the problem of spatial non-stationarity, they still suffer from low coding efficiency for static regions (e.g., backgrounds) in KFs. The mixed block-based intra/inter mode selection in traditional video coding standards motivates us to explore both temporal and spatial granularity to further improve the coding efficiency and reduce the flickering artifacts of traditional DVC systems.

## 3 Proposed framework

**Notations of symbols**

Notation | Definition |
---|---|
*U*_{TGOB} | Predefined upper bound for the length of a TGOB |
TD | Temporal difference between the current block and the preceding KB |
*T*_{t} | Threshold of TD |
*σ*^{2} | Spatial variance of a block |
*T*_{s} | Threshold of *σ*^{2} |
*d* | Temporal distance between the current block and the preceding KB |
*M*_{k} | Number of bit planes selected for transmission of the DCT coefficients in the *k*th band |
*r* | Residual between a WZB and its corresponding side information |
*α* | Parameter of Laplacian distribution |
QP | Quantization parameter for intra-coding |
QM | Quantization matrix for WZ coding |
*D*_{k} | Deviation of a coefficient from the *k*th band mean |
*μ*_{k} | Mean of coefficients in the *k*th band |
*σ*_{k}^{2} | Variance of coefficients in the *k*th band |

### 3.1 TGOB structure

The block mode decision relies on two statistics of each block, namely the temporal difference TD (Equation 1) and the spatial variance *σ*^{2} (Equation 2), where (*i*, *j*) indicates the block location, *t* is the time index, (*m*, *n*) represents a pixel location in the current block *B*_{i,j,t}, *x*_{t} is the pixel value, *M* is the number of pixels within a block, and *d* is the temporal distance between the current block and the preceding KB. Two thresholds, *T*_{t} and *T*_{s}, are given for TD and *σ*^{2}, respectively. Blocks with large TD or small *σ*^{2} are identified as KBs for intra-coding, due to the observation that blocks with higher motion activity or higher spatial correlation are more suitable for intra-coding. Note that in our DVC system, side information is generated by interpolation from two bounding KBs. The accuracy of interpolation can be guaranteed if these two bounding KBs are similar in content. Hence, the block preceding (in time) the one satisfying the TD or *σ*^{2} condition should be intra-coded, too. In view of this concept, three coding cases are designed in our scheme: the WZB mode, the KB mode, and the double-KB instance. The difference between the latter two cases is that two consecutive co-located blocks in time are assigned as KBs for the double-KB instance. Consecutive co-located WZBs in the temporal direction, together with the preceding KB, form a TGOB. The TGOB size is at least 1 (i.e., containing only a leading KB and no WZBs) and at most a predefined upper bound *U*_{TGOB}, which avoids a long delay during side information generation at the decoder.

The decision rules are summarized in Figure 2. If TD is higher than *T*_{t}, the current block and its co-located one in the previous frame are both identified as KBs (denoted as KB2 and KB1, respectively, i.e., the double-KB instance). Since a large motion or scene change occurs at KB2 (current block), KB1 (in the previous frame) will be paired with its preceding KB, while KB2 will be paired with another succeeding KB, both for side information generation. If TD is not higher than *T*_{t}, the current block might be WZ-coded. However, we have to check whether the TGOB size has reached the upper bound *U*_{TGOB}. If yes, a KB is assigned so that the TGOB size constraint is satisfied and a long delay is prevented. If not, a test on *σ*^{2} is subsequently performed. Blocks whose variances *σ*^{2} are small are identified as KBs, as mentioned earlier. Otherwise, the WZB mode is identified. Note that the WZB mode is assigned only if both tests on TD and *σ*^{2} in Figure 2 are passed (TD is not higher than *T*_{t} and *σ*^{2} is not small) and the length bound *U*_{TGOB} is not reached. All the block coding modes (WZB or KB) are recorded as a block mode map (BMM) for each frame and transmitted to the decoder for reconstruction guidance.
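The decision rules described above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: TD is taken here as the sum of squared pixel differences against the co-located block of the preceding KB, *σ*^{2} as the population variance of the block, and a "small" variance as *σ*^{2} < *T*_{s}. The function name, the argument names, and the `tgob_len` bookkeeping are illustrative and are not taken from the original implementation.

```python
import numpy as np


def block_mode_decision(cur, prev_kb, tgob_len, t_t=1000.0, t_s=10.0, u_tgob=4):
    """Return 'double-KB', 'KB', or 'WZB' for one block (illustrative sketch).

    cur      -- current block, 2-D array of pixel values
    prev_kb  -- co-located block of the preceding KB
    tgob_len -- number of blocks already in the current TGOB (leading KB included)
    """
    cur = np.asarray(cur, dtype=np.float64)
    prev_kb = np.asarray(prev_kb, dtype=np.float64)

    # ASSUMPTION: TD is a sum of squared differences; the exact form is Equation 1.
    td = float(np.sum((cur - prev_kb) ** 2))
    # ASSUMPTION: population variance of the block; the exact form is Equation 2.
    var = float(np.var(cur))

    if td > t_t:
        # Large motion or scene change: the current block becomes KB2 and the
        # co-located block in the previous frame is forced to be KB1.
        return "double-KB"
    if tgob_len >= u_tgob:
        # The TGOB already holds u_tgob blocks, so a new TGOB must start here.
        return "KB"
    if var < t_s:
        # Spatially smooth block: intra-coding is preferred.
        return "KB"
    return "WZB"
```

For instance, a textured block that is identical to its preceding KB is labeled `"WZB"` until the TGOB reaches its length bound, whereas a flat block is labeled `"KB"` regardless of its temporal difference being small.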

KBs will be H.264/AVC intra-coded, while WZBs will be channel-encoded. Each WZB is decoded with the help of side information generated by bilinear interpolation of two co-located and reconstructed KBs that precede and succeed the current WZB, respectively. Different from our previous work [21], the double-KB instance is newly introduced in this new BMD process and aims at enhancing the side information quality for WZBs (i.e., ensuring that the two KBs used for side information generation are similar in content). Note that the goal of the upper bound for TGOB size is to avoid a long delay during side information generation at the decoder. Consequently, the TGOB size will be small for dynamic regions and large for static content.

As an example, consider a sequence in which *U*_{TGOB} is set to 4 and the first and last frames are both intra-coded. The (0, 1)th block in frame #3 is identified as KB2 in the double-KB instance, and the co-located block in frame #2 is forcibly changed to a KB1 although it is initially identified as a WZB. Hence, the series of TGOB sizes at block position (0, 1) is {2, 1, 4, 1}; the side information for the corresponding WZBs in frame #1 and frames (#4, #5, #6) is generated from the pairs of KBs in frames (#0, #2) and (#3, #7), respectively.
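The bookkeeping in this example can be reproduced with a few lines of Python. The sketch below assumes the per-frame modes at block position (0, 1) are K, W, K, K, W, W, W, K for frames #0 to #7, which is the sequence implied by the description above; the function names are hypothetical.

```python
def tgob_sizes(modes):
    """Split a temporal sequence of block modes ('K' or 'W') into TGOB sizes."""
    if not modes or modes[0] != "K":
        raise ValueError("the sequence must start with a key block")
    sizes = []
    for mode in modes:
        if mode == "K":
            sizes.append(1)      # every KB opens a new TGOB
        else:
            sizes[-1] += 1       # a WZB extends the current TGOB
    return sizes


def bounding_kbs(modes, t):
    """Return the frame indices of the KBs that precede and succeed the WZB at t."""
    front = max(i for i in range(t) if modes[i] == "K")
    back = min(i for i in range(t + 1, len(modes)) if modes[i] == "K")
    return front, back


modes = ["K", "W", "K", "K", "W", "W", "W", "K"]   # frames #0 .. #7
print(tgob_sizes(modes))        # [2, 1, 4, 1]
print(bounding_kbs(modes, 1))   # (0, 2)
print(bounding_kbs(modes, 5))   # (3, 7)
```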

Fine granularity in both the spatial and temporal domains provides a dynamic switch between WZBs and KBs, which offers better visual experience in terms of quality smoothness by reducing fluctuations between consecutive frames. This will be demonstrated in later sections.

### 3.2 TGOB-based DVC encoder

#### 3.2.1 New BMD

The BMM determined based on the decision rules of Figure 2 is sent to the BMM queue at the DVC decoder. The load for BMM transmission is light with only 1,485 bits per second (QCIF format at a frame rate of 15 Hz and 16 × 16 block size). The identified KBs are, on one hand, sent for intra-coding and, on the other hand, replaced with zero blocks, which are then combined with the quantized DCT coefficients of WZBs for channel coding. Note that our DVC system, though performing BMD, still adopts frame-based LDPCA codes (see next subsection) to preserve its coding efficiency.
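The quoted BMM overhead follows from simple arithmetic, assuming one bit per block (two possible modes) and a QCIF resolution of 176 × 144 pixels. The short Python check below makes that assumption explicit; the function name is illustrative.

```python
def bmm_bits_per_second(width, height, block_size, frame_rate):
    """Bits per second needed to signal one mode bit per block (assumed 1 bit/block)."""
    blocks_per_frame = (width // block_size) * (height // block_size)
    return blocks_per_frame * frame_rate


# QCIF (176 x 144), 16 x 16 blocks, 15 Hz: 11 * 9 = 99 blocks per frame
print(bmm_bits_per_second(176, 144, 16, 15))   # 1485
```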

#### 3.2.2 LDPCA encoder

Each DCT coefficient in the *k*th band is uniformly quantized to obtain a symbol *q* representing one of $2^{M_k}$ levels, where *k* ∈ {0, 1, …, 15} is the band index (*k* = 0 represents the DC band and *k* = 15 represents the highest AC frequency) and *M*_{k} ∈ {0, 1, 2, 3, …, 8} denotes the number of bit planes selected for transmission. *M*_{k} for each band is selected according to a predefined quantization matrix (QM) proposed in the DISCOVER codec [22], and each selected bit plane of each band is processed by the LDPCA encoder to produce a parity bitstream, which is then stored in an output buffer and partially sent on demand to the decoder in response to the feedback signal. The QMs are illustrated in Figure 5, where the number in each entry denotes the quantization level applied for each band. For example, in QM_{8}, the DC band has 128 quantization levels and seven bit planes will be encoded, as *M*_{0} equals 7. Among these matrices, QM_{8} results in smaller quantization errors and higher bit rates, while QM_{1} corresponds to larger quantization errors and lower bit rates. Note that although the number of identified WZBs in each frame varies due to non-stationary video content, the zero blocks filled in for KBs make the input to the LDPCA encoder a fixed size (i.e., full-frame size). This results in LDPCA inputs of regular size, at the cost of a probable sacrifice in coding efficiency (i.e., more parity bits are required for transmission). Fortunately, this increased bit rate can be partly eliminated: the parity bits fully connected to 0 bits of the KBs can be discarded during the accumulation of parity bits in the LDPCA encoder. As shown in Figure 6, three parity bits (dashed squares in the second column) completely connected to the 0 bits from KBs are neither accumulated during encoding nor sent to the DVC decoder. Additionally, the inserted 0 bits will also yield a faster convergence of channel decoding due to the exactly correct side information at the decoder provided by the received BMMs (see next subsection). In our DVC system, the CRC bits for each bit plane are also multiplexed with the LDPCA bitstreams to verify the correctness of channel decoding convergence.
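The relation between *M*_{k}, the number of quantization levels, and the bit planes fed to the LDPCA encoder can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a plain uniform quantizer over a symmetric range [−`max_abs`, `max_abs`) is used for every band, and the range value and function names are hypothetical rather than those of the DISCOVER codec.

```python
import numpy as np


def quantize_band(coeffs, m_k, max_abs):
    """Uniformly quantize coefficients into 2**m_k levels (assumed symmetric range)."""
    levels = 2 ** m_k
    step = 2.0 * max_abs / levels
    q = np.floor((np.asarray(coeffs, dtype=np.float64) + max_abs) / step).astype(np.int64)
    return np.clip(q, 0, levels - 1)


def bit_planes(q, m_k):
    """Split quantization indices into m_k bit planes, most significant plane first."""
    return [(q >> b) & 1 for b in range(m_k - 1, -1, -1)]


m_0 = 7                                      # DC band of QM_8: 2**7 = 128 levels
q = quantize_band([-1024.0, 0.0, 1023.0], m_0, max_abs=1024.0)
planes = bit_planes(q, m_0)
print(2 ** m_0, len(planes))                 # 128 7
```

Each of the `m_0` planes would then be channel-encoded separately, which is why a larger *M*_{k} raises both the reconstruction accuracy and the bit rate.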

### 3.3 TGOB-based DVC decoder

At the DVC decoder, the intra-coded bitstream for KBs is first decoded and then sent to the decoded KB queue. With the received parity bits and the side information generated from the decoded KBs, DCT coefficients of WZBs can be decoded bit plane by bit plane by the LDPCA decoder. The reconstructor is used to reconstruct DCT coefficients from the decoded bit planes and de-group the bands before the 4 × 4 inverse DCT transform (IDCT) is applied. The reconstructed KBs and WZBs are finally combined, according to BMMs, to form consecutive frames of a complete video sequence. The details about DVC decoding are described as follows.

#### 3.3.1 BMM queue

BMM for each frame is received and stored in the queue, where it can be accessed by the LDPCA decoder and the multiplexer MUX to realize LDPCA decoding and to complete the whole frame reconstruction, respectively. Moreover, with the guidance of BMMs, each TGOB can be identified with a leading KB possibly followed by several WZBs (might be 0). Side information generation for WZBs within a TGOB can then be carried out based on the interpolation between the leading KB and the next KB that follows.

#### 3.3.2 Side information generator

In contrast to the traditional DVC decoder, which relies on MCTI techniques [5, 6] for side information generation, a simple bilinear interpolation technique [21] is adopted here to estimate WZBs from two decoded KBs. Since each WZB needs its two bounding KBs for side information generation, a buffer (the decoded KB queue in Figure 4) of 2 · *U*_{TGOB} - 1 frames is required at the decoder. When a WZB of the current frame is considered for reconstruction, the temporal positions of its two bounding KBs are known from the received BMM, so the decoded KB data can be accessed from the buffer and the side information can be calculated. In our system, simple bilinear interpolation greatly reduces the complexity of the DVC decoder, thanks to the dynamic TGOB structure, which ensures content similarity between the two bounding KBs of any considered WZB.
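A possible form of this interpolation is sketched below in Python. The weighting is an assumption: each bounding KB is weighted by its temporal proximity to the WZB, which is one common reading of interpolation between two co-located blocks; the function and parameter names are hypothetical.

```python
import numpy as np


def interpolate_side_info(kb_front, kb_back, t_front, t_back, t):
    """Estimate a WZB at time t from its two bounding KBs (distance-weighted, assumed)."""
    if not t_front < t < t_back:
        raise ValueError("t must lie strictly between the two bounding KBs")
    w = (t - t_front) / (t_back - t_front)   # 0 near the front KB, 1 near the back KB
    front = np.asarray(kb_front, dtype=np.float64)
    back = np.asarray(kb_back, dtype=np.float64)
    return (1.0 - w) * front + w * back


# KBs in frames #3 and #7 bound the WZBs in frames #4, #5 and #6.
kb3 = np.full((4, 4), 100.0)
kb7 = np.full((4, 4), 200.0)
side_info_4 = interpolate_side_info(kb3, kb7, 3, 7, 4)   # every pixel equals 125.0
```

With *U*_{TGOB} = 4, the decoder would have to hold up to 2 · 4 − 1 = 7 frames so that both bounding KBs of any WZB are available.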

#### 3.3.3 CNM

The CNM characterizes the residual *r* between the original WZB and the corresponding side information block (SIB). In this paper, a Laplacian model is adopted for the statistics of each DCT coefficient *c* with mean *μ*, which is expressed as

$$f(c) = \frac{\alpha}{2}\exp\left(-\alpha\left|c-\mu\right|\right)$$

The difficulty lies in estimating the parameter *α* at the decoder, without knowing the original WZB. An approximation is to estimate *r* as half of the difference between the two bounding KBs, KB^{front} and KB^{back}, i.e.,

$$r = \frac{\mathrm{KB}^{\mathrm{front}} - \mathrm{KB}^{\mathrm{back}}}{2}$$

The parameter estimate $\widehat{\alpha}$ is then computed per coefficient from the coefficients $\widehat{c}\left(u,v\right)$ derived from this estimated residual, where *μ*_{k} and *σ*_{k}^{2} are the mean and variance of the $\left|\widehat{c}\left(u,v\right)\right|$ values in the *k*th band for all WZBs of the current frame. *D*_{k}(*u*, *v*) measures the deviation of the coefficient $\widehat{c}\left(u,v\right)$ from the corresponding band mean. If [*D*_{k}(*u*, *v*)]^{2} ≤ *σ*_{k}^{2}, $\widehat{c}\left(u,v\right)$ is considered an inlier coefficient and *σ*_{k}^{2} is used to compute $\widehat{\alpha}$. Otherwise (i.e., $\widehat{c}\left(u,v\right)$ is an outlier), [*D*_{k}(*u*, *v*)]^{2} is used instead.
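The inlier/outlier rule can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the deviation is taken as *D*_{k} = |ĉ| − *μ*_{k}, and the Laplacian parameter is obtained from the chosen variance term through the standard identity "variance = 2/*α*²" for a Laplacian distribution. The function name and the small `eps` guard are illustrative.

```python
import numpy as np


def estimate_alpha(c_hat_band, eps=1e-12):
    """Per-coefficient Laplacian parameter for one band (illustrative sketch).

    c_hat_band -- estimated residual coefficients of the k-th band, all WZBs of a frame
    """
    mag = np.abs(np.asarray(c_hat_band, dtype=np.float64))
    mu_k = mag.mean()                 # band mean of |c_hat|
    var_k = mag.var()                 # band variance of |c_hat|
    d2 = (mag - mu_k) ** 2            # squared deviation (ASSUMED form of D_k)
    # Inliers use the band variance, outliers use their own squared deviation.
    effective_var = np.where(d2 <= var_k, var_k, d2)
    # Laplacian identity: variance = 2 / alpha**2  =>  alpha = sqrt(2 / variance)
    return np.sqrt(2.0 / np.maximum(effective_var, eps))


alphas = estimate_alpha([1.0, -1.0, 1.0, 9.0])
```

In this toy band the first three coefficients are inliers and share the band-level estimate, while the last coefficient is an outlier and receives a smaller *α̂*, i.e., a flatter and less confident noise model.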

#### 3.3.4 LDPCA decoder

Based on the requested parity bits transmitted from the encoder and the received BMM, each bit plane can be decoded with reference to the side information and the CNM characterizing the statistical dependency between WZBs and corresponding SIBs. This decoding process considers not only the parity bits, the temporal characteristics, and the already decoded bits in the more significant bit planes, but also those 0 bits contributed by KBs and indicated by BMM. Note that more KBs in a frame actually lead to more efficient LDPCA decoding, thanks to the reliable side information contributed by KBs. As mentioned in Figure 6a, parity bits fully connected to the 0 bits of KBs are neither accumulated nor transmitted. Hence, to realize LDPCA decoding with a regular size, these non-transmitted bits should be recovered by inserting 0's into right positions according to the received BMM, as illustrated by the dashed hexagon in Figure 6b. Moreover, transmitted parity bits which are partly connected to 0 bits of KBs are helpful for fast convergence in decoding. For example, the first accumulated parity bit, denoted as the first hexagon shown in the second column of Figure 6b, is connected to one WZB bit and three KB bits. Due to reliable 0 bits from KBs, determination of the WZB bits becomes easier during message passing iterations. Finally, CRC bits are used to confirm the convergence of LDPCA decoding.

#### 3.3.5 Reconstruction

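The quantization interval [*l*, *u*) (indicated by the decoded quantization level *q*′) and the side information *x*′, both used in the LDPCA decoding, are also exploited for optimal reconstruction of DCT coefficients in the sense of minimum mean square error (MMSE) [24]. Each DCT coefficient in a WZB can be optimally reconstructed as ${\widehat{x}}_{\mathrm{opt}}$ by Equation 7, i.e., as the conditional expectation of the coefficient given its quantization interval and the side information:

$$\widehat{x}_{\mathrm{opt}} = E\left[x \mid x \in [l, u),\, x'\right] = \frac{\int_{l}^{u} x\, f_{X|X'}\left(x \mid x'\right)\, dx}{\int_{l}^{u} f_{X|X'}\left(x \mid x'\right)\, dx}$$

where $f_{X|X'}$ is the conditional density given by the CNM.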
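The MMSE reconstruction can be illustrated numerically in Python. The sketch below assumes a Laplacian correlation model centered at the side information *x*′ and evaluates the conditional mean over [*l*, *u*) with a midpoint rule; the closed-form expression used in [24] is not reproduced here, and the function name and grid size are arbitrary choices.

```python
import numpy as np


def mmse_reconstruct(l, u, x_side, alpha, n=20000):
    """Conditional mean of x over [l, u) under a Laplacian centered at x_side."""
    width = (u - l) / n
    x = l + (np.arange(n) + 0.5) * width            # midpoints of n sub-intervals
    log_w = -alpha * np.abs(x - x_side)             # log of the unnormalized density
    w = np.exp(log_w - log_w.max())                 # rescale to avoid underflow
    return float(np.sum(x * w) / np.sum(w))


inside = mmse_reconstruct(0.0, 16.0, 8.0, 0.1)      # side info at the interval centre
above = mmse_reconstruct(0.0, 16.0, 100.0, 0.1)     # side info above the interval
below = mmse_reconstruct(0.0, 16.0, -100.0, 0.1)    # side info below the interval
```

When the side information lies outside the decoded interval, the estimate is pulled toward the nearer interval boundary but always stays inside [*l*, *u*), which bounds the reconstruction error by the quantization step.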

## 4 Simulation results

The parameters of the proposed scheme include the upper bound (*U*_{TGOB}) of the TGOB size and the thresholds for TD in Equation 1 and *σ*^{2} in Equation 2, namely *T*_{t} and *T*_{s}. The values of *T*_{t} and *T*_{s} were experimentally obtained and set to 1,000 and 10, respectively. An increase of *T*_{t} or a decrease of *T*_{s} will cause a decrease of the KB percentage (or an increase of the WZ percentage), which would be more suitable for sequences of stationary and simple content (e.g., indoor video surveillance applications). To make the performance comparisons fairer, the GOP size (2 ~ 10) for the DISCOVER system is determined by optimizing the coding efficiency for each individual sequence. Similarly, our proposed scheme also optimizes the upper bound *U*_{TGOB} (2 ~ 10) of the TGOB size for each individual sequence. The configurations of QPs for KFs/KBs at the specified QMs for WZFs/WZBs to achieve variable bit rates, together with the selected values of GOP size and *U*_{TGOB}, are listed in Table 2 for comparison between the DISCOVER and proposed schemes.

**Configurations of GOP size, *U*_{TGOB}, and (QM, QP) quantization parameter pairs for DISCOVER and proposed schemes**

Sequence | Foreman | Hall monitor | Soccer | Coastguard |
---|---|---|---|---|
GOP size (DISCOVER) | 2 | 8 | 2 | 2 |
*U*_{TGOB} (proposed) | 9 | 10 | 2 | 2 |
QM_{1}: QP (DISCOVER) | 40 | 37 | 44 | 38 |
QM_{1}: QP (proposed) | 35 | 29 | 33 | 34 |
QM_{2}: QP (DISCOVER) | 39 | 36 | 43 | 37 |
QM_{2}: QP (proposed) | 32 | 28 | 31 | 30 |
QM_{3}: QP (DISCOVER) | 38 | 36 | 41 | 37 |
QM_{3}: QP (proposed) | 31 | 28 | 31 | 30 |
QM_{4}: QP (DISCOVER) | 34 | 33 | 36 | 34 |
QM_{4}: QP (proposed) | 30 | 27 | 30 | 27 |
QM_{5}: QP (DISCOVER) | 34 | 33 | 36 | 33 |
QM_{5}: QP (proposed) | 30 | 27 | 30 | 27 |
QM_{6}: QP (DISCOVER) | 32 | 31 | 34 | 31 |
QM_{6}: QP (proposed) | 29 | 26 | 29 | 26 |
QM_{7}: QP (DISCOVER) | 29 | 29 | 31 | 30 |
QM_{7}: QP (proposed) | 26 | 23 | 27 | 26 |
QM_{8}: QP (DISCOVER) | 25 | 24 | 25 | 26 |
QM_{8}: QP (proposed) | 25 | 21 | 25 | 25 |

### 4.1 Comparison on R-D performance

**Quality gains (in BDPSNR) for the proposed DVC scheme with new BMD against other DVC schemes**

Compared scheme | Foreman (dB) | Hall monitor (dB) | Soccer (dB) | Coastguard (dB) |
---|---|---|---|---|
DISCOVER | 0.61 | 2.85 | 2.19 | -1.45 |
Proposed with original BMD | -0.04 | -0.06 | 0.16 | -0.03 |
H.264/AVC Intra | 0.55 | 7.18 | 0.04 | -0.28 |

The percentages of KBs for the proposed scheme are found to be 91.11% (Foreman), 29.77% (Hall monitor), 96.57% (Soccer), and 98.17% (Coastguard), which are higher, by 10.35%, 1.93%, 4.12%, and 3.38%, than when the original BMD [21] (without the double-KB instance) is adopted. This increase in KB percentages, though possibly increasing the bit rates, results in a comparable BDPSNR performance (see Table 3), thanks to the more accurate side information obtained by forcibly inserting KB1. Combining the above KB percentage statistics, the QM/QP configurations in Table 2, and the PSNR figures in Table 3, it is deduced that the number of intra-coded KBs does not by itself enhance or degrade the coding efficiency; rather, it is the proper mode switch that determines the accuracy of the side information and thus the coding efficiency.

### 4.2 Comparison on temporal flicker

#### 4.2.1 Temporal flickering reduction by enlarging GOP size

Temporal smoothness is a key factor for the human visual system to evaluate the visual quality during the playback of a video sequence. Generally, a decoded video sequence with lower flicker is more preferable, whether for conventional video coding schemes or DVC schemes. In particular, a DVC scheme will suffer from quality unbalance and thus annoying visual perception between KFs and WZFs if the quantization parameters are not well chosen. The DISCOVER system suggested eight quantization parameter pairs (QM, QP) for WZFs and KFs according to heuristic experiments which target at maximizing the R-D performance while keeping image qualities of the WZFs and KFs similar. Even though the reconstructed WZFs and KFs have similar PSNRs, it is not guaranteed to have low flicker while playing the decoded video sequences. Intuitively, flickering artifacts can be reduced if the coding mode switch is kept less frequent [26]. This can normally be accomplished by choosing a larger GOP size.

In the flickering measure SSD_{f} (Equation 8), *f*_{t}(*x*) and *f*_{t-1}(*x*) represent the values of the *x*th pixel in the *t*th and (*t*-1)th frames of the original video sequence, respectively, *e*_{t}(*x*) represents the difference of co-located pixels between adjacent frames, and *f*′_{t}(*x*), *f*′_{t-1}(*x*), and *e*′_{t}(*x*) have similar meanings but are defined for the reconstructed video sequence. *I*_{i} denotes the *i*th stationary block satisfying the low-motion check with a threshold *τ*, *N* denotes the number of static blocks in a frame, card(*I*) represents the cardinality of the set *I*, and SSD_{f} stands for the flickering intensity between *f*_{t}(*x*) and *f*_{t-1}(*x*). Note that blocks of higher motion activity are excluded from the flickering calculation by choosing *τ* = 500 (as suggested in [28]) in our system.

_{1}and GOP8.

**Flickering reduction for Hall monitor encoded by DISCOVER with a larger GOP size**

GOP size | QM_{1} | QM_{2} | QM_{3} | QM_{4} | QM_{5} | QM_{6} | QM_{7} | QM_{8} |
---|---|---|---|---|---|---|---|---|
GOP2 | 1,557 | 1,467 | 1,470 | 1,078 | 1,078 | 880 | 812 | 558 |
GOP8 | 533 | 636 | 637 | 557 | 557 | 512 | 519 | 426 |
Flickering reduction | 66% | 57% | 57% | 48% | 48% | 42% | 36% | 24% |

#### 4.2.2 Performance comparison on temporal flicker

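Flickering intensities are measured according to the SSD_{f} definition in Equation 8. The flickering reduction is defined as

$$\Delta\mathrm{SSD} = \frac{\mathrm{SSD}' - \mathrm{SSD}}{\mathrm{SSD}'} \times 100\%$$

where SSD and SSD′ denote the flickering intensities of the proposed scheme and of the DISCOVER system, respectively.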

**Comparisons on measures of flickering intensities for Foreman against DISCOVER at eight given bit rates**

Scheme | Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|
Proposed DVC with original BMD | Bit rate | 150 | 205 | 235 | 261 | 265 | 299 | 390 | 450 |
 | SSD (proposed) | 543 | 465 | 444 | 416 | 416 | 378 | 346 | 297 |
 | SSD′ (DISCOVER) | 762 | 857 | 752 | 661 | 646 | 567 | 432 | 341 |
 | ΔSSD (%) | 29 | 46 | 41 | 37 | 36 | 33 | 20 | 13 |
 | Average (%) | 32% | | | | | | | |
Proposed DVC with new BMD | Bit rate | 165 | 223 | 255 | 282 | 286 | 320 | 419 | 474 |
 | SSD (proposed) | 463 | 374 | 339 | 343 | 343 | 303 | 275 | 259 |
 | SSD′ (DISCOVER) | 765 | 794 | 680 | 592 | 587 | 536 | 389 | 306 |
 | ΔSSD (%) | 39 | 53 | 50 | 42 | 42 | 44 | 29 | 15 |
 | Average (%) | 39% | | | | | | | |
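The ΔSSD figures can be cross-checked against the tabulated SSD values. The Python sketch below assumes that the reduction is the relative decrease of the proposed scheme's flickering intensity with respect to DISCOVER, which is consistent with the "original BMD" entries reported for Foreman; the function name is illustrative.

```python
def flicker_reduction(ssd_proposed, ssd_discover):
    """Relative flickering reduction in percent (assumed definition)."""
    return 100.0 * (ssd_discover - ssd_proposed) / ssd_discover


# Foreman, proposed DVC with original BMD, eight bit rates
ssd_proposed = [543, 465, 444, 416, 416, 378, 346, 297]
ssd_discover = [762, 857, 752, 661, 646, 567, 432, 341]

reductions = [flicker_reduction(p, d) for p, d in zip(ssd_proposed, ssd_discover)]
print([round(r) for r in reductions])            # [29, 46, 41, 37, 36, 33, 20, 13]
print(round(sum(reductions) / len(reductions)))  # 32
```

Because the tabulated SSD values are themselves rounded, other columns may differ from a recomputed ΔSSD by one percentage point.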

**Comparisons on measures of flickering intensities for Hall monitor against DISCOVER at eight given bit rates**

Scheme | Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|
Original BMD | Bit rate | 91 | 103 | 107 | 123 | 127 | 147 | 178 | 231 |
 | SSD (proposed) | 350 | 349 | 349 | 347 | 347 | 338 | 331 | 319 |
 | SSD′ (DISCOVER) | 556 | 539 | 533 | 513 | 514 | 516 | 482 | 426 |
 | ΔSSD (%) | 37 | 35 | 34 | 32 | 32 | 35 | 31 | 25 |
 | Average (%) | 33% | | | | | | | |
New BMD | Bit rate | 99 | 112 | 116 | 132 | 136 | 157 | 190 | 243 |
 | SSD (proposed) | 347 | 343 | 343 | 342 | 342 | 332 | 327 | 314 |
 | SSD′ (DISCOVER) | 544 | 526 | 520 | 515 | 517 | 505 | 470 | 413 |
 | ΔSSD (%) | 36 | 35 | 34 | 34 | 34 | 34 | 30 | 24 |
 | Average (%) | 33% | | | | | | | |

**Comparisons on measures of flickering intensities for Soccer against DISCOVER at eight given bit rates**

| | Original BMD | | | | | | | | New BMD | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bit rate | 159 | 204 | 208 | 234 | 238 | 271 | 337 | 430 | 160 | 206 | 209 | 232 | 235 | 267 | 334 | 426 |
| SSD (proposed) | 560 | 548 | 548 | 551 | 548 | 472 | 461 | 319 | 477 | 457 | 457 | 467 | 467 | 491 | 428 | 433 |
| SSD′ (DISCOVER) | 1,095 | 957 | 960 | 975 | 978 | 997 | 881 | 717 | 1,090 | 958 | 960 | 974 | 976 | 996 | 886 | 723 |
| ΔSSD (%) | 49 | 43 | 43 | 43 | 44 | 45 | 46 | 36 | 56 | 52 | 52 | 52 | 52 | 51 | 52 | 40 |
| Average (%) | 44 | | | | | | | | 51 | | | | | | | |
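As a quick consistency check, the per-sequence averages reported in the three tables can be recomputed from the tabulated ΔSSD rows. The sketch below is illustrative only: the dictionary layout and variable names are assumptions, and the values are simply copied from the ΔSSD (%) rows above.

```python
# ΔSSD (%) at the eight bit rates, copied from the three comparison tables.
delta_ssd = {
    ("Foreman", "original BMD"): [29, 46, 41, 37, 36, 33, 20, 13],
    ("Foreman", "new BMD"): [39, 53, 50, 42, 42, 44, 29, 15],
    ("Hall monitor", "original BMD"): [37, 35, 34, 32, 32, 35, 31, 25],
    ("Hall monitor", "new BMD"): [36, 35, 34, 34, 34, 34, 30, 24],
    ("Soccer", "original BMD"): [49, 43, 43, 43, 44, 45, 46, 36],
    ("Soccer", "new BMD"): [56, 52, 52, 52, 52, 51, 52, 40],
}

# Arithmetic mean over the eight rate points, rounded to a whole percentage.
averages = {key: round(sum(vals) / len(vals)) for key, vals in delta_ssd.items()}

for (sequence, bmd), avg in averages.items():
    print(f"{sequence:12s} {bmd:12s} {avg}%")
```

The largest average reduction in this listing is the 51% obtained for Soccer with the new BMD, which matches the figure quoted in the abstract.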

Our proposed DVC scheme offers fine coding mode (KB or WZB) granularity in both the temporal and spatial domains, and hence achieves higher temporal smoothness than the conventional frame-based schemes. In particular, the dynamic TGOB structure adapts the DVC system to varying video content so that the KB/WZB mode switch occurs only at regions of scene change or high motion. Our new BMD, by introducing the double-KB instance, ensures that the two bounding KBs of each WZB are similar in content, and thus improves on the original BMD in terms of flickering reduction.

### 4.3 Complexity analysis for DVC encoder

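Let the per-frame encoding times of intra-coding, WZ coding, and BMD be denoted as *α*, *β*, and *γ*, respectively. Assume that the average percentage of KBs in a frame is *ρ*. The encoding time for a frame for our proposed scheme can be expressed as

$$\rho \cdot \alpha + \beta + \gamma. \qquad (11)$$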

Note that Equation 11 is estimated by assuming a GOP size of 2. According to experiments on our PC platform, WZ encoding has a lower complexity than intra-coding (*β* = 0.2647*α*), and the overhead caused by BMD is extremely low (*γ* = 0.0023*α*). From Equation 11, it is clear that the computing load of the conventional frame-based system fluctuates significantly among frames (switching between *α* and *β*). In contrast, the load fluctuation of our proposed system depends on the variation of the KB percentage *ρ*, which is small compared to |*α* - *β*|/*α* ≈ 0.7353 associated with the traditional DVC encoders. Such load fluctuation between successive frames makes it difficult to design a stable frame-based DVC encoder.

For the average computing load, our scheme is measured at *ρ* · *α* + *β* + *γ* = (0.2670 + *ρ*) · *α* for one frame, while the conventional DVC encoder is measured at 0.5*α* + 0.5*β* = 0.6324*α*. That is, our scheme has a lower average complexity when the KB percentage *ρ* is below 36.54%. Recall from Section 4.1 that *ρ* = 91.11%, 29.77%, 96.57%, and 98.17% for the four test sequences; our scheme therefore has a lower computing complexity only for Hall monitor. For the other three test sequences, our scheme, though having a higher complexity, benefits from less load fluctuation. In principle, the two thresholds *T*_{t} and *T*_{s} of our scheme can be further fine-tuned to lower the complexity.
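The break-even KB percentage quoted above follows directly from the two average-load expressions. The sketch below reproduces that arithmetic under the stated assumptions (GOP size of 2 for the conventional encoder, *β* = 0.2647*α*, *γ* = 0.0023*α*); the function names are hypothetical, and *α* is normalized to 1 for convenience.

```python
ALPHA = 1.0              # intra-coding time per frame (normalized)
BETA = 0.2647 * ALPHA    # WZ encoding time per frame, as measured in the text
GAMMA = 0.0023 * ALPHA   # BMD overhead per frame, as measured in the text


def proposed_load(rho):
    """Average per-frame load of the proposed scheme: rho*alpha + beta + gamma."""
    return rho * ALPHA + BETA + GAMMA


def conventional_load():
    """Average per-frame load of a frame-based encoder with a GOP size of 2."""
    return 0.5 * ALPHA + 0.5 * BETA


def break_even_rho():
    """KB percentage at which both schemes have the same average load."""
    return (conventional_load() - BETA - GAMMA) / ALPHA


# KB percentages of the four test sequences, as quoted from Section 4.1.
rhos = [0.9111, 0.2977, 0.9657, 0.9817]
cheaper = [proposed_load(r) < conventional_load() for r in rhos]

print(f"break-even rho = {break_even_rho():.4f}")
print(cheaper)
```

Only the second value (*ρ* = 29.77%, the Hall monitor case named in the text) falls below the break-even point of roughly 0.365.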

## 5 Conclusions

The proposed DVC scheme based on a dynamic TGOB structure adapts our system to varying video content and achieves less flickering artifact while maintaining high coding performance. The encoder in the proposed DVC system assesses the spatial-temporal properties (i.e., TD and *σ*^{2} in Equations 1 and 2) of each image block to dynamically determine suitable modes (KB/WZB) for encoding. This fine-grained, accurate, and dynamic block-based mode switch also simplifies side information generation at the decoder, which reduces to a simple bilinear interpolation from the two bounding KBs of each WZB. Further analyses reveal that our block mode decision, though it might increase the computing load slightly, results in less load fluctuation along the encoding process. Experimental results show that R-D performance is enhanced and temporal flickering artifacts are reduced significantly, compared to the well-known DISCOVER system. However, our DVC design is not suitable for videos of high dynamics like Coastguard. In that case, all-intra coding could be used instead.

To further advance the R-D performance of this TGOB structure, block mode decision based on R-D optimization, instead of the threshold-based rule, can be considered. This R-D optimization, however, should be simple yet effective so that the encoder complexity does not increase too much. Research on three or more modes (i.e., beyond KB and WZB) is also ongoing, aiming to further enhance the R-D and flickering performances.

## Declarations

### Acknowledgments

This work was supported by the National Science Council, Republic of China, under contracts NSC-101-2221-E-194-034 and NSC98-2221-E-194-037-MY3.

## References

1. Girod B, Aaron A, Rane S, Rebollo-Monedero D: Distributed video coding. *Proc. IEEE, Special issue on advance in video coding and delivery* 2005, 93(1):71-83.
2. Slepian JD, Wolf JK: Noiseless coding of correlated information sources. *IEEE Trans. Inform. Theory* 1973, IT-19(4):471-480.
3. Wyner D, Ziv J: The rate-distortion function for source coding with side information at the decoder. *IEEE Trans. Inform. Theory* 1976, 22(1):1-10. 10.1109/TIT.1976.1055508
4. Brites C, Pereira F: Correlation noise modeling for efficient pixel and transform domain Wyner–Ziv video coding. *IEEE Trans. Circuits Syst. Video Technol.* 2008, 18(9):1177-1190.
5. Ascenso J, Brites C, Pereira F: Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding. In *Proc. of 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services*. Smolenice, Slovak Republic; 2005.
6. Ascenso J, Brites C, Pereira F: Content adaptive Wyner-Ziv video coding driven by motion activity. *Proc. Int. Conf. Image Process.* 2006, 20: 605-608.
7. Abou-Elailah A, Dufaux F, Farah J, Cagnazzo M, Pesquet-Popescu B: Fusion of global and local motion estimation for distributed video coding. *IEEE Trans. Circuits Syst. Video Technol.* 2013, 20(1):158-172.
8. Brites C, Ascenso J, Pereira F: Side information creation for efficient Wyner–Ziv video coding: classifying and reviewing. *Signal Process. Image Commun.* 2013, 28: 689-726.
9. Petrazzuoli G, Cagnazzo M, Pesquet-Popescu B: Novel solutions for side information generation and fusion in multiview DVC. *EURASIP J. Adv. Signal Process.* 2013, 2013: 154. 10.1186/1687-6180-2013-154
10. Chien FS-Y, Cheng T-Y, Ou S-H, Chiu C-C, Lee C-H, Srinivasa Somayazulu V, Chen Y-K: Power consumption analysis for distributed video sensors in machine-to-machine networks. *IEEE J. Emerg. Select Top. Circuits Syst.* 2013, 3(1):55-64.
11. Javier G-F: Compression of correlated binary sources using turbo codes. *IEEE Comm. Lett.* 2001, 5(10):417-419.
12. Aaron A, Girod B: Compression with side information using turbo codes. In *Proc. of IEEE Data Compression Conference*. Snowbird, UT; 2002.
13. Liveris A, Xiong Z, Georghiades C: Compression of binary sources with side information at the decoder using LDPC codes. *IEEE Comm. Lett.* 2004, 6(10):440-442.
14. Varodayan D, Aaron A, Girod B: Rate-adaptive codes for distributed source coding. *EURASIP Signal Process.* 2006, 86: 3123-3130.
15. Škorupa J, Slowack J, Mys S, Deligiannis N, De Cock J, Lambert P, Grecos C, Munteanu A, Van de Walle R: Efficient low-delay distributed video coding. *IEEE Trans. Circuits Syst. Video Technol.* 2012, 22(4):530-544.
16. Badem M, Fernando WAC, Kondoz AM: Unidirectional distributed video coding using dynamic parity allocation and improved reconstruction. In *Proc. of IEEE Int'l Conf. on Information and Automation for Sustainability (ICIAFs)*. Colombo; 2010.
17. Kubasov D, Lajnef K, Guillemot C: A hybrid encoder/decoder rate control for a Wyner-Ziv video codec with a feedback channel. In *Proc. of IEEE Int'l Workshop on Multimedia Signal Processing*. Crete; 2007.
18. Artigas X, Ascenso J, Dalai M, Klomp S, Kubasov D, Ouaret M: The DISCOVER codec: Architecture, techniques and evaluation. In *Proc. of Picture Coding Symposium*. Lisbon; 2007.
19. Aaron A, Rane S, Setton E, Girod B: Transform-domain Wyner-Ziv codec for video. In *Proc. of The Visual Communications and Image Processing Conference*. San Jose; 2004.
20. Puri R, Majumdar A, Ramchandran K: PRISM: A video coding paradigm with motion estimation at the decoder. *IEEE Trans. Image Process.* 2007, 16(10):2436-2448.
21. Tsai D-C, Lee C-M, Lie W-N: Dynamic key block decision with spatio-temporal analysis for Wyner-Ziv video coding. In *Proc. of IEEE Int'l Conf. on Image Processing (ICIP)*. San Antonio, TX; 2007.
22. *DISCOVER official site*. http://www.discoverdvc.org/cont_Codec.html. Accessed on July 14, 2013.
23. Tagliasacchi M, Trapanese A, Tubaro S, Ascenso J, Brites C, Pereira F: Intra mode decision based on spatio-temporal cues in pixel domain Wyner-Ziv video coding. In *Proc. of IEEE Int'l Conf. Acoustics, Speech, and Signal Processing*. Toulouse; 2006:57-60.
24. Nayak J, Guillemot C: Optimal reconstruction in Wyner-Ziv video coding with multiple side information. In *Proc. of IEEE Int'l Workshop on Multimedia Signal Processing*. Crete; 2007:183-186.
25. Bjontegaard G: Calculation of average PSNR differences between RD-curves. In *VCEG Contribution VCEG-M33*. Texas; 2001.
26. Yang JX, Wu HR: Robust filtering technique for reduction of temporal fluctuation in H.264 video sequences. *IEEE Trans. Circuits Syst. Video Technol* 2010, 20(39):458-462.
27. Fan X, Gao W, Lu Y, Zhao D: Flicking reduction in all intra frame coding. In *ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT 5th Meeting*. Geneva; 2002.
28. Chun SS, Kim J-R, Sull S: Intra prediction mode selection for flicker reduction in H.264/AVC. *IEEE Trans. Consum. Electron* 2006, 52(4):1303-1310.

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.